One CLI. Ten GPU clouds. Search, provision, deploy, sync, and manage — all over SSH.
Don't want to learn the CLI? You don't have to. Your AI agent speaks swm natively.
https://r1008si-8080.proxy.runpod.net

$ swm gpus -g h100 --max-price 3.50
$ swm pod create -p runpod -g "H100 SXM" \
    -n qwen3 --gpu-count 4 \
    --lifecycle auto-down -y
$ swm setup install vllm runpod:r1008si
$ swm models pull runpod:r1008si \
    Qwen/Qwen3-235B-A22B
$ swm setup install open-webui runpod:r1008si
$ swm setup start vllm runpod:r1008si
$ swm setup start open-webui runpod:r1008si
Works with Cursor, Codex, Claude Code, Windsurf, and any agent that can run shell commands.
Drop in the SKILL.md and go.
Push from one cloud, pull to another. Resume where you left off — with continuous background sync.
$ swm pod create -p runpod -g h200 -n train
✓ Workspace pulled
✓ Watcher running
✓ Auto-sync daemon: pushing every 60s
Every pod created with a workspace gets a background daemon that tails the filesystem watcher and pushes new and changed files to storage every 60 seconds. Sync is non-destructive by default: propagating deletions is opt-in and gated by a prior sync.
$ swm sync push runpod:r1008si
$ swm sync pull lambda:def456
Your workspace lives in S3-compatible storage (Backblaze B2, AWS S3, Google Cloud Storage). Push from one cloud, pull to another.
Tier 1: inotify watcher → instant (changed files only)
Tier 2: find -newer → seconds (watcher not running)
Tier 3: full parallel → s5cmd 512 workers (first push)

The filesystem watcher tracks every change. Your next push uploads only what changed, even if it's 3 files out of 600,000.
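The tier choice above can be pictured with a small Python sketch. This is not swm's implementation: `choose_tier` and `changed_since` are hypothetical names, and the selection rule (no previous push means tier 3, otherwise tier 1 if the watcher is running, else tier 2) is an assumption read off the tier list. `changed_since` stands in for `find -newer` by comparing modification times against a marker timestamp.

```python
import os
from pathlib import Path


def choose_tier(watcher_running: bool, has_previous_push: bool) -> int:
    """Pick a sync tier (1, 2, or 3). The rule is an assumption, not swm's logic."""
    if not has_previous_push:
        return 3  # first push: nothing to diff against, upload everything
    return 1 if watcher_running else 2


def changed_since(root: Path, marker_mtime: float) -> list[Path]:
    """Rough analogue of `find -newer`: files modified after marker_mtime."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > marker_mtime:
                changed.append(path)
    return sorted(changed)
```

With a marker taken at the time of the last push, `changed_since(Path("workspace"), marker)` returns only the files touched since then, which is what lets a push skip the other 599,997.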
$ swm sync push runpod:r1008si --tar
Packing with pigz (48 cores)...
Uploading 34 GB → s3://bucket/project.tar.gz
✓ Push complete
600k small files? Tar mode packs everything with parallel gzip. One S3 object instead of 600,000 API calls.
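To see why one archive beats many objects, here is a minimal sketch of the packing step using Python's standard `tarfile` module. It is an illustration only: `pack_workspace` is a hypothetical helper, and `tarfile` compresses on a single thread, whereas the tool above uses pigz for parallel gzip.

```python
import tarfile
from pathlib import Path


def pack_workspace(root: Path, archive: Path) -> int:
    """Pack every file under root into one .tar.gz and return the file count.

    The archive path should live outside root so it is not packed into itself.
    """
    count = 0
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the workspace root.
                tar.add(path, arcname=path.relative_to(root).as_posix())
                count += 1
    return count
```

However many files `pack_workspace` counts, the upload that follows is a single object.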
$ swm pod down my-project
✓ Workspace pushed (7 files changed)
✓ Pod terminated

# Later, on any cloud:
$ swm pod create -p lambda -g a100 \
    -n my-project -w my-project
One command to save and destroy. Resume on any cloud, any provider, any GPU.
Lifecycle automation, cost tracking, framework auto-detection, and model management.
$ swm guard set r1008si \
    --mode auto-down --idle-timeout 30

Monitors SSH sessions, GPU utilization, filesystem writes, running transfers, and active processes. If nothing's happening, it saves your workspace and terminates. No more $96 overnight H100 bills.
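The idle decision can be sketched in a few lines of Python. This is a hypothetical model, not swm's code: the `Signals` fields mirror the five things monitored above, "idle" is assumed to mean every signal is at zero, and samples are assumed to arrive in chronological order as `(unix_timestamp, Signals)` pairs.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    ssh_sessions: int
    gpu_util_percent: float
    fs_writes: int
    transfers: int
    active_processes: int


def is_idle(s: Signals) -> bool:
    """Assumed rule: idle only when every monitored signal is zero."""
    return (
        s.ssh_sessions == 0
        and s.gpu_util_percent == 0
        and s.fs_writes == 0
        and s.transfers == 0
        and s.active_processes == 0
    )


def should_auto_down(samples: list[tuple[float, Signals]],
                     now: float, idle_timeout_min: float) -> bool:
    """True once idle_timeout_min minutes have passed since the last activity."""
    if not samples:
        return False
    last_active = samples[0][0]  # start counting from the first observation
    for ts, sig in samples:
        if not is_idle(sig):
            last_active = ts
    return (now - last_active) >= idle_timeout_min * 60
```

Any single active signal resets the clock, so a long-running transfer or an open SSH session keeps the pod alive even at 0% GPU.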
GPU: 0% SSH: 0 FS writes: none Transfers: none
↓
Idle 30 min → auto-down
✓ Workspace pushed
✓ Pod terminated
✓ Cost session closed

$ swm costs live runpod:r1008si
H100 SXM   $2.49/hr   2h 14m   $5.57

$ swm costs summary --period week
Provider   GPU         Hours   Cost
runpod     H100 SXM    18.5    $46.07
vastai     A100 80GB    3.2    $8.96
Total                  21.7    $55.03

$ swm costs reconcile -p runpod
Local: $46.07   RunPod: $46.12 (Δ $0.05, 0.1%)
Local cost tracking. Per-session, per-GPU, per-provider breakdowns. Reconcile against actual provider billing APIs. Set budgets with alerts.
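The arithmetic behind per-session cost and reconciliation is simple enough to sketch. The helpers below are hypothetical (not swm's API) and assume cost is the hourly rate times elapsed time, rounded half-up to the cent, with the reconcile percentage taken relative to the provider's figure.

```python
from decimal import Decimal, ROUND_HALF_UP


def session_cost(rate_per_hour: str, seconds: int) -> Decimal:
    """Hourly rate times elapsed time, rounded to the nearest cent."""
    cost = Decimal(rate_per_hour) * Decimal(seconds) / Decimal(3600)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def reconcile(local: str, provider: str) -> tuple[Decimal, Decimal]:
    """Return (absolute delta, delta as a percentage of the provider total)."""
    provider_total = Decimal(provider)
    if provider_total == 0:
        raise ValueError("provider total must be non-zero")
    delta = abs(provider_total - Decimal(local))
    percent = (delta / provider_total * 100).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )
    return delta, percent
```

Under these assumptions, 18.5 hours at $2.49/hr comes to $46.07, and `reconcile("46.07", "46.12")` gives a delta of $0.05, or 0.1%.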
$ swm setup install vllm runpod:r1008si
✓ 4x H100 SXM → tensor parallelism = 4
✓ vLLM installed with --tensor-parallel-size 4
✓ Health check: /v1/models → 200
Auto-detects GPU count for tensor parallelism, opens SSH tunnels for unexposed ports, probes health endpoints. 7 built-in frameworks: vLLM, Open WebUI, Ollama, ComfyUI, SwarmUI, Axolotl, H2O LLM Studio.
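One plausible way to derive the tensor-parallel setting is to count the GPUs reported by `nvidia-smi -L` and pass that number to vLLM. The sketch below assumes exactly that; `count_gpus` and `vllm_serve_args` are hypothetical helpers, and how swm actually detects GPUs is not documented here.

```python
import re


def count_gpus(nvidia_smi_list_output: str) -> int:
    """Count lines of the form 'GPU 0: ...' as printed by `nvidia-smi -L`."""
    return len(re.findall(r"^GPU \d+:", nvidia_smi_list_output, flags=re.MULTILINE))


def vllm_serve_args(model: str, gpu_count: int) -> list[str]:
    """Build a `vllm serve` command line with one tensor-parallel rank per GPU."""
    if gpu_count < 1:
        raise ValueError("at least one GPU is required")
    return ["vllm", "serve", model, "--tensor-parallel-size", str(gpu_count)]
```

Using every GPU as one tensor-parallel rank is the simplest policy; some models need a size that divides their attention-head count, so a real tool may have to pick a smaller value.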
$ swm models search "qwen3 coder"
$ swm models pull runpod:r1008si \
    Qwen/Qwen3-235B-A22B
$ swm models set runpod:r1008si \
    Qwen/Qwen3-235B-A22B --restart
✓ Model activated, vLLM restarted (TP=4)
Search HuggingFace Hub, pull to any pod, hot-swap vLLM models. Supports both HuggingFace and Ollama.
No agents on the pod. No custom images. No webhooks.
Everything happens over SSH and provider APIs from your machine.
Your workspace lives in S3-compatible storage. Your config is a TOML file. Your pods are standard cloud instances. swm is a layer on top — it never owns your infrastructure. Uninstall it and everything still works.
$ pipx install swm-gpu
$ swm config set runpod.api_key YOUR_KEY
$ swm gpus
Three commands to get started.