swm — Your GPU stack, your terminal, any cloud

Or just ask your agent.

Don't want to learn the CLI? You don't have to. Your AI agent speaks swm natively.

You

I need to run Qwen3-235B on a GPU.
Cheapest option, auto-terminate when idle.

Agent

Found 4x H100 SXM on RunPod at $9.96/hr. Pod is up. vLLM running with TP=4.

Open WebUI at:
https://r1008si-8080.proxy.runpod.net

Auto-down set for 30 min idle. Your workspace saves to S3 automatically.

agent executing

$ swm gpus -g h100 --max-price 3.50
$ swm pod create -p runpod -g "H100 SXM" \
    -n qwen3 --gpu-count 4 \
    --lifecycle auto-down -y
$ swm setup install vllm runpod:r1008si
$ swm models pull runpod:r1008si \
    Qwen/Qwen3-235B-A22B
$ swm setup install open-webui runpod:r1008si
$ swm setup start vllm runpod:r1008si
$ swm setup start open-webui runpod:r1008si
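
The first command in the transcript narrows offers by GPU type and price before a pod is created. A minimal Python sketch of that kind of selection is below. It is not swm's code: the `Offer` type, the `cheapest` helper, the sample offers, and the reading of `--max-price` as a per-GPU hourly cap are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    # Hypothetical record type, not swm's internal model.
    provider: str
    gpu: str
    gpu_count: int
    price_per_gpu_hr: float  # USD per GPU per hour

    @property
    def total_per_hr(self) -> float:
        # Whole-pod hourly price, rounded to cents.
        return round(self.price_per_gpu_hr * self.gpu_count, 2)


def cheapest(offers, gpu_substr: str, gpu_count: int, max_price: float) -> Optional[Offer]:
    """Return the cheapest matching offer, or None if nothing matches.

    Assumes max_price is a per-GPU hourly cap.
    """
    matches = [
        o for o in offers
        if gpu_substr.lower() in o.gpu.lower()
        and o.gpu_count >= gpu_count
        and o.price_per_gpu_hr <= max_price
    ]
    return min(matches, key=lambda o: o.price_per_gpu_hr, default=None)


# Illustrative offers only; the Lambda entry and its price are made up.
OFFERS = [
    Offer("runpod", "H100 SXM", 4, 2.49),
    Offer("lambda", "H100 PCIe", 4, 3.29),
    Offer("vastai", "A100 80GB", 4, 2.80),
]

best = cheapest(OFFERS, "h100", gpu_count=4, max_price=3.50)
no_match = cheapest(OFFERS, "h100", gpu_count=4, max_price=1.00)
```

Under that reading, four H100s at $2.49 per GPU-hour give the $9.96/hr total quoted in the chat above.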

Works with Cursor, Codex, Claude Code, Windsurf, and any agent that can run shell commands.
Drop in the SKILL.md and go.

Cursor · Codex · Claude Code · Windsurf · Copilot · any agent

Your workspace follows you.

Push from one cloud, pull to another. Resume where you left off — with continuous background sync.

Continuous auto-sync

$ swm pod create -p runpod -g h200 -n train
  ✓ Workspace pulled
  ✓ Watcher running
  ✓ Auto-sync daemon: pushing every 60s

Every pod created with a workspace gets a background daemon that tails the filesystem watcher and pushes new and changed files to storage every 60 seconds. Sync is non-destructive by default: propagating deletions is opt-in and gated by a prior sync.
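
One way to picture the non-destructive default is as a planning step that runs before each push. The sketch below is an assumption about how such a rule could look, not swm's implementation: `plan_sync` and its parameters are hypothetical names, and each side is modeled as a plain dict from path to a content token.

```python
def plan_sync(local, remote, allow_delete=False, has_prior_sync=False):
    """Decide what a push would upload and delete.

    local / remote: dict mapping path -> content token (hash or mtime).
    Remote deletions are planned only when the caller opted in AND a
    previous sync exists, so a fresh, empty pod cannot wipe storage.
    """
    # New or changed files: missing remotely, or token differs.
    uploads = sorted(p for p, token in local.items() if remote.get(p) != token)

    deletes = []
    if allow_delete and has_prior_sync:
        # Files that exist remotely but no longer exist locally.
        deletes = sorted(p for p in remote if p not in local)
    return uploads, deletes


local = {"train.py": "v2", "data/a.bin": "v1"}
remote = {"train.py": "v1", "data/a.bin": "v1", "old.ckpt": "v1"}

default_plan = plan_sync(local, remote)
opt_in_no_history = plan_sync(local, remote, allow_delete=True)
opt_in_plan = plan_sync(local, remote, allow_delete=True, has_prior_sync=True)
```

In this model the default plan only ever adds or overwrites objects; `old.ckpt` is removed from storage only in the last call.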

Push and pull across clouds

$ swm sync push runpod:r1008si
$ swm sync pull lambda:def456

Your workspace lives in S3-compatible storage (Backblaze B2, AWS S3, Google Cloud Storage). Push from one cloud, pull to another.

Three-tier smart sync

Tier 1: inotify watcher → instant (changed files only)
Tier 2: find -newer     → seconds (watcher not running)
Tier 3: full parallel   → s5cmd 512 workers (first push)

The filesystem watcher tracks every change. Your next push uploads only what changed — even if it's 3 files out of 600,000.
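
The tier choice can be read as a small decision rule, and Tier 2 as a modification-time scan. The Python sketch below illustrates both under stated assumptions: `select_tier` and `changed_since` are hypothetical names, the precedence (first push always takes Tier 3) is a guess based on the table above, and the scan is a stand-in for `find -newer`, not swm's code.

```python
import os
import tempfile


def select_tier(watcher_running: bool, has_prior_push: bool) -> int:
    """Pick a sync tier: 3 = full upload, 1 = watcher log, 2 = mtime scan."""
    if not has_prior_push:
        return 3  # nothing in storage yet, so upload everything
    return 1 if watcher_running else 2


def changed_since(root: str, marker_mtime: float) -> list:
    """Rough equivalent of `find -newer`: files modified after the marker."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > marker_mtime:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)


# Demo: two files, one older and one newer than the last-push marker.
with tempfile.TemporaryDirectory() as root:
    for name, mtime in (("old.txt", 1000), ("new.txt", 3000)):
        path = os.path.join(root, name)
        with open(path, "w") as fh:
            fh.write(name)
        os.utime(path, (mtime, mtime))  # set access and modification time
    demo_changed = changed_since(root, marker_mtime=2000)
```

The scan still touches every file's metadata, which is why a watcher log (Tier 1) is the faster path when it is available.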

Tar mode for massive workspaces

$ swm sync push runpod:r1008si --tar
  Packing with pigz (48 cores)...
  Uploading 34 GB → s3://bucket/project.tar.gz
  ✓ Push complete

600k small files? Tar mode packs everything with parallel gzip. One S3 object instead of 600,000 API calls.
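
The idea behind tar mode, many files in and one object out, can be sketched with Python's standard `tarfile` module. This is an illustration only: `pack_workspace` is a hypothetical helper, and the standard-library gzip used here is single-threaded, whereas the transcript above shows pigz compressing in parallel.

```python
import os
import tarfile
import tempfile


def pack_workspace(root: str, archive_path: str) -> str:
    """Pack a directory tree into a single .tar.gz file.

    archive_path must live outside root, otherwise the archive
    would try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(root, arcname=".")  # store paths relative to the root
    return archive_path


# Demo: five small files become one archive.
with tempfile.TemporaryDirectory() as workspace, tempfile.TemporaryDirectory() as out:
    for i in range(5):
        with open(os.path.join(workspace, f"shard_{i}.txt"), "w") as fh:
            fh.write(f"shard {i}\n")

    archive = pack_workspace(workspace, os.path.join(out, "project.tar.gz"))

    with tarfile.open(archive, "r:gz") as tar:
        packed_files = sorted(m.name for m in tar.getmembers() if m.isfile())
```

Uploading the resulting archive is a single object write, which is where the saving over per-file API calls comes from.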

pod down = push + terminate, then resume anywhere

$ swm pod down my-project
  ✓ Workspace pushed (7 files changed)
  ✓ Pod terminated
# Later, on any cloud:
$ swm pod create -p lambda -g a100 \
    -n my-project -w my-project

One command to save and destroy. Resume on any cloud, any provider, any GPU.
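
The ordering in the transcript (push first, then terminate) matters: if the pod were destroyed before the push finished, unsynced work would be lost. A minimal sketch of that ordering follows. The `pod_down` function and its callbacks are hypothetical, and the fail-safe behavior on a failed push is an assumption about a sensible design, not documented swm behavior.

```python
def pod_down(push, terminate) -> str:
    """Save, then destroy. Never destroy if the save failed.

    push: callable returning True when the workspace push succeeded.
    terminate: callable that destroys the pod.
    """
    if not push():
        return "kept: push failed"  # pod stays up so nothing is lost
    terminate()
    return "terminated"


# Demo with stub callbacks that record what was called.
calls = []


def ok_push():
    calls.append("push")
    return True


def failing_push():
    calls.append("push-failed")
    return False


def stub_terminate():
    calls.append("terminate")


good = pod_down(ok_push, stub_terminate)
bad = pod_down(failing_push, stub_terminate)
```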

It watches so you don't have to.

Lifecycle automation, cost tracking, framework auto-detection, and model management.

Lifecycle guard

$ swm guard set r1008si \
    --mode auto-down --idle-timeout 30

Monitors SSH sessions, GPU utilization, filesystem writes, running transfers, and active processes. If nothing's happening, it saves your workspace and terminates. No more $96 overnight H100 bills.

GPU: 0%  SSH: 0  FS writes: none  Transfers: none
                    ↓
          Idle 30 min → auto-down
          ✓ Workspace pushed
          ✓ Pod terminated
          ✓ Cost session closed
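
The idle check in the diagram above can be modeled as two steps: decide whether the pod is idle right now, then track how long it has stayed idle. The sketch below is an assumption about the logic, not the guard script itself: the `Signals` fields mirror the five monitored signals, and every name and threshold is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    gpu_util_pct: float
    ssh_sessions: int
    fs_writes: bool
    transfers: bool
    active_processes: bool


def is_idle(s: Signals, gpu_threshold: float = 0.0) -> bool:
    """Idle only when every signal is quiet."""
    return (
        s.gpu_util_pct <= gpu_threshold
        and s.ssh_sessions == 0
        and not s.fs_writes
        and not s.transfers
        and not s.active_processes
    )


def next_idle_since(s: Signals, now: float, idle_since):
    """Any activity resets the idle clock; otherwise keep the first idle time."""
    if not is_idle(s):
        return None
    return now if idle_since is None else idle_since


def should_auto_down(idle_since, now: float, timeout_min: int = 30) -> bool:
    return idle_since is not None and (now - idle_since) >= timeout_min * 60


# Demo: a busy sample, then the pod goes quiet at t = 60 s.
busy = Signals(87.0, 1, True, False, True)
quiet = Signals(0.0, 0, False, False, False)

idle_since = next_idle_since(busy, 0.0, None)
idle_since = next_idle_since(quiet, 60.0, idle_since)
fire_early = should_auto_down(idle_since, 60.0 + 29 * 60)
fire_on_time = should_auto_down(idle_since, 60.0 + 30 * 60)
```

Requiring all signals to be quiet is the conservative choice: a long download with an idle GPU still counts as activity.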

Cost tracking & reconciliation

$ swm costs live
  runpod:r1008si  H100 SXM  $2.49/hr  2h 14m  $5.57

$ swm costs summary --period week
  Provider  GPU        Hours   Cost
  runpod    H100 SXM   18.5    $46.07
  vastai    A100 80GB  3.2     $8.96
  Total                21.7    $55.03

$ swm costs reconcile -p runpod
  Local:   $46.07
  RunPod:  $46.12  (Δ $0.05, 0.1%)

Local cost tracking. Per-session, per-GPU, per-provider breakdowns. Reconcile against actual provider billing APIs. Set budgets with alerts.
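
The arithmetic behind the summary and reconcile output above can be checked in a few lines. This is a sketch, not swm's accounting code: `session_cost` and `reconcile` are hypothetical helpers, and the $2.80/hr A100 rate is the one implied by the table (3.2 hours for $8.96). `Decimal` is used so that half-cent values round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def session_cost(hours: str, rate_per_hr: str) -> Decimal:
    """Cost of one session, rounded half-up to the cent."""
    return (Decimal(hours) * Decimal(rate_per_hr)).quantize(CENT, rounding=ROUND_HALF_UP)


def reconcile(local: Decimal, provider: Decimal):
    """Return (absolute delta, delta as a percentage of the provider bill)."""
    delta = abs(provider - local)
    if provider == 0:
        return delta, Decimal("0")
    pct = (delta / provider * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return delta, pct


runpod = session_cost("18.5", "2.49")   # H100 SXM row
vastai = session_cost("3.2", "2.80")    # A100 80GB row
total = runpod + vastai

delta, pct = reconcile(local=runpod, provider=Decimal("46.12"))
```

With these inputs the sketch reproduces the figures shown: $46.07 and $8.96 per provider, $55.03 in total, and a $0.05 (0.1%) gap against the provider's bill.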

Framework auto-detection

$ swm setup install vllm runpod:r1008si
  ✓ 4x H100 SXM → tensor parallelism = 4
  ✓ vLLM installed with --tensor-parallel-size 4
  ✓ Health check: /v1/models → 200

Auto-detects GPU count for tensor parallelism, opens SSH tunnels for unexposed ports, probes health endpoints. 7 built-in frameworks: vLLM, Open WebUI, Ollama, ComfyUI, SwarmUI, Axolotl, H2O LLM Studio.
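
The mapping from GPU count to tensor parallelism is simple enough to sketch. The helpers below are hypothetical and only illustrate the idea: they build a `vllm serve` command line with `--tensor-parallel-size` set to the number of GPUs, plus the health URL that would be probed. How swm detects the GPU count or launches vLLM is not shown in the source and is not modeled here.

```python
def vllm_serve_args(model: str, gpu_count: int, port: int = 8000) -> list:
    """Build a vLLM launch command with one tensor-parallel shard per GPU."""
    if gpu_count < 1:
        raise ValueError("need at least one GPU")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(gpu_count),
        "--port", str(port),
    ]


def health_url(host: str, port: int = 8000) -> str:
    """OpenAI-compatible endpoint that lists the served models."""
    return f"http://{host}:{port}/v1/models"


args = vllm_serve_args("Qwen/Qwen3-235B-A22B", gpu_count=4)
url = health_url("localhost")
```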

Model management

$ swm models search "qwen3 coder"
$ swm models pull runpod:r1008si \
    Qwen/Qwen3-235B-A22B
$ swm models set runpod:r1008si \
    Qwen/Qwen3-235B-A22B --restart
  ✓ Model activated, vLLM restarted (TP=4)

Search the Hugging Face Hub, pull to any pod, hot-swap vLLM models. Supports both Hugging Face and Ollama.

How it works.

No agents on the pod. No custom images. No webhooks.
Everything happens over SSH and provider APIs from your machine.

You (Terminal · AI Agent)
    →
swm CLI (Lifecycle Guard · Cost Tracker · Workspace Sync)
    ↓ Provider APIs
10 GPU Clouds (RunPod · Vast.ai · Lambda · AWS · GCP · Azure · CoreWeave · Vultr · TensorDock · FluidStack)
    ↓ SSH
Remote Pod (vLLM / ComfyUI / Ollama / ... · FS Watcher · Guard Script)
    ↓ s5cmd
Object Storage (B2 · S3 · GCS)

Zero lock-in.

Your workspace lives in S3-compatible storage. Your config is a TOML file. Your pods are standard cloud instances. swm is a layer on top — it never owns your infrastructure. Uninstall it and everything still works.

Free. Open source. Apache 2.0.

get started

$ pipx install swm-gpu
$ swm config set runpod.api_key YOUR_KEY
$ swm gpus

Three commands to get started.

Read the Docs · Star on GitHub
