
Deploy vLLM + Open WebUI Chat Stack

Deploy a production-ready LLM chat interface with vLLM for model serving and Open WebUI as the frontend.

Prerequisites:

  • swm installed with a provider API key configured
  • Storage configured (for workspace persistence)
Create an H100 pod with an auto-down lifecycle and idle timeout:
swm pod create -p runpod -g "H100 SXM" -n llm-chat \
--lifecycle auto-down --idle-timeout 30 -y
Install vLLM on the pod:
swm setup install vllm runpod:YOUR_POD_ID
Pull a model and set it as the active model:
swm models pull runpod:YOUR_POD_ID Qwen/Qwen3-8B
swm models set runpod:YOUR_POD_ID Qwen/Qwen3-8B --restart
Install Open WebUI on the same pod:
swm setup install open-webui runpod:YOUR_POD_ID
Start both services:
swm setup start vllm runpod:YOUR_POD_ID
swm setup start open-webui runpod:YOUR_POD_ID
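
To confirm vLLM is serving, you can query its OpenAI-compatible API from the pod. A minimal sketch, assuming vLLM's default port 8000 (swm may bind it elsewhere or expose it only through its proxy):

# list the models the server has loaded
curl http://localhost:8000/v1/models

# send a test chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-8B", "messages": [{"role": "user", "content": "Say hello"}]}'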

Open WebUI will be available at the proxy URL printed by swm, or via SSH tunnel at http://localhost:8080.
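
If you use the tunnel, a typical invocation looks like the following (the SSH host, port, and user are placeholders; use the connection details swm or the RunPod console reports for your pod):

# forward local port 8080 to Open WebUI on the pod
ssh -L 8080:localhost:8080 -p <SSH_PORT> root@<POD_HOST>

Then open http://localhost:8080 in your browser.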

For larger models (70B+), use multiple GPUs. vLLM auto-detects GPU count:

swm pod create -p runpod -g "H100 SXM" -n llm-big \
--gpu-count 4 --lifecycle auto-down -y
swm models pull runpod:YOUR_POD_ID Qwen/Qwen3-235B-A22B

vLLM will automatically set --tensor-parallel-size 4.
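
For reference, this corresponds to launching vLLM by hand with tensor parallelism across the four GPUs (a sketch of the underlying vllm CLI invocation, not an swm command):

# serve the model sharded across 4 GPUs
vllm serve Qwen/Qwen3-235B-A22B --tensor-parallel-size 4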