
Deploy vLLM + Open WebUI Chat Stack

Deploy a production-ready LLM chat interface with vLLM for model serving and Open WebUI as the frontend.

Prerequisites:

  • swm installed with a provider API key configured
  • Storage configured (for workspace persistence)
Create an H100 pod with an auto-down lifecycle and idle timeout:
swm pod create -p runpod -g "H100 SXM" -n llm-chat \
--lifecycle auto-down --idle-timeout 30 -y
Install vLLM on the pod:
swm setup install vllm runpod:YOUR_POD_ID
Pull a model and set it as the active model:
swm models pull runpod:YOUR_POD_ID Qwen/Qwen3-8B
swm models set runpod:YOUR_POD_ID Qwen/Qwen3-8B --restart
Install Open WebUI on the same pod:
swm setup install open-webui runpod:YOUR_POD_ID
Start both services:
swm setup start vllm runpod:YOUR_POD_ID
swm setup start open-webui runpod:YOUR_POD_ID
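
To confirm vLLM is serving, you can query its OpenAI-compatible API from the pod. A minimal sketch, assuming vLLM's default port 8000 (swm may bind it elsewhere or expose it only through its proxy):

# list the models the server has loaded
curl http://localhost:8000/v1/models

# send a test chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-8B", "messages": [{"role": "user", "content": "Say hello"}]}'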

Open WebUI will be available at the proxy URL printed by swm, or via SSH tunnel at http://localhost:8080.
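
If you use the tunnel, a typical invocation looks like the following (the SSH host, port, and user are placeholders; use the connection details swm or the RunPod console reports for your pod):

# forward local port 8080 to Open WebUI on the pod
ssh -L 8080:localhost:8080 -p <SSH_PORT> root@<POD_HOST>

Then open http://localhost:8080 in your browser.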

For larger models (70B+), use multiple GPUs. vLLM auto-detects GPU count:

swm pod create -p runpod -g "H100 SXM" -n llm-big \
--gpu-count 4 --lifecycle auto-down -y
swm models pull runpod:YOUR_POD_ID Qwen/Qwen3-235B-A22B

vLLM will automatically set --tensor-parallel-size 4.
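
For reference, this corresponds to launching vLLM by hand with tensor parallelism across the four GPUs (a sketch of the underlying vllm CLI invocation, not an swm command):

# serve the model sharded across 4 GPUs
vllm serve Qwen/Qwen3-235B-A22B --tensor-parallel-size 4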