# Deploy vLLM + Open WebUI Chat Stack

Deploy a production-ready LLM chat interface, with vLLM as the serving backend and Open WebUI as the frontend.

## Prerequisites

- swm installed with a provider API key configured
- Storage configured (for workspace persistence)

## 1. Create a pod

```bash
swm pod create -p runpod -g "H100 SXM" -n llm-chat \
  --lifecycle auto-down --idle-timeout 30 -y
```

## 2. Install vLLM

```bash
swm setup install vllm runpod:YOUR_POD_ID
```
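
swm handles the installation on the pod for you. Purely for background, upstream vLLM ships as a single PyPI package, so a manual install in a Python environment would look roughly like the snippet below; you do not need to run this as part of this workflow:

```bash
# Reference only: vLLM installs from PyPI.
# The `swm setup install vllm` command above performs the
# equivalent setup on the pod for you.
pip install vllm
```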

## 3. Pull a model

```bash
swm models pull runpod:YOUR_POD_ID Qwen/Qwen3-8B
swm models set runpod:YOUR_POD_ID Qwen/Qwen3-8B --restart
```

## 4. Install Open WebUI

```bash
swm setup install open-webui runpod:YOUR_POD_ID
```

## 5. Start both services

```bash
swm setup start vllm runpod:YOUR_POD_ID
swm setup start open-webui runpod:YOUR_POD_ID
```
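
To confirm both services came up, you can probe them directly. The sketch below assumes a shell on the pod, vLLM's default OpenAI-compatible port (8000), and Open WebUI's default port (8080); if swm maps different ports, adjust accordingly:

```bash
# vLLM exposes a health endpoint and an OpenAI-compatible model list
# (assumes the default port 8000).
curl http://localhost:8000/health
curl http://localhost:8000/v1/models

# Open WebUI serves its web interface on port 8080 by default.
curl -I http://localhost:8080
```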

## 6. Access

Open WebUI will be available at the proxy URL printed by swm, or via SSH tunnel at http://localhost:8080.
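
If you go the tunnel route, a standard SSH local port forward works; USER, POD_HOST, and the key path below are placeholders for your pod's connection details. With the tunnel up, you can also hit vLLM's OpenAI-compatible API directly, again assuming its default port 8000:

```bash
# Forward Open WebUI (8080) and the vLLM API (8000) to your machine.
# Replace USER, POD_HOST, and the key path with your pod's SSH details.
ssh -N -L 8080:localhost:8080 -L 8000:localhost:8000 \
  -i ~/.ssh/id_ed25519 USER@POD_HOST

# Quick end-to-end test against the served model through the tunnel.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-8B",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```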

## Multi-GPU models

For larger models (70B+), use multiple GPUs. vLLM auto-detects the GPU count:

```bash
swm pod create -p runpod -g "H100 SXM" -n llm-big \
  --gpu-count 4 --lifecycle auto-down -y
swm models pull runpod:ID Qwen/Qwen3-235B-A22B
```

vLLM will automatically set `--tensor-parallel-size 4`.
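
For reference, the underlying vLLM invocation this corresponds to looks roughly like the sketch below. swm manages this for you, and the exact flags it passes may differ; this is only to show what tensor parallelism means here:

```bash
# Rough equivalent of what runs on the pod: shard the model across
# 4 GPUs with tensor parallelism (a sketch; actual flags may vary).
vllm serve Qwen/Qwen3-235B-A22B --tensor-parallel-size 4
```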