Summary

Step-by-step tutorial for deploying NVIDIA Cosmos Reason 2B (a vision-language model) on NVIDIA Jetson devices (AGX Thor, AGX Orin, Orin Nano Super) using the vLLM framework, with FP8 quantized weights from NGC and a live webcam interface for real-time physical AI interaction.

在 NVIDIA Jetson 邊緣設備上部署 NVIDIA Cosmos Reason 2B 視覺語言模型的逐步教程,使用 vLLM 框架和 FP8 量化權重,並提供即時攝像頭界面進行實體 AI 交互。

Key Points

  • Supported devices: Jetson AGX Thor, AGX Orin (64/32GB), Orin Super Nano
  • Model: NVIDIA Cosmos Reason 2B in FP8 quantization (~5GB weights, ~8GB vLLM container)
  • Each device uses different vLLM container images and GPU memory utilization targets
  • Orin Nano Super: limited to 256 token context (memory-constrained) vs 8192 on larger devices
  • Live VLM WebUI connects to vLLM endpoint for webcam-based interactive physical AI
  • Requires NGC account for model checkpoint download

Insights

The Cosmos model on Jetson represents NVIDIA’s push to bring reasoning-capable VLMs to edge robotics platforms. FP8 quantization on Jetson ARM64 hardware shows how inference efficiency techniques (quantization + vLLM serving) now make 2B parameter models viable on compact embedded hardware. The “physical AI” framing positions VLMs not as chat interfaces but as perception backends for robots — a significant use case shift.

Connections

Raw Excerpt

The NVIDIA Jetson family… is purpose-built to drive accelerated applications for physical AI and robotics, providing the optimized runtime necessary for leading open source models.