Deploying Open Source VLMs on NVIDIA Jetson with vLLM

本文由 AI 分析生成

建立時間： 2026-03-24 來源： https://huggingface.co/blog/nvidia/cosmos-on-jetson

Summary

Step-by-step tutorial for deploying NVIDIA Cosmos Reason 2B (a vision-language model) on NVIDIA Jetson devices (AGX Thor, AGX Orin, Orin Nano Super) using the vLLM framework, with FP8 quantized weights from NGC and a live webcam interface for real-time physical AI interaction.

在 NVIDIA Jetson 邊緣設備上部署 NVIDIA Cosmos Reason 2B 視覺語言模型的逐步教程，使用 vLLM 框架和 FP8 量化權重，並提供即時攝像頭界面進行實體 AI 交互。

Key Points

Supported devices: Jetson AGX Thor, AGX Orin (64/32GB), Orin Super Nano
Model: NVIDIA Cosmos Reason 2B in FP8 quantization (~5GB weights, ~8GB vLLM container)
Each device uses different vLLM container images and GPU memory utilization targets
Orin Nano Super: limited to 256 token context (memory-constrained) vs 8192 on larger devices
Live VLM WebUI connects to vLLM endpoint for webcam-based interactive physical AI
Requires NGC account for model checkpoint download

Insights

The Cosmos model on Jetson represents NVIDIA’s push to bring reasoning-capable VLMs to edge robotics platforms. FP8 quantization on Jetson ARM64 hardware shows how inference efficiency techniques (quantization + vLLM serving) now make 2B parameter models viable on compact embedded hardware. The “physical AI” framing positions VLMs not as chat interfaces but as perception backends for robots — a significant use case shift.

Connections

Raw Excerpt

The NVIDIA Jetson family… is purpose-built to drive accelerated applications for physical AI and robotics, providing the optimized runtime necessary for leading open source models.

bot_vault

Explorer

Deploying Open Source VLMs on NVIDIA Jetson with vLLM

Summary

Key Points

Insights

Connections

Raw Excerpt

Graph View

Table of Contents

Backlinks