NVIDIA and Google have expanded their partnership to bring the newly released Gemma 4 open models to the local workstation. This collaboration focuses on optimizing Google’s most capable open-weights models to date for NVIDIA’s hardware ecosystem, spanning from GeForce RTX laptops to Blackwell-based data centers.
Advanced Reasoning and Multimodal Capabilities
Gemma 4 arrives in four distinct sizes: efficient 2B and 4B models for edge devices, a 26B Mixture of Experts (MoE) variant, and a 31B dense model. These models are built on the same research foundation as Gemini 3 and are released under the permissive Apache 2.0 license. Unlike previous iterations, which focused primarily on chat, Gemma 4 is designed for agentic workflows, featuring native multimodal support for vision and audio alongside improved reasoning for complex math and coding tasks.
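As a concrete starting point, one of the smaller variants could be loaded for local text generation with Hugging Face Transformers. The sketch below is illustrative only: the model ID is an assumption, and the exact repository names will depend on the official Gemma collection on Hugging Face.

```python
# Minimal sketch: local chat-style generation with a small Gemma variant.
# The model ID below is hypothetical -- check the official Gemma
# collection on Hugging Face for the actual repository names.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-4-4b-it",   # hypothetical model ID
    torch_dtype=torch.bfloat16,     # half precision to fit consumer VRAM
    device_map="auto",              # place layers on the local GPU
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."}
]
out = generator(messages, max_new_tokens=256)
# With chat-format input, generated_text is the full message list;
# the model's reply is the last entry.
print(out[0]["generated_text"][-1]["content"])
```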
Local Execution and Hardware Optimization
For PC and workstation users, NVIDIA has integrated these models into its CUDA software stack and TensorRT-LLM library. This allows the models to run with high efficiency on local GPUs, leveraging Tensor Cores to provide low-latency responses. In practical terms, this means the 31B model can run locally on an RTX 5090 using 4-bit quantization, enabling sophisticated AI agents to handle private data, search local documents, and automate desktop tasks without sending information to the cloud.
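For developers targeting the TensorRT-LLM path directly, the library exposes a high-level Python LLM API. The following is a minimal sketch under assumptions: the model ID is hypothetical, and a 4-bit quantized checkpoint would normally be prepared ahead of time rather than configured inline here.

```python
# Minimal sketch: local inference through TensorRT-LLM's high-level
# Python API. The model ID is an assumption; a 4-bit quantized
# checkpoint would typically be built separately before loading.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="google/gemma-4-31b-it")  # hypothetical model ID

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.generate(
    ["Summarize the key differences between MoE and dense transformer models."],
    params,
)
print(outputs[0].outputs[0].text)
```

Because inference runs entirely on the local GPU, prompts and documents never leave the machine, which is the property that makes the private-data and desktop-automation use cases described above practical.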
Empowering a Local-First AI Future
The RTX AI Garage initiative emphasizes this shift toward “local-first” AI. NVIDIA is providing day-one support through NIM microservices and optimizations for platforms like Ollama and Hugging Face, making these advanced reasoning capabilities accessible to developers and enthusiasts immediately. This integration ensures that the same AI tools used in professional development environments can scale across the hundreds of millions of RTX-equipped systems already in the wild.
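For the quickest local test, the Ollama Python client is likely the simplest route. In the sketch below, the model tag is an assumption; the actual tag will depend on how the models are published in the Ollama library.

```python
# Minimal sketch: querying a locally served Gemma model through the
# Ollama Python client. The model tag "gemma4" is an assumption --
# use whatever tag the Ollama library lists once published.
import ollama

response = ollama.chat(
    model="gemma4",  # hypothetical Ollama model tag
    messages=[
        {"role": "user", "content": "Outline an agent that renames my screenshots based on their content."}
    ],
)
print(response["message"]["content"])
```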