Articles from Source: Nvidia-Developer-Blog

Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters

2026-05-21 18:00
🚀 Real-time visibility into GPU usage is crucial for maximizing AI infrastructure. Many teams face challenges due to limited insights into GPU consumption on Kubernetes. The new GPU Usage Monitor, built on NVIDIA's DCGM Exporter, provides comprehensive tracking of GPU allocation, memory use, and pod status. It simplifies monitoring with a single Helm chart deployment. This tool addresses common issues like over-provisioning and pod starvation, enabling better resource utilization and timely...
Source: Nvidia Developer Blog
Guy Saltoun
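
A minimal sketch of the per-pod aggregation such a monitor performs. It assumes dcgm-exporter is configured to attach Kubernetes `namespace` and `pod` labels to its samples. `DCGM_FI_DEV_FB_USED` (framebuffer memory in use, in MiB) is a standard DCGM field. The parsing helper and the sample scrape are hypothetical and do not reflect the GPU Usage Monitor's own code.

```python
import re
from collections import defaultdict

# One Prometheus text-format sample: metric_name{label="value",...} number
SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+([-+0-9.eE]+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

def gpu_memory_by_pod(exposition: str, metric: str = "DCGM_FI_DEV_FB_USED") -> dict:
    """Sum GPU framebuffer memory in use (MiB) per (namespace, pod)."""
    usage = defaultdict(float)
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())
        if match is None or match.group(1) != metric:
            continue  # skips comments, other metrics, and unlabeled samples
        labels = dict(LABEL.findall(match.group(2)))
        key = (labels.get("namespace", ""), labels.get("pod", ""))
        usage[key] += float(match.group(3))
    return dict(usage)

# Illustrative scrape output with hypothetical values, not captured from a real cluster.
SCRAPE = """\
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
DCGM_FI_DEV_FB_USED{gpu="0",namespace="ml",pod="trainer-0"} 30512
DCGM_FI_DEV_FB_USED{gpu="1",namespace="ml",pod="trainer-0"} 30480
DCGM_FI_DEV_FB_USED{gpu="2",namespace="",pod=""} 0
"""

print(gpu_memory_by_pod(SCRAPE))
```

One way to spot over-provisioned workloads is to compare these per-pod totals with each pod's GPU request.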

Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling

2026-05-21 17:32
Unlocking the potential of NVIDIA GB200 NVL72 requires effective workload placement. This article discusses how Slurm topology-aware job scheduling enhances performance by aligning jobs with the system’s network architecture. The GB200 NVL72 supports exascale computing with 72 NVLink-connected GPUs, offering 130 TB/s of aggregate NVLink bandwidth for AI and HPC tasks. AI training jobs that keep their communication within NVLink can improve performance significantly. For optimal results in shared clusters, schedulers must...
Source: Nvidia Developer Blog
Sachin Lakharia
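
A hypothetical best-fit rule makes the placement goal concrete. It assumes each GB200 NVL72 rack exposes 18 compute nodes with 4 GPUs each, for 72 GPUs in one NVLink domain. Slurm's topology plugins implement their own selection logic; the function names and the rule below are illustrative only.

```python
GPUS_PER_NODE = 4  # assumption: 4 GPUs per compute tray, 18 trays per NVL72 rack

def nodes_for(gpus: int) -> int:
    """Whole nodes needed for a GPU request (ceiling division)."""
    return -(-gpus // GPUS_PER_NODE)

def pick_nvlink_domain(free_nodes: dict, gpus: int):
    """Best fit: the domain with the fewest free nodes that still holds the whole job."""
    needed = nodes_for(gpus)
    fits = [(free, name) for name, free in free_nodes.items() if free >= needed]
    if not fits:
        return None  # the job would span domains and fall back to the scale-out network
    return min(fits)[1]

free = {"rack-a": 18, "rack-b": 6, "rack-c": 10}
print(pick_nvlink_domain(free, 32))  # 8 nodes -> rack-c, leaving rack-a whole
print(pick_nvlink_domain(free, 80))  # 20 nodes -> None
```

Best fit keeps fully free racks intact for later large jobs. That is one common motivation for topology-aware placement in shared clusters.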

Building Token‑Metered AI Services on Telco AI Factories

2026-05-21 15:30
Telcos are developing sovereign AI factories using NVIDIA's Cloud Partner architecture. This initiative aims to provide governments and businesses with reliable in-country AI infrastructure. However, simply having infrastructure isn't enough for scalable AI services. The focus is shifting towards token-based billing for AI services, ensuring enterprises receive production-ready applications without the complexities of managing infrastructure. This approach allows enterprises to benefit from...
Source: Nvidia Developer Blog
Waleed Badr
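
At its core, token metering multiplies metered input and output tokens by per-model rates. A minimal sketch follows, assuming per-million-token pricing with separate input and output rates. The model name and prices are hypothetical and do not describe any NVIDIA or telco offering.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Rate:
    input_per_million: Decimal   # price per 1M prompt tokens
    output_per_million: Decimal  # price per 1M generated tokens

def bill(records, rates):
    """records: iterable of (tenant, model, input_tokens, output_tokens)."""
    totals = {}
    for tenant, model, tokens_in, tokens_out in records:
        rate = rates[model]
        cost = (tokens_in * rate.input_per_million
                + tokens_out * rate.output_per_million) / Decimal(1_000_000)
        totals[tenant] = totals.get(tenant, Decimal(0)) + cost
    return totals

rates = {"example-llm": Rate(Decimal("0.20"), Decimal("0.60"))}
usage = [("acme", "example-llm", 1_500_000, 500_000),
         ("acme", "example-llm", 500_000, 0)]
print(bill(usage, rates))  # acme owes 0.70 in the rate card's currency
```

`Decimal` is used instead of `float` so that summing many small charges does not accumulate binary rounding error.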

Mastering Agentic Techniques: AI Agent Customization

2026-05-20 20:00
Unlock the potential of AI with effective customization! 🤖✨ Autonomous AI agents can handle various business tasks like routing logistics and triaging support tickets. To enhance their performance, customization is key. This article outlines nine techniques for tailoring AI agents, emphasizing the importance of adapting them to specific workflows. From simple prompt changes to advanced methods like reinforcement learning, each approach has its own pros and cons. Learn how to make your AI...
Source: Nvidia Developer Blog
Edward Li

Add a Specialized Deep Research Skill to Agent Harnesses

2026-05-20 16:00
Enhance your agent harness capabilities with specialized deep research skills! 🛠️ Agent harnesses like Claude Code, Codex, and LangChain Deep Agents excel at managing sessions and executing tasks, but deep research can complicate workflows. 🌐 NVIDIA introduces the AI-Q skill, allowing agents to delegate research tasks to a local AI-Q server. This keeps sensitive data secure while producing structured, well-cited reports. 📊 Explore how this skill streamlines workflows without needing to...
Source: Nvidia Developer Blog
William Markito Oliveira

NVIDIA-Verified Agent Skills Provide Capability Governance for AI Agents

2026-05-19 23:40
NVIDIA is enhancing the capabilities of autonomous AI agents through verified skills. These skills ensure transparency and trust by detailing their origins, risks, and modifications. This means developers can confidently extend their agents in real workflows. 🛠️✨ NVIDIA agent skills are portable instruction sets that guide AI agents in using NVIDIA tools effectively. They come with documentation and regular updates to ensure reliability. Learn more about how these verified skills can improve...
Source: Nvidia Developer Blog
Moshe Abramovitch

Mastering Agentic Techniques: AI Agent Evaluation

2026-05-19 20:00
Evaluating AI models and agents serves different purposes. Model evaluation tests a foundation model's capabilities, focusing on static tasks and predefined inputs. Benchmarks like MMLU and GSM8K are commonly used to measure performance. In contrast, agent evaluation examines a system's behavior in dynamic environments, assessing its planning and tool usage. This article outlines key differences and offers tips for effective AI agent evaluation. 🤖📊 #AIEvaluation #MachineLearning #TechInsights...
Source: Nvidia Developer Blog
Edward Li

How the NVIDIA Vera Rubin Platform is Solving Agentic AI’s Scale-Up Problem

2026-05-14 19:24
NVIDIA's Vera Rubin Platform addresses agentic AI's scale-up problem. Agentic inference follows non-deterministic trajectories, which affects latency across inference requests. The Vera Rubin NVL72 serves as the core compute engine, optimized for both low latency and high throughput. NVIDIA describes the platform as the first to handle complex multi-agent workloads economically while serving highly capable models. It relies on extreme co-design to improve performance for AI services. Discover how...
Source: Nvidia Developer Blog
Graham Steele

Accelerated X-Ray Analysis for Nanoscale Imaging (XANI) of Novel Materials

2026-05-13 16:39
🚀 New advancements in X-ray technology are revolutionizing materials science! X-ray free-electron lasers (XFELs) track structural and electronic dynamics in materials such as semiconductors and catalysts. Their ultrashort X-ray pulses capture atomic movements and reveal defects. The Accelerated X-ray Analysis for Nanoscale Imaging (XANI) workflow has cut data processing time from nine months to under four hours using NVIDIA computing technology. These...
Source: Nvidia Developer Blog
Irina Demeshko

Transform Video Into Instantly Searchable, Actionable Intelligence with AI Agents and Skills

2026-05-13 15:00
Unlock the power of video with NVIDIA's Metropolis Blueprint for Video Search and Summarization (VSS). 📹✨ VSS transforms vast amounts of video into searchable, actionable insights, making it easier for organizations to monitor operations and detect trends in real time. Discover how to automate deployment and integrate VSS into your applications. Join us live on May 13 at 9 am PT to learn more! #VideoAnalytics #NVIDIA #AI #DataIntelligence #Innovation
Source: Nvidia Developer Blog
Samuel Ochoa

How to Eliminate Pipeline Friction in AI Model Serving

2026-05-12 18:00
🚀 The journey from AI model training to production often faces challenges known as pipeline friction. These issues can lead to inefficiencies, increased costs, and performance degradation. 🛠️ Common sources include model export problems, unsupported operations, dynamic input sizes, and version mismatches. Addressing these can streamline deployments and improve API response times. 📊 The article highlights best practices, such as validating exports early and using ONNX operator versioning...
Source: Nvidia Developer Blog
Lovina Dmello

Introducing NVIDIA Fleet Intelligence for Real-Time GPU Fleet Visibility and Optimization

2026-05-11 19:44
🚀 NVIDIA introduces Fleet Intelligence, a new service for real-time monitoring of GPU fleets. This tool addresses the complexities of managing large GPU clusters, enhancing visibility into power, temperature, performance, and health. It aims to optimize resource utilization and ensure consistent performance across systems. Fleet Intelligence is deployment-agnostic and suitable for data center GPU and CPU management. #NVIDIA #GPUMonitoring #FleetIntelligence #DataCenter #TechInnovation
Source: Nvidia Developer Blog
Christian Shrauder

Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding

2026-05-08 17:13
🚀 Exciting advancements in AI! A recent study explores enhancing Bash command generation in small language models using grammar-constrained decoding. This method aims to improve reliability for executing tasks in agentic systems. The research found that by applying this technique, the average success rate of command generation increased significantly, from 62.5% to 75.2%. This development could broaden the deployment of small models in various environments, addressing the challenges of syntax...
Source: Nvidia Developer Blog
Joseph Lucas
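
Grammar-constrained decoding works by masking. At each step, any token that cannot extend the output into a valid string of the grammar is removed before the model's choice is made. In this sketch a small finite set of commands stands in for a real Bash grammar, and `fake_model` is a hypothetical scorer. The study's actual grammar, tokenizer, and decoder are not reproduced.

```python
LANGUAGE = {"ls -l", "ls -la", "pwd", "cat notes.txt"}  # toy stand-in for a Bash grammar
EOS = "<eos>"
VOCAB = ["ls", " -l", " -la", " >", "pwd", "cat", " notes.txt", "a", EOS]

def is_viable_prefix(text: str) -> bool:
    """True if text can still be extended into some string of the language."""
    return any(command.startswith(text) for command in LANGUAGE)

def constrained_greedy(score_fn, vocab, max_steps=16):
    out = ""
    for _ in range(max_steps):
        scores = score_fn(out)  # unconstrained model scores: token -> logit
        # Mask: keep EOS only for complete commands, other tokens only if still viable.
        allowed = [t for t in vocab
                   if (t == EOS and out in LANGUAGE)
                   or (t != EOS and is_viable_prefix(out + t))]
        if not allowed:
            break
        best = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        if best == EOS:
            break
        out += best
    return out

def fake_model(prefix):
    """Hypothetical scores; the top-ranked token after "ls" is invalid here."""
    table = {
        "": {"ls": 2.0, "pwd": 1.0},
        "ls": {" >": 3.0, " -la": 2.5, " -l": 1.0},
        "ls -la": {EOS: 1.0, "a": 0.5},
    }
    return table.get(prefix, {EOS: 0.0})

print(constrained_greedy(fake_model, VOCAB))  # ls -la
```

Without the mask, greedy decoding would pick ` >` after `ls`, starting a redirection that the toy grammar cannot complete.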

Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo

2026-05-08 15:59
📢 The article examines structured agentic exchanges in NVIDIA Dynamo, in which assistant turns interleave reasoning with tool calls. Parser and API coverage were improved to strengthen streaming behavior and performance, with the focus on correctness and user experience as agentic harnesses evolve rapidly. #AI #NVIDIA #Dynamo #AgenticExchange #TechInnovation
Source: Nvidia Developer Blog
Matej Kosec
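
Streaming complicates parsing because a tool-call marker can be split across chunks. Below is a minimal incremental parser. It assumes a model that wraps JSON tool calls in `<tool_call>` and `</tool_call>` tags. Dynamo's real parsers are model-specific, and this class is hypothetical.

```python
import json

OPEN, CLOSE = "<tool_call>", "</tool_call>"  # assumed model-specific markers

class ToolCallStreamParser:
    """Split a streamed completion into text deltas and complete tool calls."""

    def __init__(self):
        self.buf = ""
        self.in_call = False

    def feed(self, chunk: str) -> list:
        """Consume one chunk; return ("text", str) and ("tool_call", dict) events."""
        self.buf += chunk
        events = []
        while True:
            if not self.in_call:
                start = self.buf.find(OPEN)
                if start == -1:
                    # Hold back a suffix that might be the beginning of OPEN.
                    cut = len(self.buf) - self._partial_marker(self.buf, OPEN)
                    if cut:
                        events.append(("text", self.buf[:cut]))
                    self.buf = self.buf[cut:]
                    return events
                if start:
                    events.append(("text", self.buf[:start]))
                self.buf = self.buf[start + len(OPEN):]
                self.in_call = True
            else:
                end = self.buf.find(CLOSE)
                if end == -1:
                    return events  # tool-call arguments are still arriving
                events.append(("tool_call", json.loads(self.buf[:end])))
                self.buf = self.buf[end + len(CLOSE):]
                self.in_call = False

    @staticmethod
    def _partial_marker(text: str, marker: str) -> int:
        """Length of the longest suffix of text that is a proper prefix of marker."""
        for k in range(min(len(text), len(marker) - 1), 0, -1):
            if marker.startswith(text[-k:]):
                return k
        return 0

chunks = ["Let me check", " the weather.<tool",
          '_call>{"name": "get_weather", ',
          '"arguments": {"city": "Paris"}}</tool_call>', " Done."]
parser = ToolCallStreamParser()
events = [event for chunk in chunks for event in parser.feed(chunk)]
print(events)
```

The caller can forward text deltas to the client immediately. Each tool call is executed only once its JSON is complete.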

Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm Block Scheduling

2026-05-07 21:20
🚀 The NVIDIA GB200 NVL72 revolutionizes GPU cluster design by extending NVLink coherence across an entire rack. This enables exascale performance but breaks existing scheduling assumptions. 🔧 To address rack-scale locality, the Slurm workload manager introduced the topology/block plugin, which lets jobs express application-specific NVLink locality requirements more precisely. 📈 The article details how to configure these features to enhance performance and optimize workload...
Source: Nvidia Developer Blog
Felix Abecassis
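
Here is one way to picture an application-specific NVLink requirement. A job needs several groups of nodes, and only the nodes inside each group (for example, a tensor-parallel group) must share an NVLink domain. The greedy check below is a hypothetical illustration of that idea, not the topology/block plugin's algorithm.

```python
def place_segments(free_per_block: dict, segment_size: int, num_segments: int):
    """Greedily place fixed-size node groups, each inside a single NVLink block.

    Returns {block: groups_placed}, or None if some group cannot fit in one block.
    """
    free = dict(free_per_block)
    placement = {}
    for _ in range(num_segments):
        fits = [block for block, nodes in free.items() if nodes >= segment_size]
        if not fits:
            return None
        block = max(fits, key=lambda b: free[b])  # block with the most free nodes first
        free[block] -= segment_size
        placement[block] = placement.get(block, 0) + 1
    return placement

free_nodes = {"nvl72-a": 10, "nvl72-b": 9}
print(place_segments(free_nodes, segment_size=16, num_segments=1))  # None
print(place_segments(free_nodes, segment_size=4, num_segments=4))   # {'nvl72-a': 2, 'nvl72-b': 2}
```

Greedy placement is not optimal in general, since the underlying problem is bin packing. It does show why a 16-node job that no single rack can host may still fit as four 4-node groups.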

Model Quantization: Post-Training Quantization Using NVIDIA Model Optimizer

2026-05-07 21:18
🚀 Model quantization is a key method for reducing VRAM usage and enhancing inference performance on NVIDIA GeForce RTX GPUs. This article details how to utilize the NVIDIA Model Optimizer to quantize a CLIP model in FP8 format using the post-training quantization (PTQ) method. The NVIDIA Model Optimizer offers advanced techniques like quantization and pruning, supporting various model formats such as Hugging Face and PyTorch. 💡 CLIP, a foundation model from OpenAI, effectively aligns images...
Source: Nvidia Developer Blog
Ruixiang Wang
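
The following pure-Python sketch shows what per-tensor FP8 post-training quantization computes. It assumes max calibration and the E4M3 format, whose largest finite value is 448. Model Optimizer performs calibration and quantization on real models through its own API; the helper names here are hypothetical.

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def round_to_e4m3(x: float) -> float:
    """Round to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    if a < 2.0 ** -6:
        step = 2.0 ** -9  # subnormal range: uniform spacing
    else:
        step = 2.0 ** (math.floor(math.log2(a)) - 3)  # 3 mantissa bits per binade
    return sign * min(round(a / step) * step, E4M3_MAX)

def ptq_fp8(weights, calibration_amax=None):
    """Per-tensor max calibration: map the largest |w| onto the top of the FP8 range."""
    amax = max(abs(w) for w in weights) if calibration_amax is None else calibration_amax
    scale = amax / E4M3_MAX
    quantized = [round_to_e4m3(w / scale) for w in weights]  # values stored as FP8
    return quantized, scale  # dequantize with q * scale

q, scale = ptq_fp8([0.5, -1.0, 0.013, 2.0])
print(q)                       # [112.0, -224.0, 3.0, 448.0]
print([v * scale for v in q])  # 0.013 comes back as roughly 0.0134
```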

Real-Time Performance Monitoring and Faster Debugging with NCCL Inspector and Prometheus

2026-05-07 16:02
🚀 Distributed deep learning relies on efficient GPU-to-GPU communication via the NVIDIA Collective Communication Library (NCCL). When training slows, pinpointing issues can be complex. The NCCL Inspector enhances this process by providing continuous performance reports, tracking operation type, size, and bandwidth. With the new real-time monitoring feature integrated with Prometheus, users can access live visualizations directly in their infrastructure dashboard. This marks a significant step...
Source: Nvidia Developer Blog
Ava Arnaz
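
Collective bandwidth numbers are easiest to compare once normalized. This sketch uses the algorithm-bandwidth and bus-bandwidth definitions documented for nccl-tests. Whether NCCL Inspector reports bandwidth with exactly these factors is an assumption, and `bandwidths` is a hypothetical helper.

```python
# Bus-bandwidth correction factors from the nccl-tests performance notes (n = ranks).
BUS_FACTOR = {
    "AllReduce": lambda n: 2 * (n - 1) / n,
    "ReduceScatter": lambda n: (n - 1) / n,
    "AllGather": lambda n: (n - 1) / n,
    "Broadcast": lambda n: 1.0,
    "Reduce": lambda n: 1.0,
}

def bandwidths(op, size_bytes, seconds, nranks):
    """Return (algbw, busbw) in GB/s for one collective call."""
    algbw = size_bytes / seconds / 1e9      # what the application observes
    busbw = algbw * BUS_FACTOR[op](nranks)  # comparable to per-link hardware speed
    return algbw, busbw

# A 1 GB AllReduce finishing in 10 ms across 8 GPUs.
print(bandwidths("AllReduce", 1e9, 0.010, 8))  # about (100.0, 175.0)
```

Bus bandwidth is the figure to compare against link speed, because it stays roughly stable as the rank count changes on the same fabric.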

Powering AI Factories with NVIDIA Enterprise Reference Architectures

2026-04-29 16:41
AI factories are shaping the future of enterprise productivity. These systems leverage agentic AI for reasoning, automation, and real-time decision-making. Success relies on robust infrastructure that ensures scalability and performance, transitioning from pilot to production smoothly. NVIDIA's Enterprise Reference Architectures offer the necessary guidance for building this foundation, minimizing integration risks and deployment time. These architectures enable organizations to scale AI...
Source: Nvidia Developer Blog
Shashank Sabhlok

Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo

2026-04-28 19:00
🚀 Exciting advancements in computational biology! NVIDIA BioNeMo has introduced a new context parallelism (CP) framework that distributes a model's work across multiple GPUs, enabling holistic modeling of large biomolecular systems that exceed the memory of a single GPU. This addresses the limitations of prior reductionist methods, which often sacrificed global structural accuracy. The article details how to implement CP in biomolecular architectures, noting the need for familiarity with geometric deep learning...
Source: Nvidia Developer Blog
Dejun Lin
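
Back-of-the-envelope arithmetic shows why the work must be split. The sketch assumes an AlphaFold-style N × N × C pair representation stored in bfloat16 and sharded by rows across the context-parallel group. That layout is an assumption; BioNeMo's actual partitioning may differ. The function computes the per-GPU footprint of that one tensor.

```python
def pair_tensor_gib(num_residues, channels, bytes_per_elem=2, cp_size=1):
    """GiB per GPU for an N x N x C pair tensor whose rows are split across cp_size GPUs."""
    rows_per_rank = -(-num_residues // cp_size)  # ceiling division
    return rows_per_rank * num_residues * channels * bytes_per_elem / 2**30

print(pair_tensor_gib(4096, 128))             # 4.0 GiB on a single GPU
print(pair_tensor_gib(4096, 128, cp_size=4))  # 1.0 GiB per GPU
```

Because the tensor grows as N², doubling the system size quadruples this memory. Activations and gradients add further multiples on top.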

NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model

2026-04-28 16:01
🚀 Introducing the NVIDIA Nemotron 3 Nano Omni! This innovative model unifies multimodal reasoning, allowing agents to seamlessly process visual, audio, and textual inputs in one effective system. It simplifies orchestration, reduces costs, and enhances context consistency. Nemotron 3 Nano Omni excels in document intelligence and video/audio understanding, achieving top scores in industry benchmarks. Built on a 30B‑A3B hybrid architecture, it supports high throughput and customizable...
Source: Nvidia Developer Blog
Anjali Shah

24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving

2026-04-28 15:00
The subsurface industry is experiencing a significant digital transformation. Traditionally, unlocking reservoir potential relied on manual workflows, which have become a bottleneck due to increasing data complexity. Agentic AI offers a solution by automating repetitive tasks, allowing engineers to focus on strategic oversight. This shift can reduce project delays and enhance simulation efficiency. The framework discussed is applicable across industries, promoting faster, more effective...
Source: Nvidia Developer Blog
Tsubasa Onishi

Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

2026-04-24 23:29
🚀 DeepSeek has launched its fourth-generation models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. These models are aimed at enhancing million-token context inference. 🧠 The V4-Pro features 1.6 trillion total parameters, while the V4-Flash offers 284 billion parameters for faster, more efficient tasks. Both support a 1M-token context window, ideal for complex coding and document analysis. 🔧 Architectural improvements in the V4 family result in significant reductions in inference costs, making it a...
Source: Nvidia Developer Blog
Anu Srivastava

Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE

2026-04-24 15:00
Federated learning (FL) is becoming essential as valuable data often cannot be moved due to regulations and risks. NVIDIA FLARE offers a solution by allowing training to occur where the data resides, addressing privacy and compliance concerns. The updated API simplifies the developer experience, letting teams transform local scripts into federated clients with minimal code changes. Key features include no data copying and strong governance controls. #FederatedLearning #NVIDIA #DataPrivacy...
Source: Nvidia Developer Blog
Holger Roth
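
In federated learning, only model updates leave each site. Below is a minimal sketch of sample-weighted federated averaging (FedAvg), the server-side aggregation step many FL workflows use. It is generic math, not NVIDIA FLARE's API, and parameters are plain lists for brevity.

```python
def fed_avg(client_updates):
    """client_updates: list of (num_samples, {param_name: [float, ...]}), one per site."""
    total = sum(num_samples for num_samples, _ in client_updates)
    first_params = client_updates[0][1]
    return {
        name: [
            sum(n * params[name][i] for n, params in client_updates) / total
            for i in range(len(values))
        ]
        for name, values in first_params.items()
    }

site_a = (100, {"w": [1.0, 2.0], "b": [0.0]})
site_b = (300, {"w": [3.0, 4.0], "b": [1.0]})
print(fed_avg([site_a, site_b]))  # {'w': [2.5, 3.5], 'b': [0.75]}
```

Weighting by sample count keeps a small site from pulling the global model as hard as a large one.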

Winning a Kaggle Competition with Generative AI–Assisted Coding

2026-04-23 20:15
In March 2026, a team utilized three LLM agents to generate over 600,000 lines of code, achieving first place in a Kaggle competition on telecom customer churn prediction. 🚀 These agents significantly accelerated the coding and experimentation process, addressing key bottlenecks in machine learning. GPU technologies also played a vital role in this success. The winning solution featured a complex stack of 150 models selected from 850 experiments. 📊 #Kaggle #DataScience #MachineLearning...
Source: Nvidia Developer Blog
Chris Deotte

Simplify Sparse Deep Learning with Universal Sparse Tensor in nvmath-python

2026-04-22 23:50
🚀 Exciting news for deep learning developers! The Universal Sparse Tensor (UST) is now integrated into nvmath-python v0.9.0, enhancing flexibility and performance for sparse scientific applications. Key features include zero-cost interoperability with PyTorch, custom formats for sparsity schemes, and transparent caching to improve efficiency. Explore how UST can optimize your existing models and streamline your coding process! #DeepLearning #SparseTensor #nvmath #UST #Python
Source: Nvidia Developer Blog
Aart J.C. Bik

Scaling the AI-Ready Data Center with NVIDIA RTX PRO 4500 Blackwell Server Edition and NVIDIA vGPU 20

2026-04-22 20:30
AI is transforming enterprise applications, demanding a shift in modern data centers. 🚀 The NVIDIA RTX PRO 4500 Blackwell Server Edition and vGPU 20 tackle the challenge of dedicated GPU access. With Multi-Instance GPU (MIG) technology, a single GPU can be partitioned into independent instances, allowing multiple developers to work without resource conflicts. The integration of these technologies boosts performance for varying workloads, from productivity tools to AI development. This post...
Source: Nvidia Developer Blog
Phoebe Lee

Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron

2026-04-22 20:01
NVIDIA explores the impact of higher-order optimization algorithms like Shampoo and Muon in training large language models (LLMs). Recent findings show that Muon has been successfully utilized for models such as Kimi K2 and GLM-5, demonstrating comparable training performance to AdamW on NVIDIA systems. The research highlights the efficiency of using NVIDIA NeMo Megatron Bridge for enhanced training throughput. For more details on experimental settings, check the article! #AI #NVIDIA...
Source: Nvidia Developer Blog
Hao Wu

Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson

2026-04-20 23:01
The rise of open source generative AI models is transforming how we deploy technology in the physical world. Developers are keen to implement these models on edge devices for tasks like automation in robotics. 🤖 A significant challenge lies in efficiently running large models on devices with limited memory. The NVIDIA Jetson platform is designed to optimize memory use, enhancing performance while managing resource constraints. This article discusses strategies for maximizing efficiency in...
Source: Nvidia Developer Blog
Anshuman Bhat

Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision

2026-04-20 22:52
Reinforcement learning (RL) is crucial as large language models (LLMs) evolve from basic text generation to complex reasoning. Algorithms like Group Relative Policy Optimization (GRPO) improve models through iterative feedback. RL training involves two phases: a latency-sensitive generation phase and a high-throughput training phase. Researchers are using low-precision data types such as FP8 to improve performance. This approach can enhance efficiency, especially in scenarios...
Source: Nvidia Developer Blog
Guyue Huang
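
The "group relative" part of GRPO describes how advantages are computed. Several completions are sampled for the same prompt, and each reward is normalized against that group's mean and standard deviation, so no separate value model is needed. The minimal sketch below uses the population standard deviation; implementations differ on that detail.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each completion's reward against its group (same prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for one prompt, scored 1 if correct and 0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

The `eps` term keeps a group with identical rewards from dividing by zero. Such a group yields all-zero advantages and contributes no learning signal.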

Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments

2026-04-20 17:00
AI tools are transforming software development, acting as real-time copilots to automate tasks like code generation and debugging. However, recent findings by the NVIDIA AI Red Team reveal vulnerabilities in these tools, particularly through indirect AGENTS.md injection attacks via compromised dependencies. This highlights new supply chain risks in development environments. The article outlines the attack process and offers strategies for mitigating these risks, emphasizing the importance of...
Source: Nvidia Developer Blog
Daniel Teixeira
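
One hygiene check in this spirit is sketched below as a hypothetical helper; it is not a mitigation taken from the article. It scans a repository for `AGENTS.md` files below the top level, such as those shipped inside vendored or installed dependencies, so they can be reviewed before an agent reads them. Monorepos may contain legitimate nested files, so the output is a review list rather than a block list.

```python
import tempfile
from pathlib import Path

def nested_agent_files(repo_root, names=("AGENTS.md",)):
    """Return instruction files below repo_root, excluding the top-level ones."""
    root = Path(repo_root).resolve()
    hits = []
    for name in names:
        for path in root.rglob(name):
            if path.is_file() and path.parent != root:
                hits.append(path.relative_to(root).as_posix())
    return sorted(hits)

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("AGENTS.md", "src/AGENTS.md", "node_modules/left-pad/AGENTS.md"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("# instructions\n")
    print(nested_agent_files(tmp))  # ['node_modules/left-pad/AGENTS.md', 'src/AGENTS.md']
```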

Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo

2026-04-17 22:52
Coding agents are transforming software development by generating production code at scale. Stripe’s agents produce over 1,300 pull requests (PRs) weekly, while Ramp sees 30% of merged PRs attributed to agents. Spotify reports 650+ agent-generated PRs monthly. Tools like Claude Code and Codex make numerous API calls during each coding session. #CodingAgents #SoftwareDevelopment #AI #TechInnovation #NVIDIA
Source: Nvidia Developer Blog
Ishan Dhanani

Build a Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw

2026-04-17 18:59
Unlock the potential of AI with NVIDIA NemoClaw! 🤖 Agents are transitioning from simple Q&A systems to advanced autonomous assistants. However, deploying them requires careful attention to data privacy and control. NVIDIA NemoClaw offers an open-source solution to build secure, long-running AI agents. This tutorial guides you through deploying NemoClaw on NVIDIA DGX Spark, connecting it to Telegram for easy access. Explore how to create your own local AI assistant today! 📱🔒 #AI #NVIDIA...
Source: Nvidia Developer Blog
Patrick Moorhead

Accelerate Clean, Modular, Nuclear Reactor Design with AI Physics

2026-04-17 15:00
The development of safe and efficient nuclear reactors is gaining momentum, focusing on Small Modular Reactors (SMRs) and Generation IV designs. SMRs aim to standardize designs and enhance project economics, while Gen IV reactors address fuel-cycle challenges and waste management. To streamline the design process, engineers are using digital twins and AI simulations, reducing costs and time significantly. Tools like NVIDIA's CUDA-X and PhysicsNeMo are key in this innovation journey....
Source: Nvidia Developer Blog
Mark Hobbs

How to Build Vision AI Pipelines Using DeepStream Coding Agents

2026-04-16 15:00
🚀 Building real-time vision AI applications can be challenging, requiring complex data pipelines and extensive coding. NVIDIA DeepStream 9 simplifies this process with coding agents like Claude Code and Cursor, enabling developers to create optimized code efficiently. This platform supports multi-camera setups to process vast amounts of video, audio, and sensor data, accelerating insights across various industries. Join the live session on April 16 at 9am PT to learn more! 📅 #VisionAI...
Source: Nvidia Developer Blog
Debraj Sinha

Building Custom Atomistic Simulation Workflows for Chemistry and Materials Science with NVIDIA ALCHEMI Toolkit

2026-04-14 16:30
NVIDIA introduces the ALCHEMI Toolkit, aimed at enhancing computational chemistry and materials science. 🌐 This toolkit combines GPU-accelerated building blocks with AI to streamline atomistic simulations. It addresses the speed and accuracy challenges faced by traditional methods, such as DFT and classical force fields. ⚛️ Key features include scalable microservices and foundational GPU kernels, promoting modular workflows for researchers. This development supports tasks like geometry...
Source: Nvidia Developer Blog
Erica Tsai

NVIDIA NVbandwidth: Your Essential Tool for Measuring GPU Interconnect and Memory Performance

2026-04-14 16:00
Discover NVIDIA NVbandwidth, a vital tool for CUDA developers focused on GPU data transfer performance. 🖥️ This tool measures bandwidth and latency for various memory copy patterns, helping users evaluate system performance, diagnose bottlenecks, and optimize workloads. Key features include comprehensive testing for unidirectional, bidirectional, and multi-GPU configurations. Learn more about enhancing your GPU setups! 🚀💡 #NVIDIA #CUDA #GPUPerformance #DataTransfer #TechTools
Source: Nvidia Developer Blog
Eva Sitaridi

NVIDIA Ising Introduces AI-Powered Workflows to Build Fault-Tolerant Quantum Systems

2026-04-14 14:15
🚀 NVIDIA has launched Ising, the first family of open AI models designed for building quantum processors. 🛠️ It features two key models: Ising Calibration, which automates QPU calibration tasks, and Ising Decoding, utilizing advanced 3D CNNs for error correction. 🔍 These models aim to tackle noise in quantum computing, enhancing performance and reducing error rates significantly. Learn more about how Ising supports error correction and scaling to Quantum-GPU supercomputers. #NVIDIA...
Source: Nvidia Developer Blog
Tom Lubowe

MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications

2026-04-12 01:02
🚀 MiniMax M2.7 has been released, building on the MiniMax M2.5 model for complex AI applications across fields including ML research and software engineering. The model uses a sparse mixture-of-experts design that keeps inference costs low despite its 230B-parameter size. It relies on techniques such as Rotary Position Embeddings and top-k expert routing. Additionally, NVIDIA introduces NemoClaw, a tool for safely...
Source: Nvidia Developer Blog
Anu Srivastava

Running Large-Scale GPU Workloads on Kubernetes with Slurm

2026-04-09 17:00
Unlocking the power of GPU workloads on Kubernetes is now possible with Slurm integration. 🌐 Slurm, a leading job scheduling system, manages over 65% of TOP500 systems. The challenge lies in integrating its capabilities into Kubernetes without duplicating environments. The Slinky project offers two solutions: the slurm-bridge for native Kubernetes workloads and the slurm-operator for running full Slurm clusters. This post highlights the slurm-operator, detailing its architecture, deployment,...
Source: Nvidia Developer Blog
Anton Polyakov