2026-05-21 18:00
🚀 Real-time visibility into GPU usage is crucial for maximizing AI infrastructure. Many teams face challenges due to limited insights into GPU consumption on Kubernetes. The new GPU Usage Monitor, built on NVIDIA's DCGM Exporter, provides comprehensive tracking of GPU allocation, memory use, and pod status. It simplifies monitoring with a single Helm chart deployment. This tool addresses common issues like over-provisioning and pod starvation, enabling better resource utilization and timely...
Source: Nvidia Developer Blog
Guy Saltoun
2026-05-21 17:32
Unlocking the potential of NVIDIA GB200 NVL72 requires effective workload placement. This article discusses how Slurm topology-aware job scheduling enhances performance by aligning jobs with the system’s network architecture. The GB200 NVL72 supports exascale computing with 72 interconnected GPUs, offering 130 TB/s bandwidth for AI and HPC tasks. By maximizing the use of NVLink, AI training jobs can significantly improve performance. For optimal results in shared clusters, schedulers must...
Source: Nvidia Developer Blog
Sachin Lakharia
2026-05-21 15:30
Telcos are developing sovereign AI factories using NVIDIA's Cloud Partner architecture. This initiative aims to provide governments and businesses with reliable in-country AI infrastructure. However, simply having infrastructure isn't enough for scalable AI services. The focus is shifting towards token-based billing for AI services, ensuring enterprises receive production-ready applications without the complexities of managing infrastructure. This approach allows enterprises to benefit from...
Source: Nvidia Developer Blog
Waleed Badr
2026-05-20 20:00
Unlock the potential of AI with effective customization! 🤖✨ Autonomous AI agents can handle various business tasks like routing logistics and triaging support tickets. To enhance their performance, customization is key. This article outlines nine techniques for tailoring AI agents, emphasizing the importance of adapting them to specific workflows. From simple prompt changes to advanced methods like reinforcement learning, each approach has its own pros and cons. Learn how to make your AI...
Source: Nvidia Developer Blog
Edward Li
2026-05-20 16:00
Enhance your agent harness capabilities with specialized deep research skills! 🛠️ Agent harnesses like Claude Code, Codex, and LangChain Deep Agents excel at managing sessions and executing tasks, but deep research can complicate workflows. 🌐 NVIDIA introduces the AI-Q skill, allowing agents to delegate research tasks to a local AI-Q server. This keeps sensitive data secure while producing structured, well-cited reports. 📊 Explore how this skill streamlines workflows without needing to...
Source: Nvidia Developer Blog
William Markito Oliveira
2026-05-19 23:40
NVIDIA is enhancing the capabilities of autonomous AI agents through verified skills. These skills ensure transparency and trust by detailing their origins, risks, and modifications. This means developers can confidently extend their agents in real workflows. 🛠️✨ NVIDIA agent skills are portable instruction sets that guide AI agents in using NVIDIA tools effectively. They come with documentation and regular updates to ensure reliability. Learn more about how these verified skills can improve...
Source: Nvidia Developer Blog
Moshe Abramovitch
2026-05-19 20:00
Evaluating AI models and evaluating AI agents serve different purposes. Model evaluation tests a foundation model's capabilities, focusing on static tasks and predefined inputs. Benchmarks like MMLU and GSM8K are commonly used to measure performance. In contrast, agent evaluation examines a system's behavior in dynamic environments, assessing its planning and tool usage. This article outlines key differences and offers tips for effective AI agent evaluation. 🤖📊 #AIEvaluation #MachineLearning #TechInsights...
Source: Nvidia Developer Blog
Edward Li
2026-05-19 18:00
🚀 Real-time visibility into GPU usage is crucial for maximizing AI infrastructure. Many teams face challenges due to limited insights into GPU consumption on Kubernetes. The new GPU Usage Monitor, built on NVIDIA's DCGM Exporter, provides comprehensive tracking of GPU allocation, memory use, and pod status. It simplifies monitoring with a single Helm chart deployment. This tool addresses common issues like over-provisioning and pod starvation, enabling better resource utilization and timely...
Source: Nvidia Developer Blog
Guy Saltoun
2026-05-14 19:24
NVIDIA's Vera Rubin Platform addresses the challenges of agentic AI's scale-up problem. Agentic inference introduces non-deterministic trajectories, affecting latency across inference requests. The Vera Rubin NVL72 serves as a core compute engine, optimizing for low-latency and high-throughput demands. This platform is the first to economically handle complex multi-agent workloads with high model capability. It relies on extreme co-design to improve performance for AI services. Discover how...
Source: Nvidia Developer Blog
Graham Steele
2026-05-13 16:39
🚀 New advancements in X-ray technology are revolutionizing materials science! The X-ray free-electron laser (XFEL) tracks structural and electron dynamics in materials like semiconductors and catalysts. With ultrashort X-ray pulses, it captures atomic movements and identifies defects. The Accelerated X-ray Analysis for Nanoscale Imaging (XANI) workflow has significantly reduced data processing time from nine months to under four hours, utilizing NVIDIA's powerful computing technology. These...
Source: Nvidia Developer Blog
Irina Demeshko
2026-05-13 15:00
Unlock the power of video with NVIDIA's Metropolis Blueprint for Video Search and Summarization (VSS). 📹✨ VSS transforms vast amounts of video into searchable, actionable insights, making it easier for organizations to monitor operations and detect trends in real time. Discover how to automate deployment and integrate VSS into your applications. Join us live on May 13 at 9 am PT to learn more! #VideoAnalytics #NVIDIA #AI #DataIntelligence #Innovation
Source: Nvidia Developer Blog
Samuel Ochoa
2026-05-12 18:00
🚀 The journey from AI model training to production often faces challenges known as pipeline friction. These issues can lead to inefficiencies, increased costs, and performance degradation. 🛠️ Common sources include model export problems, unsupported operations, dynamic input sizes, and version mismatches. Addressing these can streamline deployments and improve API response times. 📊 The article highlights best practices, such as validating exports early and using ONNX operator versioning...
Source: Nvidia Developer Blog
Lovina Dmello
2026-05-11 19:44
🚀 NVIDIA introduces Fleet Intelligence, a new service for real-time monitoring of GPU fleets. This tool addresses the complexities of managing large GPU clusters, enhancing visibility into power, temperature, performance, and health. It aims to optimize resource utilization and ensure consistent performance across systems. Fleet Intelligence is deployment-agnostic and suitable for data center GPU and CPU management. #NVIDIA #GPUMonitoring #FleetIntelligence #DataCenter #TechInnovation
Source: Nvidia Developer Blog
Christian Shrauder
2026-05-08 17:13
🚀 Exciting advancements in AI! A recent study explores enhancing Bash command generation in small language models using grammar-constrained decoding. This method aims to improve reliability for executing tasks in agentic systems. The research found that by applying this technique, the average success rate of command generation increased significantly, from 62.5% to 75.2%. This development could broaden the deployment of small models in various environments, addressing the challenges of syntax...
Source: Nvidia Developer Blog
Joseph Lucas
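For readers new to grammar-constrained decoding, the idea in the item above can be sketched roughly as follows. This is a toy illustration, not the study's implementation: the grammar, the vocabulary, and the scores are all invented here, and a real system would mask model logits against a full Bash grammar. At each step only the tokens the grammar allows are candidates, so an invalid token cannot be emitted even when the model scores it highest.

```python
# Toy grammar-constrained greedy decoding.
# GRAMMAR, the vocabulary, and fake_scores are hypothetical stand-ins.

# Grammar as a state machine: state -> {allowed token: next state}
GRAMMAR = {
    "start": {"ls": "args", "cat": "need_path"},
    "args": {"-l": "args", "/tmp": "end", "<eos>": "end"},
    "need_path": {"/tmp": "end"},
}


def constrained_greedy(score_fn, max_steps=8):
    """Pick the highest-scoring token among those the grammar allows."""
    state, output = "start", []
    for _ in range(max_steps):
        if state == "end":
            break
        allowed = GRAMMAR[state]
        scores = score_fn(output)
        # Mask: only tokens valid in the current grammar state are candidates.
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        state = allowed[token]
        if token != "<eos>":
            output.append(token)
    return output


def fake_scores(prefix):
    # Stand-in for a model that would prefer the invalid token "rm".
    return {"rm": 5.0, "cat": 2.0, "ls": 1.0, "-l": 0.5, "/tmp": 3.0, "<eos>": 0.1}


command = constrained_greedy(fake_scores)
print(" ".join(command))
```

Although the stand-in scorer ranks "rm" highest, the toy grammar never offers it as a candidate, so the decoded command stays within the grammar.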
2026-05-08 15:59
📢 Exploring structured interactions in AI, the article discusses the importance of agentic exchanges in NVIDIA Dynamo. It highlights how assistant turns intertwine reasoning with tool calls, ensuring a seamless user experience. Key improvements were made in parser and API coverage to enhance streaming behavior and performance. The focus remains on correctness and user experience as agentic harnesses evolve rapidly. #AI #NVIDIA #Dynamo #AgenticExchange #TechInnovation
Source: Nvidia Developer Blog
Matej Kosec
2026-05-07 21:20
🚀 The NVIDIA GB200 NVL72 revolutionizes GPU cluster design by extending NVLink coherence across an entire rack. This innovation enables exascale performance but alters existing scheduling assumptions. 🔧 To tackle the challenges of rack-scale locality, the Slurm workload manager has introduced the topology/block plugin. The plugin lets jobs express application-specific NVLink requirements more precisely. 📈 The article details how to configure these features to enhance performance and optimize workload...
Source: Nvidia Developer Blog
Felix Abecassis
2026-05-07 21:18
🚀 Model quantization is a key method for reducing VRAM usage and enhancing inference performance on NVIDIA GeForce RTX GPUs. This article details how to utilize the NVIDIA Model Optimizer to quantize a CLIP model in FP8 format using the post-training quantization (PTQ) method. The NVIDIA Model Optimizer offers advanced techniques like quantization and pruning, supporting various model formats such as Hugging Face and PyTorch. 💡 CLIP, a foundation model from OpenAI, effectively aligns images...
Source: Nvidia Developer Blog
Ruixiang Wang
2026-05-07 16:02
🚀 Distributed deep learning relies on efficient GPU-to-GPU communication via the NVIDIA Collective Communication Library (NCCL). When training slows, pinpointing issues can be complex. The NCCL Inspector enhances this process by providing continuous performance reports, tracking operation type, size, and bandwidth. With the new real-time monitoring feature integrated with Prometheus, users can access live visualizations directly in their infrastructure dashboard. This marks a significant step...
Source: Nvidia Developer Blog
Ava Arnaz
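As background for the bandwidth figures mentioned in the item above, here is a small sketch of how all-reduce bandwidth is commonly derived from operation size and elapsed time. It follows the convention documented for the nccl-tests benchmarks (algorithm bandwidth and bus bandwidth). Whether NCCL Inspector reports exactly these quantities is an assumption, and the function name and sample numbers are made up for illustration.

```python
def allreduce_bandwidth(size_bytes, seconds, n_ranks):
    """Return (algorithm bandwidth, bus bandwidth) in GB/s for an all-reduce."""
    # Algorithm bandwidth: payload size divided by elapsed time.
    algbw = size_bytes / seconds / 1e9
    # Bus bandwidth: the all-reduce correction factor 2 * (n - 1) / n makes
    # the figure comparable across different rank counts.
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Hypothetical measurement: 1 GB reduced across 8 ranks in 10 ms.
algbw, busbw = allreduce_bandwidth(1_000_000_000, 0.01, 8)
print(f"algbw={algbw:.1f} GB/s, busbw={busbw:.1f} GB/s")
```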
2026-04-29 16:41
AI factories are shaping the future of enterprise productivity. These systems leverage agentic AI for reasoning, automation, and real-time decision-making. Success relies on robust infrastructure that ensures scalability and performance and supports a smooth transition from pilot to production. NVIDIA's Enterprise Reference Architectures offer the necessary guidance for building this foundation, minimizing integration risks and deployment time. These architectures enable organizations to scale AI...
Source: Nvidia Developer Blog
Shashank Sabhlok
2026-04-28 19:00
🚀 Exciting advancements in computational biology! NVIDIA BioNeMo has introduced a new context parallelism (CP) framework that allows holistic modeling of large biomolecular systems without the memory constraints of traditional GPUs. This innovation addresses the limitations of prior reductionist methods that often sacrificed global structural accuracy. The article details how to implement CP in biomolecular architectures, focusing on the need for familiarity with geometric deep learning...
Source: Nvidia Developer Blog
Dejun Lin
2026-04-28 16:01
🚀 Introducing the NVIDIA Nemotron 3 Nano Omni! This innovative model unifies multimodal reasoning, allowing agents to seamlessly process visual, audio, and textual inputs in one effective system. It simplifies orchestration, reduces costs, and enhances context consistency. Nemotron 3 Nano Omni excels in document intelligence and video/audio understanding, achieving top scores in industry benchmarks. Built on a 30B‑A3B hybrid architecture, it supports high throughput and customizable...
Source: Nvidia Developer Blog
Anjali Shah
2026-04-28 15:00
The subsurface industry is experiencing a significant digital transformation. Traditionally, unlocking reservoir potential relied on manual workflows, which have become a bottleneck due to increasing data complexity. Agentic AI offers a solution by automating repetitive tasks, allowing engineers to focus on strategic oversight. This shift can reduce project delays and enhance simulation efficiency. The framework discussed is applicable across industries, promoting faster, more effective...
Source: Nvidia Developer Blog
Tsubasa Onishi
2026-04-24 23:29
🚀 DeepSeek has launched its fourth-generation models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. These models are aimed at enhancing million-token context inference. 🧠 The V4-Pro features 1.6 trillion total parameters, while the V4-Flash offers 284 billion parameters for faster, more efficient tasks. Both support a 1M-token context window, ideal for complex coding and document analysis. 🔧 Architectural improvements in the V4 family result in significant reductions in inference costs, making it a...
Source: Nvidia Developer Blog
Anu Srivastava
2026-04-24 15:00
Federated learning (FL) is becoming essential as valuable data often cannot be moved due to regulations and risks. NVIDIA FLARE offers a solution by allowing training to occur where the data resides, addressing privacy and compliance concerns. The updated API simplifies the developer experience, letting teams transform local scripts into federated clients with minimal code changes. Key features include no data copying and strong governance controls. #FederatedLearning #NVIDIA #DataPrivacy...
Source: Nvidia Developer Blog
Holger Roth
2026-04-23 20:15
In March 2026, a team utilized three LLM agents to generate over 600,000 lines of code, achieving first place in a Kaggle competition on telecom customer churn prediction. 🚀 These agents significantly accelerated the coding and experimentation process, addressing key bottlenecks in machine learning. GPU technologies also played a vital role in this success. The winning solution featured a complex stack of 150 models selected from 850 experiments. 📊 #Kaggle #DataScience #MachineLearning...
Source: Nvidia Developer Blog
Chris Deotte
2026-04-22 23:50
🚀 Exciting news for deep learning developers! The Universal Sparse Tensor (UST) is now integrated into nvmath-python v0.9.0, enhancing flexibility and performance for sparse scientific applications. Key features include zero-cost interoperability with PyTorch, custom formats for sparsity schemes, and transparent caching to improve efficiency. Explore how UST can optimize your existing models and streamline your coding process! #DeepLearning #SparseTensor #nvmath #UST #Python
Source: Nvidia Developer Blog
Aart J.C. Bik
2026-04-22 20:30
AI is transforming enterprise applications, demanding a shift in modern data centers. 🚀 The NVIDIA RTX PRO 4500 Blackwell Server Edition and vGPU 20 tackle the challenge of dedicated GPU access. With Multi-Instance GPU (MIG) technology, a single GPU can be partitioned into independent instances, allowing multiple developers to work without resource conflicts. The integration of these technologies boosts performance for varying workloads, from productivity tools to AI development. This post...
Source: Nvidia Developer Blog
Phoebe Lee
2026-04-22 20:01
NVIDIA explores the impact of higher-order optimization algorithms like Shampoo and Muon in training large language models (LLMs). Recent findings show that Muon has been successfully utilized for models such as Kimi K2 and GLM-5, demonstrating comparable training performance to AdamW on NVIDIA systems. The research highlights the efficiency of using NVIDIA NeMo Megatron Bridge for enhanced training throughput. For more details on experimental settings, check the article! #AI #NVIDIA...
Source: Nvidia Developer Blog
Hao Wu
2026-04-20 23:01
The rise of open source generative AI models is transforming how we deploy technology in the physical world. Developers are keen to implement these models on edge devices for tasks like automation in robotics. 🤖 A significant challenge lies in efficiently running large models on devices with limited memory. The NVIDIA Jetson platform is designed to optimize memory use, enhancing performance while managing resource constraints. This article discusses strategies for maximizing efficiency in...
Source: Nvidia Developer Blog
Anshuman Bhat
2026-04-20 22:52
Reinforcement learning (RL) is crucial as large language models (LLMs) evolve from basic text generation to complex reasoning. Algorithms like Group Relative Policy Optimization (GRPO) enhance model improvement through iterative feedback. RL training involves two phases: a latency-sensitive generation phase and a high-throughput training phase. Researchers are utilizing low-precision data types, such as FP8, to improve performance. This approach can enhance efficiency, especially in scenarios...
Source: Nvidia Developer Blog
Guyue Huang
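To make the GRPO mention in the item above concrete, the group-relative part can be sketched as follows: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group. This is a minimal sketch of that one step only. The use of population standard deviation and the small epsilon are assumptions here, and the rewards are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that beat their group's average get a positive advantage and the rest get a negative one, which is why no separate value model is needed for this step.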
2026-04-20 17:00
AI tools are transforming software development, acting as real-time copilots to automate tasks like code generation and debugging. However, recent findings by the NVIDIA AI Red Team reveal vulnerabilities in these tools, particularly through indirect AGENTS.md injection attacks via compromised dependencies. This highlights new supply chain risks in development environments. The article outlines the attack process and offers strategies for mitigating these risks, emphasizing the importance of...
Source: Nvidia Developer Blog
Daniel Teixeira
2026-04-17 22:52
Coding agents are transforming software development by generating production code at scale. Stripe’s agents produce over 1,300 pull requests (PRs) weekly, while Ramp sees 30% of merged PRs attributed to agents. Spotify reports 650+ agent-generated PRs monthly. Tools like Claude Code and Codex handle numerous API calls during coding sessions, ensuring efficient workflows. #CodingAgents #SoftwareDevelopment #AI #TechInnovation #NVIDIA
Source: Nvidia Developer Blog
Ishan Dhanani
2026-04-17 18:59
Unlock the potential of AI with NVIDIA NemoClaw! 🤖 Agents are transitioning from simple Q&A systems to advanced autonomous assistants. However, deploying them requires careful attention to data privacy and control. NVIDIA NemoClaw offers an open-source solution to build secure, long-running AI agents. This tutorial guides you through deploying NemoClaw on NVIDIA DGX Spark, connecting it to Telegram for easy access. Explore how to create your own local AI assistant today! 📱🔒 #AI #NVIDIA...
Source: Nvidia Developer Blog
Patrick Moorhead
2026-04-17 15:00
The development of safe and efficient nuclear reactors is gaining momentum, focusing on Small Modular Reactors (SMRs) and Generation IV designs. SMRs aim to standardize designs and enhance project economics, while Gen IV reactors address fuel-cycle challenges and waste management. To streamline the design process, engineers are using digital twins and AI simulations, reducing costs and time significantly. Tools like NVIDIA's CUDA-X and PhysicsNeMo are key in this innovation journey....
Source: Nvidia Developer Blog
Mark Hobbs
2026-04-16 15:00
🚀 Building real-time vision AI applications can be challenging, requiring complex data pipelines and extensive coding. NVIDIA DeepStream 9 simplifies this process with coding agents like Claude Code and Cursor, enabling developers to create optimized code efficiently. This platform supports multi-camera setups to process vast amounts of video, audio, and sensor data, accelerating insights across various industries. Join the live session on April 16 at 9am PT to learn more! 📅 #VisionAI...
Source: Nvidia Developer Blog
Debraj Sinha
2026-04-14 16:30
NVIDIA introduces the ALCHEMI Toolkit, aimed at enhancing computational chemistry and materials science. 🌐 This toolkit combines GPU-accelerated building blocks with AI to streamline atomistic simulations. It addresses the speed and accuracy challenges faced by traditional methods, such as DFT and classical force fields. ⚛️ Key features include scalable microservices and foundational GPU kernels, promoting modular workflows for researchers. This development supports tasks like geometry...
Source: Nvidia Developer Blog
Erica Tsai
2026-04-14 16:00
Discover NVIDIA NVbandwidth, a vital tool for CUDA developers focused on GPU data transfer performance. 🖥️ This tool measures bandwidth and latency for various memory copy patterns, helping users evaluate system performance, diagnose bottlenecks, and optimize workloads. Key features include comprehensive testing for unidirectional, bidirectional, and multi-GPU configurations. Learn more about enhancing your GPU setups! 🚀💡 #NVIDIA #CUDA #GPUPerformance #DataTransfer #TechTools
Source: Nvidia Developer Blog
Eva Sitaridi
2026-04-14 14:15
🚀 NVIDIA has launched Ising, the first family of open AI models designed for building quantum processors. 🛠️ It features two key models: Ising Calibration, which automates QPU calibration tasks, and Ising Decoding, utilizing advanced 3D CNNs for error correction. 🔍 These models aim to tackle noise in quantum computing, enhancing performance and reducing error rates significantly. Learn more about how Ising supports error correction and scaling to Quantum-GPU supercomputers. #NVIDIA...
Source: Nvidia Developer Blog
Tom Lubowe
2026-04-12 01:02
🚀 The MiniMax M2.7 has been released, enhancing the MiniMax M2.5 model for complex AI applications across various fields, including ML research and software engineering. This model features a sparse mixture-of-experts design, maintaining low inference costs while leveraging a 230B-parameter architecture. It utilizes advanced techniques like Rotary Position Embeddings and a top-k expert routing mechanism for optimal performance. Additionally, NVIDIA introduces NemoClaw, a tool for safely...
Source: Nvidia Developer Blog
Anu Srivastava
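The top-k expert routing mentioned in the item above can be illustrated with a short sketch. It shows one common variant, in which the k highest-scoring experts are selected and a softmax over only those scores produces the mixing weights. MiniMax M2.7's actual router is not described in the summary, so the normalization choice, the value of k, and the logits below are all assumptions.

```python
import math


def top_k_route(router_logits, k=2):
    """Return (expert index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    top = ranked[:k]
    # Softmax over the selected experts only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for four experts and a single token.
routes = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
print(routes)
```

Only the selected experts run for a given token, which is how a sparse mixture-of-experts model keeps inference cost low relative to its total parameter count.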
2026-04-09 17:00
Unlocking the power of GPU workloads on Kubernetes is now possible with Slurm integration. 🌐 Slurm, a leading job scheduling system, manages over 65% of TOP500 systems. The challenge lies in integrating its capabilities into Kubernetes without duplicating environments. The Slinky project offers two solutions: the slurm-bridge for native Kubernetes workloads and the slurm-operator for running full Slurm clusters. This post highlights the slurm-operator, detailing its architecture, deployment,...
Source: Nvidia Developer Blog
Anton Polyakov