Articles from Source: Nvidia-Developer-Blog

Building Scalable and Fault-Tolerant NCCL Applications

2025-11-10 21:29
🚀 The NVIDIA Collective Communications Library (NCCL) enhances AI workloads by enabling communication across multiple GPUs, scaling from a few GPUs to thousands. 💡 Key features include run-time rescaling for cost optimization and fault tolerance, allowing dynamic removal of faulty workers. NCCL supports complex workflows, utilizing data and tensor parallelism to meet performance goals. As model sizes grow, dynamic resource allocation becomes essential for efficiency. #NCCL #AI #GPUComputing...
Source: Nvidia Developer Blog
Luke Robison

Training XGBoost Models with GPU-Accelerated Polars DataFrames

2025-11-10 19:30
🚀 The latest release of XGBoost enhances interoperability within the PyData ecosystem, now integrating with Polars DataFrames for streamlined data handling. This article details how to utilize Polars' GPU engine with XGBoost, focusing on lazy evaluation and the new category re-coder. 📊 Setting up requires installing the necessary libraries, including GPU-enabled Polars. This setup optimizes workflows and improves performance in model training. 📈 #DataScience #MachineLearning #XGBoost #Polars...
Source: Nvidia Developer Blog
Jiaming Yuan

Gen AI Super-Resolution Accelerates Weather Prediction with Scalable, Low-Compute Models

2025-11-10 19:29
NVIDIA's Earth-2 platform is transforming weather predictions with AI-driven solutions. 🌍 The platform utilizes CorrDiff, a generative AI model that enhances downscaling of weather data, making it more efficient and cost-effective. This advancement allows national meteorological services to deliver high-resolution forecasts essential for agriculture, energy, and disaster preparedness. 🌾⚡ With optimizations achieving over 50x speedup in training and inference, CorrDiff enables scalable, fine-...
Source: Nvidia Developer Blog
Alicia Sui

How to Achieve 4x Faster Inference for Math Problem Solving

2025-11-10 16:44
Unlock the potential of large language models for math problem solving! This article outlines how to achieve 4x faster inference using the NVIDIA NeMo-Skills library and TensorRT-LLM. Key steps include preparing an OpenMath model, integrating ReDrafter for decoding, and launching an optimized inference server. It’s designed for those with access to NVIDIA H100 GPUs or similar setups. #AI #NVIDIA #MachineLearning #Math #TechInsights
Source: Nvidia Developer Blog
Igor Gitman

Enabling Multi-Node NVLink on Kubernetes for NVIDIA GB200 NVL72 and Beyond

2025-11-10 14:00
🚀 The NVIDIA GB200 NVL72 advances AI infrastructure, enhancing training for large-language models and low-latency inference workloads. Kubernetes is essential for efficiently deploying these evolving workloads. However, challenges arise in orchestration and resource management. Introducing ComputeDomains: a new Kubernetes abstraction that simplifies GPU-to-GPU memory operations across multi-node NVLink setups, ensuring flexibility and security. Learn more about how ComputeDomains can support...
Source: Nvidia Developer Blog
Kevin Klues

Streamline Complex AI Inference on Kubernetes with NVIDIA Grove

2025-11-10 14:00
🚀 AI inference has advanced from simple single-model deployments to complex multicomponent systems. These now include various components like prefill, decode, and vision encoders. NVIDIA Grove addresses orchestration challenges by allowing users to manage entire inference systems as a single resource on Kubernetes. It supports scaling from single replicas to tens of thousands of GPUs. With Grove, developers can efficiently control component interactions, ensuring precise autoscaling and...
Source: Nvidia Developer Blog
Sanjay Chatterjee

Building an Interactive AI Agent for Lightning-Fast Machine Learning Tasks

2025-11-07 17:44
🚀 Data scientists often face challenges in preparing large datasets, which can slow down machine learning tasks. A new interactive AI agent has been prototyped to simplify this process. Using NVIDIA's GPU acceleration, the agent helps translate user intent into optimized workflows, allowing for faster exploration and analysis of data. Developers can interact with the agent through natural language, speeding up tasks from data processing to model evaluation. Explore the architecture and...
Source: Nvidia Developer Blog
Allison Ding

Benchmarking LLMs on AI-Generated CUDA Code with ComputeEval 2025.2

2025-11-07 16:30
🚀 New insights on AI coding assistants! The latest update of ComputeEval has expanded CUDA challenges to 232, adding over 100 new problems that test advanced features like Tensor Cores and CUDA Graphs. This aims to elevate AI performance in CUDA programming. Recent evaluations show that scores for leading LLMs have declined, but this reflects the increased difficulty of the benchmark, not a drop in capability. The team plans to further extend dataset coverage and invites collaboration from...
Source: Nvidia Developer Blog
Daniel Rodriguez

Enhancing GPU-Accelerated Vector Search in Faiss with NVIDIA cuVS

2025-11-06 20:41
Unlock faster data processing with NVIDIA cuVS and Meta Faiss! 🚀 As businesses handle more unstructured data, traditional systems struggle to keep up. cuVS enhances vector search efficiency, allowing for quicker index creation and searches. Key benefits include: - Up to 12x faster index building on GPU - 8x lower search latencies - Seamless index transfer between CPU and GPU 🌐 Explore the advancements in GPU-accelerated search! #NVIDIA #Faiss #DataProcessing #AI #MachineLearning
Source: Nvidia Developer Blog
Tarang Jain

Accelerating Large-Scale Mixture-of-Experts Training in PyTorch

2025-11-06 17:00
🚀 Exciting advancements in AI training! NVIDIA NeMo Automodel simplifies large-scale mixture-of-experts (MoE) training in PyTorch. Developers can now train billion-parameter models without complex setups. This open-source library allows scaling from 8 to over 1,000 GPUs efficiently, making powerful MoE architectures accessible to all. Discover the benefits and a quick-start guide to enhance your experiments! #NVIDIA #MachineLearning #PyTorch #AI #MoE
Source: Nvidia Developer Blog
Hemil Desai

How to Predict Biomolecular Structures Using the OpenFold3 NIM

2025-11-04 18:00
Unlock the mystery of biomolecular structures with OpenFold3! 🧬 Deep learning has revolutionized how we predict protein folding, moving us closer to understanding biological architecture. OpenFold3 now extends this capability to multi-chain complexes and small molecules. Powered by NVIDIA, this tool offers rapid sequence search and privacy-preserving collaboration. 🌐✨ Ready to get started? Check out the OpenFold3 API demo and access the source code today! #BiomolecularScience #DeepLearning...
Source: Nvidia Developer Blog
Kyle Tretina

R²D²: Perception-Guided Task & Motion Planning for Long-Horizon Manipulation

2025-11-04 17:00
🚀 New advancements in robot manipulation are explored in the latest edition of NVIDIA's R²D². Traditional task and motion planning (TAMP) often struggles in new environments. The integration of perception allows robots to adapt plans in real time, enhancing their capabilities. Key concepts include subgoals, affordances, and differentiable constraints, which help robots navigate complex tasks effectively. Innovative frameworks like OWL-TAMP and VLM-TAMP are highlighted, using vision and...
Source: Nvidia Developer Blog
Raffaello Bonghi

Make Sense of Video Analytics by Integrating NVIDIA AI Blueprints

2025-11-03 21:48
Unlock insights from video and audio data with NVIDIA's integrated approach! 📹🔍 The article discusses how combining the Video Search and Summarization (VSS) and Retrieval-Augmented Generation (RAG) AI Blueprints enhances video analytics. This integration allows for richer insights by incorporating enterprise context into video workflows. Learn how to create scalable systems for real-time video Q&A and apply these solutions in various industries. #NVIDIA #VideoAnalytics #AI #DataInsights...
Source: Nvidia Developer Blog
Ilyas Bankole-Hameed

Join Us for the Blackwell NVFP4 Kernel Hackathon with NVIDIA and GPU MODE

2025-11-03 20:00
🌟 Join the Blackwell NVFP4 Kernel Hackathon! 🌟 This four-part performance challenge is hosted by NVIDIA in collaboration with GPU MODE, with support from Dell and Sesterce. Developers can showcase their skills and push performance limits. For event details and inquiries, reach out to the NVIDIA Developer Community Team. #Hackathon #NVIDIA #GPU #DeveloperCommunity #PerformanceChallenge
Source: Nvidia Developer Blog
Ayesha Asif

Advancing Explainable AI in Radiology Research with NVIDIA Clara Reason

2025-11-03 18:02
🚀 Medical AI is evolving! NVIDIA Clara is advancing explainable AI in radiology by introducing Clara Reason. This innovative approach mirrors radiologists' thought processes, enabling step-by-step diagnostic reasoning with transparent explanations. 🩻 Clara NV-Reason-CXR-3B specializes in chest x-ray analysis, addressing the trust barrier in AI-assisted diagnoses. Learn how this model combines multimodal data and structured reasoning to enhance clinical decision-making. #MedicalAI #Radiology...
Source: Nvidia Developer Blog
Andriy Myronenko

How Code Execution Drives Key Risks in Agentic AI Systems

2025-11-03 17:54
AI-driven applications are shifting from passive tools to agentic systems capable of generating code and making decisions. This evolution presents significant security risks, especially concerning code execution. Strict controls are necessary to prevent malicious actors from exploiting AI-generated code. Traditional defenses, like sanitization, may not be sufficient as attackers can craft inputs to bypass these measures. The NVIDIA AI red team highlights the importance of treating LLM-...
Source: Nvidia Developer Blog
John Irwin

Streamline AI Infrastructure with NVIDIA Run:ai on Microsoft Azure

2025-10-30 17:10
Transform your AI infrastructure with NVIDIA Run:ai on Microsoft Azure! 🌐 This platform enhances GPU resource management in Kubernetes environments, addressing key challenges like inefficient utilization and governance. Key features include fractional GPU allocation, dynamic scheduling, and team-based quotas, all designed to optimize AI workloads. Learn how to streamline your AI operations for better performance and efficiency. 💻🚀 #AI #NVIDIA #MicrosoftAzure #Kubernetes #CloudComputing
Source: Nvidia Developer Blog
Julie Adrounie

Introducing the CodonFM Open Model for RNA Design and Analysis

2025-10-28 20:00
🚀 Exciting news from NVIDIA! They've introduced CodonFM, a new RNA language model as part of the Clara open model family. This model understands RNA by treating codons as words, enhancing its ability to predict mRNA design and variant effects. CodonFM was trained on 131 million protein-coding sequences, allowing it to accurately interpret complex genetic patterns. Explore more about CodonFM and its applications for RNA analysis! #NVIDIA #CodonFM #OpenResearch #RNA #Bioinformatics
Source: Nvidia Developer Blog
Kyle Gion
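
The "codons as words" idea in the CodonFM summary can be illustrated with a minimal tokenizer sketch. This is not CodonFM's actual tokenizer: the function names, the alphabetical vocabulary ordering, and the DNA-to-RNA normalization are assumptions made only for illustration.

```python
import itertools

# Hypothetical vocabulary: all 64 three-letter codons over A, C, G, U,
# numbered in alphabetical order. The real model's vocabulary may differ.
BASES = "ACGU"
CODON_VOCAB = {
    "".join(triplet): index
    for index, triplet in enumerate(itertools.product(BASES, repeat=3))
}


def codon_tokens(sequence: str) -> list[str]:
    """Split a coding sequence into three-nucleotide codons ("words")."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input too
    if len(rna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def encode(sequence: str) -> list[int]:
    """Map each codon to its integer id; unknown codons raise KeyError."""
    return [CODON_VOCAB[codon] for codon in codon_tokens(sequence)]


if __name__ == "__main__":
    print(codon_tokens("AUGGCCUAA"))
    print(encode("AUGGCCUAA"))
```

Tokenizing at the codon level gives a sequence model one token per three nucleotides instead of one per base, which is the unit the summary describes as a "word".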

Accelerating AV Simulation with Neural Reconstruction and World Foundation Models

2025-10-28 18:30
Autonomous vehicle (AV) technology is advancing towards integrated end-to-end architectures using foundation models. This shift emphasizes the need for a robust AV data flywheel to create synthetic data and enhance sensor datasets. NVIDIA has introduced tools such as the Omniverse and Cosmos workflows to support developers in building these data pipelines. Key features include access to real AV data, data processing tools, and libraries for neural reconstruction. With over 1,700 hours of...
Source: Nvidia Developer Blog
Gautham Sholingar

Powering AI-Native 6G Research with the NVIDIA Sionna Research Kit

2025-10-28 17:51
Unlock the future of wireless communication with the NVIDIA Sionna Research Kit! 📡✨ This open-source platform is designed for 6G research, enabling rapid prototyping and real-time simulations. With over 540 publications referencing Sionna, it’s paving the way for innovation in AI and ML applications. The Sionna Research Kit runs on the NVIDIA DGX Spark, offering a fully open platform for wireless research. Researchers can experiment across the entire telecommunications stack, promoting...
Source: Nvidia Developer Blog
Sebastian Cammerer

Develop Specialized AI Agents with New NVIDIA Nemotron Vision, RAG, and Guardrail Models

2025-10-28 17:32
🚀 NVIDIA introduces new Nemotron models aimed at enhancing agentic AI. These specialized language and vision models enable effective planning, reasoning, and retrieval. Developers can now access open models and robust datasets to build AI agents for specific workflows, ensuring compliance and real-world deployment. Learn more about the features, performance, and tutorials for creating multimodal agents and RAG pipelines with a focus on content safety. #NVIDIA #AI #TechInnovation #AgenticAI...
Source: Nvidia Developer Blog
Chris Alexiuk

Build Synthetic Data Pipelines to Train Smarter Robots with NVIDIA Isaac Sim

2025-10-24 19:42
Unlock the potential of robotics with synthetic data pipelines! 🤖 As robots tackle complex mobility tasks, developers require accurate simulations. NVIDIA Isaac Sim provides a solution by generating high-quality synthetic data, reducing the time and cost of real-world data collection. Key points include: - Creating simulated environments with NVIDIA Omniverse NuRec. - Utilizing SimReady assets for streamlined simulations. - Generating and augmenting synthetic data using MobilityGen and NVIDIA...
Source: Nvidia Developer Blog
Asawaree Bhide

Unlocking Tensor Core Performance with Floating Point Emulation in cuBLAS

2025-10-24 16:21
🚀 NVIDIA's latest cuBLAS update in CUDA Toolkit 13.0 Update 2 enhances double-precision (FP64) matrix multiplications through floating-point emulation on Tensor Cores. This update offers improved performance for both FP32 and FP64 operations, ensuring accuracy while maximizing efficiency. Developers can access Tensor Core performance easily via familiar APIs. For detailed GPU compatibility and implementation specifics, refer to the cuBLAS documentation. #NVIDIA #CUDA #cuBLAS #AI #MachineLearning
Source: Nvidia Developer Blog
Cole Brower

How NVIDIA DGX Spark’s Performance Enables Intensive AI Tasks

2025-10-24 16:00
NVIDIA DGX Spark is designed to meet the needs of AI developers requiring high memory and powerful computing without relying on cloud resources. This compact supercomputer offers 1 petaflop of FP4 AI performance and 128 GB of coherent memory, making it suitable for intensive tasks like fine-tuning and image generation. Benchmark tests show impressive performance in fine-tuning models, with peak speeds of over 82,000 tokens per second. Additionally, it supports high-resolution image...
Source: Nvidia Developer Blog
Allen Bourgoyne

Solve Linear Programs Using the GPU-Accelerated Barrier Method in NVIDIA cuOpt

2025-10-24 16:00
Unlock the power of optimization with NVIDIA cuOpt! This open-source library enhances problem-solving for complex scenarios like sports scheduling and medical transplants. The latest update features a new barrier method solver for linear programs, offering significant speed improvements. Benchmarks show over 8x faster performance compared to leading CPU solvers. Discover how cuOpt is transforming optimization tasks! #Optimization #NVIDIA #cuOpt #LinearProgramming #TechInnovation 🚀📈💻
Source: Nvidia Developer Blog
Christopher Maes

Reconstruct a Scene in NVIDIA Isaac Sim Using Only a Smartphone

2025-10-23 23:06
Transforming 3D environments for robotics simulation is now easier with NVIDIA Omniverse NuRec! 📱✨ Using just a smartphone, you can capture real-world scenes and create realistic 3D models. The process involves taking photos, generating a sparse reconstruction with COLMAP, and loading your scene into NVIDIA Isaac Sim. For detailed steps, including tips on capturing the best photos, check out the full article! #NVIDIA #3DModeling #Robotics #IsaacSim #Omniverse
Source: Nvidia Developer Blog
Wonsik Han

Train an LLM on an NVIDIA Blackwell Desktop with Unsloth—and Scale It

2025-10-23 17:51
Unlock the potential of large language models (LLMs) with the Unsloth framework! 🌟 Unsloth simplifies fine-tuning and reinforcement learning, making it accessible for individuals and small teams. It pairs seamlessly with NVIDIA Blackwell GPUs, enhancing training speed and efficiency. With benchmarks showing 2x faster training and 70% less VRAM usage, LLM customization is now within reach! 🎉 Explore how to train custom LLMs locally and scale to cloud instances for production workloads. #AI...
Source: Nvidia Developer Blog
Paul Abruzzo

Bring Your Circuits to CUDA-Q Using QGEAR

2025-10-23 16:55
🚀 Exciting news for developers! You can now import Qiskit circuits into GPU-accelerated CUDA-Q kernels with NERSC’s QGEAR project. This makes integrating quantum circuits smoother and more efficient. To get started, simply install using:
```bash
pip install qgear-lightning
```
Explore the future of quantum computing! 🖥️✨ #QuantumComputing #QGEAR #CUDAQ #Qiskit #NERSC
Source: Nvidia Developer Blog
Shara Tibken

Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour

2025-10-22 15:00
Unlock the power of voice with NVIDIA Nemotron Nano v2! 🎤💻 This article guides you on creating a natural language Bash agent that executes commands without manual input. In just under an hour and about 200 lines of Python code, you can build this tool from scratch. It also introduces LangGraph for simplifying design. Ready to get started? #NVIDIA #BashAgent #NaturalLanguageProcessing #Python #TechInnovation
Source: Nvidia Developer Blog
Mehran Maghoumi

Build Practical Deep-Learning Skills for Real-World AI Applications with the New NVIDIA Learning Path

2025-10-21 20:05
🌟 Enhance your deep-learning skills with NVIDIA's new learning path! Explore a variety of courses, workshops, and certifications designed for real-world AI applications. Visit the learning path page to sign up and start your journey in developing practical expertise. #NVIDIA #DeepLearning #AI #LearningPath #SkillDevelopment
Source: Nvidia Developer Blog
Shara Tibken

NVIDIA ACE Adds Open Source Qwen3 SLM for On-Device Deployment in PC Games

2025-10-21 17:00
🚀 NVIDIA ACE now supports the open-source Qwen3-8B small language model for on-device NPC character development. This integration allows for real-time reasoning and dynamic responses in gaming, enhancing player interaction. The IGI SDK plugin also includes updates for multilingual text-to-speech capabilities and improved performance features. Developers can check out the latest tools and resources to elevate their gaming projects. 🎮✨ #NVIDIA #Gaming #AI #GameDevelopment #Qwen3
Source: Nvidia Developer Blog
Ike Nnoli

Build an AI Agent to Analyze IT Tickets with NVIDIA Nemotron

2025-10-20 17:00
Unlock insights from IT tickets with NVIDIA's AI agent! 🖥️ Modern organizations face challenges in analyzing vast amounts of data from ticketing systems. Traditional platforms often fall short in providing actionable insights. NVIDIA's ITelligence combines AI reasoning and graph databases to reveal hidden patterns in support ticket data. This approach helps identify recurring issues and team performance gaps effectively. The architecture is adaptable, suitable for various domains beyond IT...
Source: Nvidia Developer Blog
Bhaskar Bhowmik

Enabling Scalable AI-Driven Molecular Dynamics Simulations

2025-10-20 16:30
Unlock the power of scalable molecular dynamics (MD) simulations! 🔬✨ This article explores how integrating PyTorch-based machine learning interatomic potentials (MLIPs) with the LAMMPS MD package enhances simulation efficiency and accuracy. The ML-IAP-Kokkos interface, developed by NVIDIA and national labs, streamlines this integration for researchers. It supports efficient data transfer between GPUs, enabling large-scale simulations. Ready to get started? The article provides a step-by-step...
Source: Nvidia Developer Blog
Justin S. Smith

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

2025-10-20 16:00
Modern AI workloads are evolving beyond single-GPU setups. Model parallelism is now key for scalable deployments, especially with mixture-of-experts (MoE) architectures, which activate only a portion of parameters per token. Expert parallelism (EP) is crucial for managing the complexities of scaling these models. With tools like NVIDIA TensorRT-LLM’s Wide Expert Parallelism, large-scale deployments become more efficient, enhancing performance and cost-effectiveness. Learn how large-scale EP...
Source: Nvidia Developer Blog
Eduardo Alvarez
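
The statement that MoE architectures "activate only a portion of parameters per token" can be made concrete with a small top-k routing sketch. It is a toy, single-process illustration, not TensorRT-LLM code: the function names, the scalar "experts", and the choice of k are all assumptions. Under expert parallelism the experts would sit on different GPUs, so the selected indices would also decide where each token is sent.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(token, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by weight."""
    return sum(
        weight * experts[index](token)
        for index, weight in route_top_k(router_logits, k)
    )


# Hypothetical experts: each one just scales its input by a constant.
EXPERTS = [lambda x, scale=scale: scale * x for scale in (1.0, 2.0, 3.0, 4.0)]

if __name__ == "__main__":
    logits = [0.0, 3.0, 1.0, 2.0]  # made-up router scores for one token
    print(route_top_k(logits, k=2))
    print(moe_forward(1.0, logits, EXPERTS, k=2))
```

With four experts and k=2, half of the expert parameters are never touched for this token, which is why distributing experts across devices becomes the main scaling question.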

NVIDIA Blackwell Leads on SemiAnalysis InferenceMAX v1 Benchmarks

2025-10-16 17:33
SemiAnalysis has launched InferenceMAX™ v1, an open-source initiative for evaluating inference hardware performance. The results show NVIDIA GPUs, particularly the Blackwell generation, achieving a 15x performance increase over the previous Hopper generation. This advancement is attributed to innovative hardware-software designs. The AI community is encouraged to utilize InferenceMAX v1 to validate NVIDIA's performance across various inference scenarios. #NVIDIA #AI #InferenceMAX #TechInnovation...
Source: Nvidia Developer Blog
Farshad Ghodsian

Agentic AI Unleashed: Join the AWS & NVIDIA Hackathon

2025-10-15 19:39
🚀 Ready to innovate? Join the AWS & NVIDIA Hackathon and build the future of Agentic AI! Create an autonomous application using advanced AI models and scalable infrastructure. This event offers hands-on experience, networking opportunities, and the chance to win prizes. Teams can request $100 in promotional credits to support their projects. Be sure to monitor your usage to maximize your resources! #AI #Hackathon #AWS #NVIDIA #Innovation
Source: Nvidia Developer Blog
Rachel Ho

Unlock Faster, Smarter Edge Models with 7x Gen AI Performance on NVIDIA Jetson AGX Thor

2025-10-15 18:25
🚀 Exciting advancements in AI with NVIDIA Jetson AGX Thor! Since its launch in August, it has achieved a 7x boost in generative AI performance. This improvement benefits various models, including Llama and DeepSeek. Continuous software updates support the latest AI models, enhancing developer experimentation. Jetson Thor also optimizes inference with major quantization formats and innovative techniques like speculative decoding. #NVIDIA #AI #MachineLearning #JetsonThor...
Source: Nvidia Developer Blog
Suhas Hariharapura Sheshadri

Accelerated and Distributed UPF for the Era of Agentic AI and 6G

2025-10-15 18:06
The telecommunications sector is rapidly advancing towards 6G, focusing on AI-native Radio Access Networks (AI-RAN) and AI-Core. A key development is the distributed User Plane Function (dUPF), which processes data closer to users, reducing latency and enhancing throughput. 📶 The article discusses the architectural benefits of dUPF, particularly for agentic AI applications. It showcases a reference implementation using NVIDIA DOCA Flow, which supports energy-efficient, low-latency operations...
Source: Nvidia Developer Blog
Yuyong Zhang

Accelerate Qubit Research with NVIDIA cuQuantum Integrations in QuTiP and scQubits

2025-10-14 19:23
🚀 NVIDIA cuQuantum is now integrated into QuTiP and scQubits, enhancing quantum simulations at both circuit and device levels. This integration allows researchers to design and study novel qubit types more efficiently. With a 4000x speedup on AWS, users can explore complex quantum systems effectively. QuTiP and scQubits are now optimized for better performance and scalability, paving the way for future advancements in quantum computing. #QuantumComputing #NVIDIA #QuantumSimulations #QuTiP...
Source: Nvidia Developer Blog
Tom Lubowe

Understanding Memory Management on Hardware-Coherent Platforms

2025-10-14 16:00
Discover how memory management affects application performance on hardware-coherent platforms. NVIDIA's Coherent Driver-based Memory Management (CDMM) mode offers improved control over GPU memory compared to the default NUMA mode. This allows applications to optimize memory placement for better performance. Learn about the implications for Kubernetes and more in the full article. 💻🚀 #NVIDIA #MemoryManagement #Kubernetes #TechInsights #PerformanceOptimization
Source: Nvidia Developer Blog
Kumar Sankaran