Articles from Source: Nvidia-Developer-Blog

Scaling AI Inference Performance and Flexibility with NVIDIA NVLink and NVLink Fusion

2025-08-21 15:00
As AI models have grown more complex, parameter counts have climbed from millions to trillions, demanding far more computational power. 🌐 NVIDIA NVLink and NVLink Fusion are key technologies for improving AI inference performance. They enable the large-scale parallelization strategies needed for advanced AI architectures like mixture-of-experts (MoE). 🤖 This evolution in AI systems highlights the need for interconnected GPUs acting as a unified pool of compute and memory. #AI #NVIDIA #NVLink...
Source: Nvidia Developer Blog
Joe DeLaere

Reinforcement Learning with NVIDIA NeMo-RL: Megatron-Core Support for Optimized Training Throughput

2025-08-20 15:15
🚀 Exciting updates in reinforcement learning with NVIDIA NeMo-RL! The latest release introduces support for the Megatron-Core library, enhancing training throughput for massive language models. This integration addresses limitations found in the PyTorch DTensor backend, particularly for models with hundreds of billions of parameters. With GPU-optimized techniques and simplified configuration options, NeMo-RL makes it easier for developers to harness the power of Megatron-Core. Explore...
Source: Nvidia Developer Blog
Anna Shors

Deploying Your Omniverse Kit Apps at Scale

2025-08-20 13:00
Unlock the potential of 3D applications with NVIDIA Omniverse Kit App Streaming! 🌐 This solution simplifies deployment by streaming applications directly to users' browsers, reducing the need for complex installations. With flexible options like self-managed deployment or fully-managed infrastructure, developers can easily reach their audience. Explore the straightforward steps to get started and enhance your 3D application experience! 💻🚀 #NVIDIA #Omniverse #3DStreaming...
Source: Nvidia Developer Blog
Ashley Goldstein

New Nemotron Nano 2 Open Reasoning Model Tops Leaderboard and Delivers 6x Higher Throughput

2025-08-19 20:50
🚀 Exciting news in AI! The NVIDIA Nemotron Nano 2 model has topped the leaderboard with impressive accuracy. This open reasoning model offers up to 6x higher throughput compared to its closest competitors, enhancing edge AI capabilities. Stay updated on advancements in technology! 🔍📈 #AI #NVIDIA #Innovation #EdgeComputing #Nemotron
Source: Nvidia Developer Blog
Chintan Patel

Announcing the Latest NVIDIA Gaming AI and Neural Rendering Technologies

2025-08-18 19:30
🚀 NVIDIA has made significant announcements at Gamescom 2025, introducing updates to its RTX neural rendering and ACE generative AI technologies. These advancements aim to enhance gaming experiences with expanded integration for DLSS 4, new AI models, and cloud solutions like GeForce NOW inside Discord. 🎮 Upcoming titles, including Resident Evil Requiem and Borderlands 4, will feature these technologies, helping developers optimize graphics even for players with older hardware. For Unreal...
Source: Nvidia Developer Blog
Ike Nnoli

Identify Speakers in Meetings, Calls, and Voice Apps in Real-Time with NVIDIA Streaming Sortformer

2025-08-18 16:00
Introducing NVIDIA Streaming Sortformer, a breakthrough in real-time speaker identification for meetings, calls, and voice-enabled apps. 🎤 This production-grade diarization model offers low-latency performance, making it ideal for multi-speaker environments. Key features include frame-level diarization, precision timestamps, and efficient GPU inference. 🌐 Optimized for English and tested with Mandarin and other languages, it promises robust tracking with minimal latency. #NVIDIA #AI...
Source: Nvidia Developer Blog
Ivan Medennikov
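
To make "frame-level diarization" and "timestamps" in the summary above more concrete, here is a minimal sketch of how per-frame speaker activity could be merged into timed segments. It is an illustration only, not the Streaming Sortformer API: the function name, the input layout (one set of active speaker IDs per frame), and the 0.08-second frame duration are all assumptions.

```python
from typing import List, Set, Tuple

FRAME_SECONDS = 0.08  # assumed frame duration; not taken from the article


def frames_to_segments(
    frames: List[Set[int]], frame_seconds: float = FRAME_SECONDS
) -> List[Tuple[int, float, float]]:
    """Merge consecutive active frames per speaker into (speaker, start, end)."""
    segments = []
    open_start = {}  # speaker id -> index of the frame where the current run began
    # A trailing empty frame closes any runs that are still open at the end.
    for i, active in enumerate(frames + [set()]):
        for spk in list(open_start):
            if spk not in active:
                start = open_start.pop(spk)
                segments.append(
                    (spk, round(start * frame_seconds, 3), round(i * frame_seconds, 3))
                )
        for spk in active:
            open_start.setdefault(spk, i)
    # Order by start time, then by speaker id.
    return sorted(segments, key=lambda s: (s[1], s[0]))


# Speaker 0 talks, speaker 1 overlaps briefly, then speaker 0 returns.
frames = [{0}, {0}, {0, 1}, {1}, set(), {0}]
segments = frames_to_segments(frames)
for speaker, start, end in segments:
    print(f"speaker {speaker}: {start:.2f}s to {end:.2f}s")
```

Overlapping speech is handled naturally here because each speaker's run is tracked independently of the others.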

Scaling AI Factories with Co-Packaged Optics for Better Power Efficiency

2025-08-18 16:00
AI is reshaping the computing landscape, with networks becoming essential for future data centers. NVIDIA is leading this evolution with GPU-driven AI factories that require high bandwidth and low latency. Its networking solutions, including Spectrum-X Ethernet and Quantum InfiniBand, support these advanced needs. Co-packaged optics are now crucial for power efficiency and resilience in AI workloads, marking a significant shift from traditional data center designs. #NVIDIA #AI #DataCenters...
Source: Nvidia Developer Blog
Ashkan Seyedi

Upcoming Livestream: Building Cross-Framework Agent Ecosystems

2025-08-14 16:00
🚀 Join us for an insightful livestream on August 21! Discover how the NVIDIA NeMo Agent Toolkit enhances multi-agent workflows through deep MCP integration. 📅 Time: 18:00 - 19:00 (CEST) 📍 Learn about building optimized agentic systems with NVIDIA NIM. Don't miss this opportunity to expand your knowledge! #NVIDIA #Livestream #NeMoAgent #TechInnovation #AI
Source: Nvidia Developer Blog
Nicola Sessions

Streamline CUDA-Accelerated Python Install and Packaging Workflows with Wheel Variants

2025-08-13 22:00
🚀 NVIDIA addresses the challenges of installing GPU-accelerated Python packages with the WheelNext initiative. Current wheel formats struggle with hardware diversity, leading to installation complexities. WheelNext aims to improve this by introducing Wheel Variants, allowing precise artifact descriptions for better compatibility. This collaboration with Meta and others enhances user experience in scientific computing and AI. 🔗 Learn more: [GitHub repo link] #Python #NVIDIA #CUDA #OpenSource #AI
Source: Nvidia Developer Blog
Jonathan Dekhtiar

Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2

2025-08-13 21:33
🚀 Exciting advancements in AI with NVIDIA's ProRL v2! This new framework explores whether large language models (LLMs) can enhance their capabilities through extended reinforcement learning (RL). ProRL v2 incorporates advanced algorithms and rigorous training methods across multiple domains. Key features include over 3,000 RL steps, stability improvements, and fully verifiable rewards. These innovations aim to help models discover new solutions rather than just refining existing ones. #AI...
Source: Nvidia Developer Blog
Jian Hu

Streamlining Quantum Error Correction and Application Development with CUDA-QX 0.4

2025-08-13 16:00
🚀 Quantum computing is advancing with the latest release of CUDA-QX 0.4, focusing on quantum error correction (QEC). This update streamlines QEC experiments, enabling researchers to define and simulate codes, configure decoders, and deploy them effectively. Key features include a comprehensive API for user-defined components and the introduction of a detector error model (DEM) for improved circuit simulations. 🔗 Check out the full release notes on GitHub for ongoing development and...
Source: Nvidia Developer Blog
Shane Caldwell

Dynamo 0.4 Delivers 4x Faster Performance, SLO-Based Autoscaling, and Real-Time Observability

2025-08-13 15:30
🚀 Exciting advancements in AI! Dynamo 0.4 has been released, offering 4x faster performance and SLO-based autoscaling. This update is designed for deploying new open-source models like OpenAI's gpt-oss and Moonshot AI's Kimi K2 efficiently. Key features include enhanced real-time observability and resiliency, making it easier to monitor performance and manage requests. #AI #OpenSource #Dynamo #Innovation #TechUpdates
Source: Nvidia Developer Blog
Amr Elmeleegy

Announcing General Availability for NVIDIA Isaac Sim 5.0 and NVIDIA Isaac Lab 2.2

2025-08-11 15:00
🚀 NVIDIA has announced the general availability of NVIDIA Isaac Sim 5.0 and NVIDIA Isaac Lab 2.2 at SIGGRAPH 2025. These frameworks are now accessible on GitHub, providing developers with tools for building, training, and testing AI-powered robots in physics-based simulations. Explore the cutting-edge capabilities that these releases bring to robotics development. #NVIDIA #IsaacSim #Robotics #AI #SIGGRAPH2025
Source: Nvidia Developer Blog
Prachi Mishra

Developers Build Fast and Reliable Robot Simulations with NVIDIA Omniverse Libraries

2025-08-11 15:00
NVIDIA recently unveiled updates to the Omniverse libraries and Cosmos world foundation models at SIGGRAPH. 🌐 These enhancements, driven by OpenUSD, provide developers with new tools and models to create accurate virtual environments. They focus on building AI agents and simulations that can interact effectively with the real world. 🤖 This advancement aims to improve the reliability and speed of robotic simulations. #NVIDIA #Omniverse #Robotics #AI #SIGGRAPH
Source: Nvidia Developer Blog
Pomi Lee

How to Instantly Render Real-World Scenes in Interactive Simulation

2025-08-11 15:00
Transforming real-world environments into interactive simulations is now faster than ever. With NVIDIA Omniverse NuRec and 3DGUT, users can reconstruct photorealistic 3D scenes from basic sensor data. This process takes mere moments instead of days or weeks. 🌍✨ These scenes can be deployed in platforms like NVIDIA Isaac Sim or CARLA Simulator, enhancing simulation experiences. #NVIDIA #SimulationTechnology #3DModeling #Omniverse #InteractiveSimulations
Source: Nvidia Developer Blog
Katie Washabaugh

Maximize Robotics Performance by Post-Training NVIDIA Cosmos Reason

2025-08-11 15:00
Introducing NVIDIA Cosmos Reason, unveiled at GTC 2025! 🤖 This innovative reasoning vision language model (VLM) is designed for physical AI and robotics. It allows robots and vision AI agents to utilize prior knowledge, physics, and common sense to interpret and interact with the real world. By processing video and text prompts, Cosmos Reason transforms visual information into actionable insights. #NVIDIA #Robotics #AI #Innovation #GTC2025
Source: Nvidia Developer Blog
Tsung-Yi Lin

R²D²: Boost Robot Training with World Foundation Models and Workflows from NVIDIA Research

2025-08-08 18:33
🚀 The latest edition of NVIDIA's R²D² highlights the role of World Foundation Models (WFMs) in enhancing robot training. WFMs address the growing need for labeled datasets by simulating real-world dynamics. Key components include Cosmos Predict, Transfer, and Reason, each designed for specific applications in robotics and autonomous vehicles. Cosmos Predict generates future world states through various input types. Cosmos Transfer facilitates photorealistic style transfers, while Cosmos...
Source: Nvidia Developer Blog
Asawaree Bhide

Efficient Transforms in cuDF Using JIT Compilation

2025-08-07 21:06
Unlock efficient data processing with RAPIDS cuDF! 🚀 cuDF offers a wide range of ETL algorithms optimized for GPUs, allowing for seamless integration with pandas. Users can leverage accelerated algorithms without changing their existing code. For advanced developers, the cuDF C++ submodule enhances functionality through non-owning views and kernel fusion, boosting performance and reducing unnecessary GPU memory transfers. Learn how JIT compilation improves throughput and resource utilization...
Source: Nvidia Developer Blog
Basit Ayantunde

Train with Terabyte-Scale Datasets on a Single NVIDIA Grace Hopper Superchip Using XGBoost 3.0

2025-08-07 18:25
🚀 Exciting advancements in machine learning with XGBoost 3.0! This version leverages the NVIDIA Grace Hopper Superchip to process datasets up to 1 TB, significantly speeding up training times—up to 8x faster than traditional CPUs. Key enhancements include a new external-memory engine, simplifying scalability and reducing reliance on complex GPU clusters. Major banks like RBC are already benefiting, reporting 16x speedups and 94% reductions in training costs. #XGBoost #MachineLearning #NVIDIA...
Source: Nvidia Developer Blog
Dante Gama Dessavre

How Hackers Exploit AI’s Problem-Solving Instincts

2025-08-07 16:00
🚨 As AI models become more advanced, they face new vulnerabilities. Researchers highlight how hackers exploit these systems by manipulating their problem-solving instincts. 🔍 The article discusses the evolution of attack techniques from text-based prompt injections to sophisticated multimodal reasoning attacks. These new methods target how AI merges inputs like text, images, and audio. 🔒 Securing AI requires a shift in focus from just input/output layers to the reasoning architecture itself....
Source: Nvidia Developer Blog
Daniel Teixeira

What’s New and Important in CUDA Toolkit 13.0

2025-08-06 16:00
🚀 Exciting updates in CUDA Toolkit 13.0! This major release enhances computing on NVIDIA CPUs and GPUs, introducing new features like tile-based programming and improved support for Arm platforms. Key updates include:
- Enhanced NVIDIA Nsight Developer Tools
- Math library updates for linear algebra and FFT
- Improved NVCC compiler with better compression
- Accelerated Python cuda.core release
CUDA 13.0 continues to support Blackwell GPUs and introduces a new programming model to boost...
Source: Nvidia Developer Blog
Jonathan Bentz

NVIDIA vGPU 19.0 Enables Graphics and AI Virtualization on NVIDIA Blackwell GPUs

2025-08-05 18:39
NVIDIA has released vGPU 19.0, enhancing virtualization for graphics and AI workloads. 🌐 The update leverages the NVIDIA RTX PRO 6000 Blackwell GPUs, which support advanced features like Multi-Instance GPU (MIG) for improved scalability and user density in data centers. With 96 GB of GDDR7 memory, these GPUs excel in demanding enterprise tasks, from AI inference to scientific computing. This release aims to significantly boost performance for virtualized workloads. #NVIDIA #Virtualization #AI...
Source: Nvidia Developer Blog
Phoebe Lee

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

2025-08-05 17:10
🚀 NVIDIA and OpenAI are advancing AI technologies with the launch of the gpt-oss-20b and gpt-oss-120b models. These models, designed for high-performance inference, can achieve 1.5 million tokens per second on the NVIDIA GB200 NVL72 system. 🧠 The gpt-oss models use a mixture-of-experts (MoE) architecture and are optimized for NVIDIA's Blackwell architecture. They support advanced text reasoning capabilities and were trained on NVIDIA H100 Tensor Core GPUs. 🔧 Developers can access optimized kernels and...
Source: Nvidia Developer Blog
Anu Srivastava

CUDA Pro Tip: Increase Performance with Vectorized Memory Access

2025-08-04 21:05
Boost your CUDA performance by addressing bandwidth limitations! 🌐 Bandwidth-bound kernels are becoming more common due to the increasing ratio of flops to bandwidth in new hardware. To enhance bandwidth utilization, consider using vector loads and stores in your CUDA C++ code. Check out the provided memory copy kernel example, which uses grid-stride loops to improve efficiency. 📊 #CUDA #PerformanceOptimization #ProgrammingTips #TechInsights #NVIDIA
Source: Nvidia Developer Blog
Justin Luitjens
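
The memory copy kernel mentioned above is not reproduced in this digest. As a rough stand-in, the sketch below models the indexing pattern of a grid-stride copy in plain Python and counts how many load operations are issued when each "thread" moves one element versus a four-element vector (comparable in spirit to CUDA vector types such as `int4` or `float4`). This is a CPU-side model, not GPU code; the function name and parameters are hypothetical.

```python
def grid_stride_copy(src, dst, grid_size, block_size, vec_width=1):
    """Model a grid-stride copy; return the number of load operations issued."""
    n = len(src)
    total_threads = grid_size * block_size
    n_vec = n // vec_width  # number of whole vectors in the input
    loads = 0
    for tid in range(total_threads):  # each iteration models one GPU thread
        i = tid
        while i < n_vec:  # grid-stride loop: advance by the total thread count
            base = i * vec_width
            # One wide load/store moves vec_width elements at once.
            dst[base:base + vec_width] = src[base:base + vec_width]
            loads += 1
            i += total_threads
    # Tail elements that do not fill a whole vector are copied one at a time.
    for j in range(n_vec * vec_width, n):
        dst[j] = src[j]
        loads += 1
    return loads


src = list(range(1030))
dst_scalar = [0] * len(src)
dst_vec4 = [0] * len(src)
scalar_loads = grid_stride_copy(src, dst_scalar, grid_size=2, block_size=64)
vec4_loads = grid_stride_copy(src, dst_vec4, grid_size=2, block_size=64, vec_width=4)
print(scalar_loads, vec4_loads)
```

Both runs copy the same data, but the vectorized variant issues roughly a quarter as many loads. On real hardware, fewer and wider memory instructions are what help a bandwidth-bound kernel, and vector accesses additionally require suitably aligned pointers.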

Navigating GPU Architecture Support: A Guide for NVIDIA CUDA Developers

2025-08-04 20:01
🚀 Are you developing with NVIDIA CUDA? You may have seen warnings about offline compilation for architectures prior to '<compute/sm/lto>_75' being phased out. This is a heads-up for developers to update their practices. The NVIDIA software stack consists of two main components: the CUDA Toolkit for building applications and the NVIDIA Driver for running them. The driver interfaces directly with GPU hardware and comes in three branches: New Feature Branch, Production Branch, and Long-Term Support Branch. Each...
Source: Nvidia Developer Blog
Jonathan Bentz

NVIDIA CUDA-Q 0.12 Expands Toolset for Developing Hardware-Performant Quantum Applications

2025-08-04 19:00
🚀 NVIDIA CUDA-Q 0.12 has been released, bringing new simulation tools for quantum application development. The update allows researchers to access detailed statistics on individual simulation runs, aiding in areas like noise correlation and circuit benchmarking. New features also enhance the CUDA-Q dynamics backend, improving support for multidiagonal sparse matrices and generic super-operators. This open-source project includes community contributions and Python 3.13 support. For more...
Source: Nvidia Developer Blog
Pradnya Khalate

How to Enhance RAG Pipelines with Reasoning Using NVIDIA Llama Nemotron Models

2025-08-04 17:00
Unlocking the potential of retrieval-augmented generation (RAG) systems involves addressing user queries that are vague or carry implicit intent. 🤔 The article discusses how NVIDIA's Nemotron LLMs enhance RAG pipelines through advanced query rewriting techniques. This process optimizes user prompts for better information retrieval, improving the relevance of results. 📈 Techniques like Q2E, Q2D, and chain-of-thought query rewriting help bridge gaps in understanding, leading to more accurate...
Source: Nvidia Developer Blog
Nicole Luo

7 Drop-In Replacements to Instantly Speed Up Your Python Data Science Workflows

2025-08-01 22:45
🚀 Speed up your Python data science workflows with easy drop-in replacements! Many libraries like pandas and scikit-learn can now leverage GPU acceleration with minimal code changes. Using tools like NVIDIA cuDF, you can enhance performance on large datasets without rewriting your scripts. Explore seven options to optimize your data processing today! #DataScience #Python #GPUAcceleration #TechTips #Programming
Source: Nvidia Developer Blog
Jamil Semaan

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

2025-08-01 21:27
🚀 Quantization is a key method for developers looking to enhance AI model performance with minimal overhead. It allows for significant improvements in latency, throughput, and memory efficiency by reducing model precision without retraining. Models are typically stored in FP16 or BF16, and quantizing further to lower-precision formats such as FP4 can yield even better efficiency. NVIDIA's TensorRT Model Optimizer offers a flexible framework for post-training quantization, supporting various formats and integrating calibration techniques for...
Source: Nvidia Developer Blog
Eduardo Alvarez
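
The core idea of post-training quantization with calibration can be sketched in a few lines. The example below is a generic illustration, not the TensorRT Model Optimizer API: it uses symmetric max-abs calibration and a signed 4-bit integer grid (rather than the FP4 floating-point format named above), and all function names are hypothetical.

```python
def calibrate_scale(values, num_bits):
    """Pick a symmetric scale so the largest magnitude maps to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits):
    """Round each value to the nearest integer level and clamp to the valid range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.12, 0.0, -0.7]
scale = calibrate_scale(weights, num_bits=4)  # signed 4-bit: integer levels -7..7
codes = quantize(weights, scale, num_bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

No retraining is involved: the only "calibration" is measuring the value range, and the rounding error per value is bounded by half the scale. Production toolkits typically refine this with per-channel or per-block scales and more careful calibration statistics.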

Just Released: NVIDIA HPC SDK v25.7

2025-07-31 18:09
🚀 Just announced: NVIDIA HPC SDK v25.7 is now available! This update includes support for CUDA 12.9U1, along with updated library components, bug fixes, and performance improvements. For installation, users can download the SDK for Linux x86_64 and follow the provided instructions for setup. Check it out to enhance your high-performance computing projects! #NVIDIA #HPCSDK #CUDA #TechUpdate #HighPerformanceComputing
Source: Nvidia Developer Blog
Shara Tibken

Just Released: NVIDIA cuPQC v0.4

2025-07-31 18:07
🚀 Just in: NVIDIA has launched cuPQC v0.4! This update brings Poseidon2 to cuHash and introduces a Merkle Tree API compatible with all cuHash hash functions. For those interested, you can download the SDK for GPU-accelerated Post-Quantum Cryptography. Check the documentation for system requirements and installation instructions. 🔗 Download cuPQC: [x86_64](https://developer.download.nvidia.com/compute/cupqc/redist/cupqc/cupqc-sdk-0.4.0-x86_64.tar.gz) |...
Source: Nvidia Developer Blog
Yarkin Doroz
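
For readers unfamiliar with Merkle trees, the sketch below shows the basic construction with a pluggable hash function, echoing the summary's point that the tree API works across hash functions. It is not the cuPQC or cuHash API: it runs on the CPU, substitutes SHA-256 from Python's `hashlib` for Poseidon2, and duplicates the last node on odd-sized levels, which is only one of several conventions.

```python
import hashlib


def merkle_root(leaves, hash_fn=hashlib.sha256):
    """Return the Merkle root of a non-empty list of byte strings."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Hash every leaf to form the bottom level of the tree.
    level = [hash_fn(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        # Each parent is the hash of its two children concatenated.
        level = [
            hash_fn(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


blocks = [b"alpha", b"beta", b"gamma"]
root = merkle_root(blocks)
tampered_root = merkle_root([b"alpha", b"beta", b"gamma!"])
print(root.hex())
print(root != tampered_root)
```

Changing any leaf changes the root, which is what lets a single short digest commit to a large set of data. Swapping the hash only requires passing a different constructor, for example `hashlib.sha3_256`.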

Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails

2025-07-31 16:58
Prompt injection remains a significant threat to AI systems, particularly with the rise of multimodal and agentic AI. 🛡️ NVIDIA's AI Red Team simulates real-world attacks to identify vulnerabilities in these advanced systems, emphasizing the need for cross-functional solutions. Their recent research introduces a new category of multimodal prompt injection using symbolic visual inputs, like emoji sequences. 🔍 This shift highlights the importance of adapting security strategies from input...
Source: Nvidia Developer Blog
Daniel Teixeira