Articles from Source: Nvidia-Developer-Blog

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

2026-01-28 16:28
🚀 Introducing Dynamic Context Parallelism (Dynamic-CP) in NVIDIA Megatron Core! This innovative scheduling method enhances LLM post-training and DiT pre-training by adapting CP size per microbatch. It efficiently addresses the challenge of variable-length sequences, achieving up to 1.48x speedup on real-world datasets. 📈 Large-scale model training often struggles with sequence-length variability, impacting resource use. Dynamic-CP optimizes performance by managing these variations...
Source: Nvidia Developer Blog
Kunlun Li
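
The idea of adapting CP size per microbatch can be illustrated with a minimal sketch. This is not Megatron Core's scheduler: the helper `pick_cp_size`, the per-GPU token budget, and the sample lengths are all assumptions made up for illustration. It simply picks the smallest power-of-two context-parallel size that keeps each GPU's shard of the microbatch within the budget.

```python
def pick_cp_size(seq_len: int, max_tokens_per_gpu: int, max_cp_size: int) -> int:
    """Return the smallest power-of-two CP size whose per-GPU shard fits the budget.

    Hypothetical helper for illustration; not part of Megatron Core.
    """
    cp_size = 1
    while cp_size < max_cp_size and seq_len / cp_size > max_tokens_per_gpu:
        cp_size *= 2
    return cp_size


# Total token counts of four microbatches with very different lengths (made-up values).
microbatch_lengths = [2_000, 9_000, 40_000, 130_000]
cp_sizes = [
    pick_cp_size(n, max_tokens_per_gpu=8_192, max_cp_size=8)
    for n in microbatch_lengths
]
print(cp_sizes)
```

In this toy setting the short microbatches stay on one or two GPUs and avoid unnecessary CP communication, while the longest ones are capped at the maximum CP size.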

Updating Classifier Evasion for Vision Language Models

2026-01-28 16:19
Advancements in AI are enhancing vision language models (VLMs), allowing them to process both text and images simultaneously. 🖼️📚 These models enable applications like interpreting graphs and processing camera feeds, broadening functionality in various systems. However, with this new capability comes potential security risks from untrusted image sources. 🔒 The article explores historical attack methods and how they apply to modern VLMs, aiding developers in understanding threats and...
Source: Nvidia Developer Blog
Joseph Lucas

Accelerating Diffusion Models with an Open, Plug-and-Play Offering

2026-01-27 19:00
🚀 Advances in large-scale diffusion models are transforming generative AI, impacting image synthesis, audio generation, and more. However, sampling inefficiency poses significant challenges, especially in video generation, where the process can take minutes to hours. ⏱️ NVIDIA has introduced FastGen, an open-source library that accelerates diffusion models, achieving 10x to 100x speedups without sacrificing quality. This tool aims to streamline real-time video generation and interactive...
Source: Nvidia Developer Blog
Weili Nie

Adaptive Inference in NVIDIA TensorRT for RTX Enables Automatic Optimization

2026-01-26 21:00
NVIDIA's TensorRT for RTX is changing the game for AI application deployment. 🤖✨ This lightweight inference library, under 200 MB, offers a Just-In-Time (JIT) optimizer that compiles engines in under 30 seconds. It allows real-time optimization without manual tuning or multiple build targets. With adaptive inference, engines automatically adjust to specific hardware, improving performance progressively as applications run. Key features include Dynamic Shape specialization, built-in CUDA...
Source: Nvidia Developer Blog
George Stefanakis

How to Unlock Local Detail in Coarse Climate Projections with NVIDIA Earth-2

2026-01-26 14:00
Unlocking local climate details is crucial for accurate risk assessment. 🌍 NVIDIA Earth-2 offers tools to downscale coarse climate projections into high-resolution data. This process reveals local extremes like hurricanes, which coarse projections often fail to resolve. The platform uses the CorrDiff model to perform this downscaling efficiently. 📊 Learn how leading organizations are applying this technology for better climate insights. #ClimateScience #NVIDIA #RiskAssessment #AI #ClimateChange
Source: Nvidia Developer Blog
Georg Ertl

Overcoming Compute and Memory Bottlenecks with FlashAttention-4 on NVIDIA Blackwell

2026-01-22 22:22
🚀 The transformer architecture underpins generative AI, enabling large language models like GPT and Llama. Its self-attention mechanism allows for parallel processing, but its compute and memory costs grow quadratically with sequence length. 🔍 FlashAttention offers a solution, improving efficiency by reducing memory access and fusing the computational steps into a single optimized GPU kernel. 📈 This innovation lowers memory complexity and enhances training speed, allowing models to manage...
Source: Nvidia Developer Blog
Johnny Núñez
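
The memory saving described in the summary above comes from processing keys and values block by block with a running ("online") softmax, so the full score matrix is never stored. The following is a minimal pure-Python sketch of that idea for a single query vector. It is not the FlashAttention-4 kernel; the function names and the block size are assumptions chosen for illustration.

```python
import math


def naive_attention(q, keys, values):
    """Reference: materialize every score, then apply softmax and weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, but only one block of scores exists at any time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * dim         # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        rescale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(weights)
        acc = [
            a * rescale + sum(w * v[d] for w, v in zip(weights, v_blk))
            for d, a in enumerate(acc)
        ]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -1.0], [0.5, 0.5], [-0.3, 0.8], [2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]
reference = naive_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, block_size=2)
```

Both functions return the same output up to floating-point rounding; the tiled version keeps only per-block scores plus a few running statistics, which is why memory grows linearly rather than quadratically with sequence length.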

Scaling NVFP4 Inference for FLUX.2 on NVIDIA Blackwell Data Center GPUs

2026-01-22 19:21
🚀 In 2025, NVIDIA teamed up with Black Forest Labs to enhance the FLUX.1 text-to-image model series, achieving FP4 image generation on Blackwell GPUs. This collaboration led to FLUX.2, which supports multi-image references and offers enterprise-level quality. Significant optimizations have reduced memory needs by over 40%, allowing for local deployment via ComfyUI. NVIDIA and BFL are now introducing 4-bit acceleration for FLUX.2 on advanced data center GPUs, improving latency and efficiency....
Source: Nvidia Developer Blog
Sandro Cavallari
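
To make the memory effect of a 4-bit format concrete, here is a simplified sketch of block-scaled FP4 quantization in the spirit of NVFP4: each block of 16 values shares one scale, and each value is rounded to the nearest FP4 (E2M1) magnitude. It is an illustration under assumptions, not NVIDIA's implementation: the scale is kept as a plain Python float rather than an FP8 value, the additional per-tensor scale is omitted, and the function names are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # values that share one scale factor


def quantize_block(block):
    """Map a block to (scale, FP4 values) so that the largest magnitude maps to 6.0."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        nearest = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(math.copysign(nearest, x))
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from the shared scale and FP4 values."""
    return [c * scale for c in codes]


block = [12.0, -3.0, 0.0, 1.0] + [0.5] * 12
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)

# Storage cost: 16 four-bit values plus one 8-bit scale per block.
bits_per_value = (BLOCK * 4 + 8) / BLOCK
```

At 4.5 bits per value, such a layout needs well under a third of the storage of 16-bit weights, at the cost of rounding error on values that fall between representable points.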

Streamlining CUB with a Single-Call API

2026-01-21 21:28
🚀 CUB is a C++ library essential for high-performance GPU algorithms, known for its two-phase API: a first call reports how much temporary storage is needed, and a second call, made after allocating it, runs the algorithm. This can lead to repetitive code. With the recent shift to the single-call API in CUDA 13.1, developers can now simplify memory management without losing performance. CUB allows for efficient execution of algorithms like scan and sort directly in custom kernels, making it a powerful tool for harnessing NVIDIA GPUs. Learn more about CUB in NVIDIA's...
Source: Nvidia Developer Blog
Giannis Gonidelis

How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning

2026-01-15 16:00
🚀 Discover how to train an AI agent for command-line tasks using synthetic data and reinforcement learning! In this article, a custom Bash agent evolves to operate the LangGraph Platform CLI. This involves teaching the agent to perform advanced tasks like starting servers and building containers through a controlled command interface. 🤖 The process combines synthetic data generation and reinforcement learning, ensuring efficient and safe training. The agent can propose commands, seek human...
Source: Nvidia Developer Blog
Chris Alexiuk

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

2026-01-14 20:41
Unlock the potential of NVIDIA CUDA Tile programming! 🚀 This post dives into high-performance matrix multiplication, guiding developers through the implementation process using cuTile. Key topics include the flow of tile loading, computation, and storage. Learn to shift from thread-level to block-level programming and explore optimization strategies for better performance. Ensure your setup meets the CUDA and Python requirements for a smooth experience. #NVIDIA #CUDA #MatrixMultiplication...
Source: Nvidia Developer Blog
Jinman Xie

NVIDIA DLSS 4.5 Delivers Super Resolution Upgrades and New Dynamic Multi Frame Generation

2026-01-14 14:00
🚀 NVIDIA has unveiled DLSS 4.5, enhancing real-time graphics for gamers. This technology now supports over 400 titles, offering improved lighting and motion clarity. The introduction of a second-generation transformer model boosts image quality significantly. Developers can start leveraging these advancements to elevate gaming experiences. 🎮✨ #NVIDIA #DLSS #GamingTechnology #GameDev #RTX
Source: Nvidia Developer Blog
Ike Nnoli

Learn How NVIDIA cuOpt Accelerates Mixed Integer Optimization using Primal Heuristics

2026-01-13 20:32
NVIDIA cuOpt is a GPU-accelerated optimization engine aimed at solving complex decision-making problems quickly. It utilizes mixed integer programming (MIP) to handle various challenges, including production planning and supply chain management. By employing accelerated primal heuristics, cuOpt significantly reduces solve times, making it effective for time-sensitive situations. Recent results from the MIPLIB benchmark show improved solution quality compared to traditional CPU solvers....
Source: Nvidia Developer Blog
Piotr Sielski

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

2026-01-09 16:58
🧠 Large Language Models (LLMs) are in the spotlight for their ability to handle extensive context, including conversation histories and books. However, they still struggle with continuity, often needing repeated context. 📚 The article discusses the gap between LLM memory and human memory. It introduces a new approach called test-time training with an end-to-end formulation (TTT-E2E) that allows LLMs to adapt by compressing context into their weights. #AI #LanguageModels #MachineLearning...
Source: Nvidia Developer Blog
Yu Sun

Build an AI Catalog System That Delivers Localized, Interactive Product Experiences

2026-01-09 14:00
Transform your e-commerce catalogs with AI! 🛒 Many online catalogs suffer from limited product data and generic images, which impacts discoverability and sales. This tutorial guides developers and product teams on creating an AI-driven enrichment system. Utilizing NVIDIA's advanced models, you can generate rich, localized product listings from a single image. The process includes automated titles, descriptions, categories, and even 3D assets tailored for different markets. Designed for those...
Source: Nvidia Developer Blog
Antonio Martinez

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

2026-01-09 14:00
🚀 Warehouses are evolving with automation and data, yet many still lack a cohesive system. NVIDIA's Multi-Agent Intelligent Warehouse (MAIW) Blueprint aims to address this gap by providing an AI command layer that integrates WMS, ERP, and IoT systems. This solution transforms disparate data into actionable insights, enabling proactive decision-making. By unifying fragmented operations, MAIW enhances efficiency, reduces downtime, and improves safety. #SupplyChain #WarehouseManagement #AI...
Source: Nvidia Developer Blog
Tarik Hammadou

Building Generalist Humanoid Capabilities with NVIDIA Isaac GR00T N1.6 Using a Sim-to-Real Workflow

2026-01-08 17:38
NVIDIA introduces Isaac GR00T N1.6, advancing humanoid robot capabilities through a sim-to-real workflow. This model enhances cognition and loco-manipulation, utilizing whole-body reinforcement learning and advanced visual mapping techniques. 🤖✨ Key features include improved reasoning, adaptive motion, and enhanced performance across various robot types. GR00T N1.6 can effectively execute tasks by integrating visual cues and natural language instructions. Check out the demo from the Conference...
Source: Nvidia Developer Blog
Edith Llontop

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM

2026-01-08 17:28
🚗🔧 Large language models (LLMs) and vision language models (VLMs) are evolving for automotive and robotics use. Developers are seeking ways to implement AI agents and multimodal systems directly in vehicles and robots, prioritizing low latency and reliability. NVIDIA has introduced TensorRT Edge-LLM, an open-source C++ framework designed for high-performance edge inference. This framework is tailored for real-time applications on NVIDIA DRIVE AGX Thor and Jetson Thor platforms. With a...
Source: Nvidia Developer Blog
Lin Chai

Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

2026-01-08 02:43
🚀 AI models are advancing, leading to increased interactions across various sectors. This growth demands efficient token generation at low costs. NVIDIA is responding with its Blackwell architecture, enhancing token throughput per watt through co-design of hardware and software. This boosts performance for existing GPU infrastructures, ensuring prolonged productivity. Recent updates in the NVIDIA inference software stack significantly improve reasoning performance for large models like...
Source: Nvidia Developer Blog
Ashraf Eassa

Build and Orchestrate End-to-End SDG Workflows with NVIDIA Isaac Sim and NVIDIA OSMO

2026-01-07 18:00
Unlock the potential of robotics with synthetic data pipelines! 🤖 As robots tackle complex mobility tasks, developers require accurate simulations. NVIDIA Isaac Sim provides a solution by generating high-quality synthetic data, reducing the time and cost of real-world data collection. Key points include: - Creating simulated environments with NVIDIA Omniverse NuRec. - Utilizing SimReady assets for streamlined simulations. - Generating and augmenting synthetic data using MobilityGen and NVIDIA...
Source: Nvidia Developer Blog
Asawaree Bhide

Redefining Secure AI Infrastructure with NVIDIA BlueField Astra for NVIDIA Vera Rubin NVL72

2026-01-07 17:00
🚀 Large-scale AI innovation is pushing the need for advanced computing infrastructure. Service providers are focusing on security and tenant isolation to effectively manage AI workloads. 🔍 The introduction of NVIDIA BlueField Astra on BlueField-4 redefines how AI infrastructure is managed. It enables better control and scalability for service providers. 🌐 Additionally, the NVIDIA Ethernet SuperNIC is designed to meet the demanding requirements of AI workloads, ensuring high performance and...
Source: Nvidia Developer Blog
Erez Tweg

Introducing NVIDIA BlueField-4-Powered Inference Context Memory Storage Platform for the Next Frontier of AI

2026-01-06 17:30
Introducing the NVIDIA BlueField-4-Powered Inference Context Memory Storage Platform, designed to address the challenges faced by AI-native organizations. As AI workflows evolve, the demand for scalable context windows and efficient memory systems has increased. The Rubin platform organizes AI infrastructure into compute pods, enhancing performance and power efficiency. The NVIDIA Inference Context Memory Storage (ICMS) provides an optimized storage solution that supports gigascale inference,...
Source: Nvidia Developer Blog
Moshe Anschel

Scaling Power-Efficient AI Factories with NVIDIA Spectrum-X Ethernet Photonics

2026-01-06 16:59
NVIDIA is introducing optimized Ethernet networking with co-packaged optics for AI factories. 🌐 This innovation, delivered through Spectrum-X Ethernet Photonics, supports efficient scaling on the NVIDIA Rubin platform for AI infrastructure. It ensures reliable data transmission, improving performance and model dispatch efficiency across diverse workloads. Explore how these advancements enable seamless operations within AI factories. ⚙️💡 #NVIDIA #AIFactories #Ethernet #TechInnovation #AI
Source: Nvidia Developer Blog
Ashkan Seyedi

Open-Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs

2026-01-06 05:30
AI development on PCs is rapidly growing, fueled by advancements in small language models like GPT-OSS-20B and diffusion models like FLUX.2. 📈 NVIDIA is set to announce upgrades for AI PC developers at CES 2026, enhancing tools like llama.cpp and ComfyUI. These updates promise improved performance and efficiency on NVIDIA GPUs. 💻✨ Key highlights include optimized inference and significant memory savings with new quantized formats. 🛠️ #AIDevelopment #NVIDIA #OpenSourceAI #TechUpdates #CES2026
Source: Nvidia Developer Blog
Annamalai Chockalingam

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

2026-01-05 22:50
NVIDIA is enhancing the performance of its DGX Spark systems with ongoing software optimizations and collaborations. The latest updates boost capabilities in inference, training, and creative workflows. Key features include 128GB of unified memory, enabling larger model processing locally. 🌐 New support for the NVFP4 data format provides a 2.6x performance increase for certain models while reducing memory usage. This allows for multitasking without sacrificing speed or accuracy. ⚡...
Source: Nvidia Developer Blog
Allen Bourgoyne

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

2026-01-05 22:20
🚀 AI is transforming industries with the NVIDIA Rubin platform, designed for always-on AI factories. These factories streamline data processing, enabling complex workflows and real-time inference while addressing power, security, and cost constraints. The Rubin platform features an innovative six-chip architecture that integrates GPUs, CPUs, and more for efficient intelligence production. Learn about its impact on AI scalability and the software tools that enhance developer experience....
Source: Nvidia Developer Blog
Kyle Aubrey

Simplify Generalist Robot Policy Evaluation in Simulation with NVIDIA Isaac Lab-Arena

2026-01-05 22:14
🚀 Exciting news in robotic policy evaluation! NVIDIA has introduced Isaac Lab-Arena, an open-source framework designed for scalable and efficient robotic policy testing in simulation. This tool simplifies the process of task curation and benchmarking, enabling developers to prototype complex evaluations without extensive setup. Key features include modular task architectures, automated task diversification, and support for large-scale evaluations across diverse environments. The pre-alpha...
Source: Nvidia Developer Blog
Sangeeta Subramanian

Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1

2026-01-05 22:10
🚀 NVIDIA has launched the Jetson T4000, designed to enhance AI performance in robotics and edge applications. With up to 1200 TFLOPs of AI compute and 64 GB of memory, it balances efficiency and scalability. The T4000 supports real-time 4K video processing, making it suitable for advanced intelligent systems. Developers can create common carrier boards for both T4000 and T5000 modules, optimizing design efforts. #NVIDIA #JetsonT4000 #AI #Robotics #EdgeComputing
Source: Nvidia Developer Blog
Shashank Maheshwari

How to Build a Voice Agent with RAG and Safety Guardrails

2026-01-05 22:06
🚀 Discover how to build a voice-powered RAG agent with safety guardrails in a new tutorial! This guide covers the integration of retrieval, speech, safety, and reasoning components to create a cohesive system. By using NVIDIA Nemotron models, you'll learn to develop an agent that listens, reasons, and responds safely in audio format. Start developing locally and easily scale to NVIDIA's managed environments. #VoiceTech #NVIDIA #RAG #AI #SafetyFirst
Source: Nvidia Developer Blog
Chris Alexiuk

Building Autonomous Vehicles That Reason with NVIDIA Alpamayo

2026-01-05 21:49
🚗🔍 Autonomous vehicle research is evolving with NVIDIA's introduction of Alpamayo, a new platform for reasoning-based vision–language–action (VLA) models. These models enhance AV decision-making by mimicking human reasoning processes, allowing for step-by-step problem-solving. Traditional evaluation methods are being challenged, requiring new tools for assessment. Alpamayo includes: 1️⃣ Alpamayo 1 model for trajectory predictions. 2️⃣ The Physical AI dataset for extensive training. 3️⃣...
Source: Nvidia Developer Blog
Marco Pavone

Accelerating AI-Powered Chemistry and Materials Science Simulations with NVIDIA ALCHEMI Toolkit-Ops

2025-12-19 17:00
🚀 Exciting advancements in computational chemistry are here! NVIDIA has introduced the ALCHEMI Toolkit-Ops to enhance atomistic simulations using machine learning interatomic potentials (MLIPs). This toolkit addresses the challenges posed by traditional CPU-centric simulation tools. ALCHEMI offers GPU-accelerated operations, enabling faster and more efficient simulations in chemistry and materials science. It includes a modular API for seamless integration with existing simulation packages....
Source: Nvidia Developer Blog
Justin S. Smith

Real-Time Decoding, Algorithmic GPU Decoders, and AI Inference Enhancements in NVIDIA CUDA-Q QEC

2025-12-17 21:32
🚀 Real-time decoding is essential for fault-tolerant quantum computers. NVIDIA's CUDA-Q QEC version 0.5.0 enhances this with low-latency decoders working alongside quantum processing units (QPUs). Key improvements include online real-time decoding, GPU-accelerated algorithmic decoders, and better AI inference support. Users can efficiently conduct quantum error correction through a streamlined four-stage workflow. Explore how these advancements can accelerate your research! #QuantumComputing...
Source: Nvidia Developer Blog
Tom Lubowe

Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

2025-12-17 19:00
🚀 Data drives modern business, but older CPU-based Apache Spark pipelines can be slow and costly. Project Aether offers a solution by automating the migration of these workloads to GPU-accelerated Amazon EMR. This transition enhances performance using the RAPIDS Accelerator, leading to reduced cloud costs and improved efficiency. Learn more about optimizing your data processes! #DataAnalytics #ApacheSpark #GPUs #AmazonEMR #ProjectAether
Source: Nvidia Developer Blog
Navin Kumar

Solving Large-Scale Linear Sparse Problems with NVIDIA cuDSS

2025-12-17 18:30
🚀 Solving large-scale problems in EDA, CFD, and optimization is becoming essential as designs grow more complex. The NVIDIA CUDA Direct Sparse Solver (cuDSS) allows users to run sparse solvers efficiently with minimal code changes. It supports hybrid memory mode as well as multi-GPU and multi-node execution, so larger problems can be solved. The blog covers strategies for using cuDSS effectively, particularly with recent GPU advancements. #NVIDIA #cuDSS #DataScience #Engineering #Optimization
Source: Nvidia Developer Blog
Jeff Layton

Simulate Robotic Environments Faster with NVIDIA Isaac Sim and World Labs Marble

2025-12-17 17:00
🚀 Building 3D environments for robotics simulation is now easier and faster! NVIDIA Isaac Sim and World Labs' Marble enable users to create photorealistic scenes from text prompts. This method drastically reduces setup time compared to traditional modeling. A recent case study highlights how researchers are leveraging these generative models for robot training and testing. Key steps include scene export, conversion to USD format, and simulation in Isaac Sim. Explore the future of robotics...
Source: Nvidia Developer Blog
Wonsik Han

Simulate an Accurate Radio Environment Using NVIDIA Aerial Omniverse Digital Twin

2025-12-17 16:00
Unlock the potential of 5G and 6G with NVIDIA's Aerial Omniverse Digital Twin! 📡 This tutorial guides researchers and engineers in enhancing their simulations by integrating high-fidelity channel models into existing frameworks. Prerequisites include an NVIDIA RTX GPU, access to AODT Release 1.4, and basic Python knowledge. Explore how AODT fits into various programming environments like C++ and MATLAB! #5G #6G #NVIDIA #DigitalTwin #WirelessTechnology
Source: Nvidia Developer Blog
Tommaso Balercia

Using AI Physics for Technology Computer-Aided Design Simulations

2025-12-17 16:00
🚀 Technology Computer-Aided Design (TCAD) simulations are vital for semiconductor manufacturing, allowing engineers to test designs digitally before physical production. However, traditional simulations can take weeks, delaying processes. AI-augmented TCAD, with tools like NVIDIA PhysicsNeMo and Apollo, helps accelerate these simulations. Engineers at SK hynix utilize these AI frameworks to enhance device designs, significantly reducing simulation times from hours to milliseconds. Learn how...
Source: Nvidia Developer Blog
Ram Cherukuri

Optimizing Semiconductor Defect Classification with Generative AI and Vision Foundation Models

2025-12-17 02:00
In semiconductor manufacturing, detecting and classifying defects is crucial for success. Traditional CNN-based methods face limitations, including high data requirements and frequent retraining needs. Generative AI offers a solution. By utilizing NVIDIA's vision language models (VLMs) and vision foundation models (VFMs), manufacturers can modernize defect classification and improve accuracy across various processes. These advancements can enhance the efficiency of defect detection, reducing...
Source: Nvidia Developer Blog
Tim Lin

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM

2025-12-16 21:00
🚀 Machine learning engineers face challenges with long-context inference in LLMs due to rising computation costs. The article introduces Skip Softmax, a technique that accelerates inference without retraining. It offers up to 1.4x faster time-to-first-token and time-per-output-token. Learn how to implement Skip Softmax in NVIDIA TensorRT-LLM for improved performance. #MachineLearning #NVIDIA #AI #TensorRT #Inference
Source: Nvidia Developer Blog
Laikh Tewari

Advanced Large-Scale Quantum Simulation Techniques in cuQuantum SDK v25.11

2025-12-16 18:00
Unlocking the potential of quantum computing is challenging as QPUs advance. 🔍 The latest cuQuantum SDK v25.11 introduces tools for Pauli propagation and stabilizer simulations, enhancing the simulation of large-scale quantum circuits. This update allows for efficient estimation of observables, crucial for applications like VQE. Explore how GPU-accelerated methods can support your quantum research! 💻✨ #QuantumComputing #cuQuantum #AI #Simulation #NVIDIA
Source: Nvidia Developer Blog
Tom Lubowe