Articles by Category: Technical_deep_dives

Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel

2026-02-02 18:43
Optimizing communication in mixture-of-experts (MoE) training is crucial for large language models (LLMs). The article introduces Hybrid-EP, a solution for improving Expert Parallel communication, particularly in NVIDIA's Megatron frameworks. This addresses challenges like communication bottlenecks and load imbalances in models like DeepSeek-V3. The new approach enhances training efficiency by integrating advanced parallelism strategies and optimizing resource usage. 🔍💻✨ #MachineLearning...
Fan Yu

Beyond Two Towers: Re-architecting the Serving Stack for Next-Gen Ads Lightweight Ranking Models…

2026-02-02 17:01
🚀 The article "Beyond Two Towers" discusses the evolution of ad ranking models from a traditional Two-Tower architecture to a more advanced GPU-based system. 📈 The shift aims to enhance recommendation quality by allowing deeper user-item interactions, overcoming limitations of the old model. 💡 Key optimizations include bundling features with models, moving business logic into the model, and rethinking data flow to minimize latency. Stay tuned for more insights on this innovative approach!...
Pinterest Engineering

S3 is the new network: Rethinking data architecture for the cloud era

2026-02-02 12:00
In the article "S3 is the new network," the author examines the evolution of data architecture in the cloud era. Traditionally, databases relied on keeping storage close to compute to reduce delays. However, this approach can hinder scalability and increase costs. With the rise of cloud object storage services like AWS S3, a new paradigm emerges. 🌐 Cloud object storage offers virtually unlimited capacity and global accessibility, allowing data to be accessed from anywhere. While S3 may not be...
Max Liu

S3 is the new network: Rethinking data architecture for the cloud era

2026-02-02 12:00
Rethinking data architecture is crucial in the cloud era. 🌐 For years, distributed databases relied on keeping storage close to compute to reduce delays. However, this approach can slow down scaling and increase costs. With the rise of cloud object storage services like AWS S3, designers can now focus on creating efficient databases without worrying about data location or network reliability. 📊 S3 offers unlimited capacity and high durability, ensuring data is safe and readily accessible. Its...
Max Liu

AI-generated product review summaries with OpenShift AI

2026-02-02 07:01
Discover how OpenShift AI enhances online shopping experiences with AI-generated product review summaries! 🛍️ This final part of our series showcases the use of LLMs to summarize user reviews efficiently. Users can generate real-time summaries directly from product pages, streamlining the decision-making process. 📊 The article also explains the user registration process, allowing for personalized product recommendations. Explore the integration of machine learning and discover practical...
Hadar Cohen, Ori Fridman, Itay Katav, Manna Kong, Ganesh Murthy, Peter Samouelian, Matan Talvi

How Agentforce Enhanced Chat Built an Agent-first Chat Experience While Ensuring Easy Migration for 3,000+ Customers

2026-02-02 01:49
🚀 Exciting advancements in customer engagement! The Salesforce Engineering team, led by Andy Shah, has developed Agentforce Enhanced Chat, enhancing web experiences for over 3,000 customers. This solution allows for seamless integration of new capabilities while ensuring stable migration paths for existing users. Key highlights include a focus on backward compatibility and a robust architectural design that supports customization and high performance. Discover how thoughtful engineering can...
Scott Nyberg

How Yelp Built a Back-Testing Engine for Safer, Smarter Ad Budget Allocation

2026-02-02 00:00
🚀 At Yelp, we’ve developed a Back-Testing Engine to enhance our Ad Budget Allocation process. This tool simulates potential changes to our ad algorithms using historical campaign data, allowing us to preview impacts without affecting real budgets. Our system allocates daily budgets for numerous campaigns, adapting based on previous outcomes. This innovation helps us make safer, smarter decisions. #Yelp #AdTech #Innovation #MarketingStrategy #Budgeting
Samuele Mazzanti, Applied Scientist

Beating context rot in Claude Code with GSD

2026-01-31 17:00
🚀 Exploring the challenges of using LLMs like Claude for project creation reveals issues like "context rot." This term describes how earlier tokens receive more attention, causing difficulties during lengthy tasks. 🛠️ The article introduces GSD, a tool designed to combat this by adding a context engineering layer. It organizes tasks effectively, allowing for improved project outcomes. 📊 The author plans to test GSD by developing a front end for viewing JSON objects, aiming to ensure a...
David Eastman

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

2026-01-30 20:01
NVIDIA is advancing GPU programming with the integration of CUDA Tile as a backend for OpenAI Triton. This development targets portability for NVIDIA Tensor Cores, enhancing GPU performance. CUDA Tile allows developers to express computations at a higher abstraction level by working with data blocks (tiles). This reduces programming complexity and enables better compiler optimizations. The Triton-to-TileIR backend connects Triton with CUDA Tile IR, allowing developers to compile GPU kernels...
Jie Xin

Technical Deep Dive: How we Created a Security-hardened 1-Click Deploy OpenClaw

2026-01-30 18:36
🚀 Exciting news from DigitalOcean! We've launched the 1-Click Deploy OpenClaw, an open-source AI assistant that's gaining traction. This service allows users to deploy OpenClaw easily and securely on our Droplet® servers. Key features include safe communication through TLS, isolated agent code, and scalability options. We're focused on providing a stable and secure user experience. Explore the 1-Click Deploy OpenClaw today in our Marketplace! #OpenClaw #DigitalOcean #AIAssistant #TechNews
Freddie Rice

Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor

2026-01-30 18:00
🔍 Exploring the world of sparse tensors! Sparse tensors, which are essential in fields like scientific computing and deep learning, help optimize storage and computation. However, managing them can be challenging due to existing limitations. The Universal Sparse Tensor (UST) offers a solution by separating tensor sparsity from its memory representation. Developers can use a domain-specific language (DSL) to define and optimize sparse storage formats to fit their applications. This innovative...
Aart J.C. Bik

Hidden Technical Debt of GenAI Systems

2026-01-30 03:00
Generative AI systems carry hidden technical debt that can impact their effectiveness. Key issues include tool sprawl, which complicates management, and opaque pipelines that make it hard to track processes. Additionally, subjective evaluations can lead to inconsistent results. Understanding these challenges is crucial for improving generative AI practices. 🔍💻⚙️ #GenerativeAI #TechDebt #AIManagement #Innovation #AIChallenges

Technical Deep Dive: How we Created a Security-hardened 1-Click Deploy Moltbot

2026-01-29 21:32
🚀 DigitalOcean has launched a 1-Click Deploy Moltbot, an open-source AI assistant, designed for easy and secure use. With growing interest in Moltbot, we focused on enhancing security for users deploying this software. Key features include consistent, stable deployments; safe communication through TLS; and isolation of agent code for security. The 1-Click Deploy Moltbot is now available in our Marketplace, making it simple to get started! 🔗 #DigitalOcean #Moltbot #OpenSource #CyberSecurity #AI
Freddie Rice

How Uber Scaled Data Replication to Move Petabytes Every Day

2026-01-29 14:30
Uber is focused on maintaining a reliable data lake that spans both on-premise and cloud environments. 🌐 This distributed setup faces challenges like limited bandwidth and the need for quick data access, especially for disaster recovery. To address these issues, Uber employs the Hive Sync service and Apache Hadoop® DistCp for data replication. As their data lake grows to over 350 PB, they've optimized DistCp to enhance performance, ensuring effective data replication and disaster recovery. 🔄...

Beyond the Chatbot: A Blueprint for Trustable AI

2026-01-29 00:00
At Thunderhill Raceway Park, a team of Google Developer Experts tested a new "Trustable AI Framework" during high-speed racing. 🏎️💨 They aimed to provide real-time guidance to drivers, minimizing the risk of AI errors. Using the Antigravity framework, they reduced a three-month development cycle to just two weeks. 🛠️ The project also highlighted a "Split-Brain" architecture to effectively manage reflexes and strategy. #AI #Racing #Innovation #TrustableAI #GoogleDevelopers

Ads Candidate Generation using Behavioral Sequence Modeling

2026-01-28 23:01
Pinterest is enhancing ad relevance through Behavioral Sequence Modeling. This approach uses historical user behavior to predict future interactions with advertisers. The model focuses on personalizing ad candidates by analyzing user actions like views and purchases. Key metrics such as Recall@K help evaluate performance, showing significant improvements in conversion and cost efficiency since launch. Exciting advancements are also underway to predict specific product interactions for a more...
Pinterest Engineering
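
The Recall@K metric named in the summary above can be illustrated with a minimal sketch. It assumes the common definition (the share of relevant items that appear among the top K ranked candidates); the function name, ad identifiers, and data are hypothetical and are not taken from the Pinterest article.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Return the fraction of relevant items found in the top-k ranked items.

    Assumes the common Recall@K definition; this is an illustration,
    not Pinterest's evaluation code.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_items)
    if not relevant:
        # No relevant items to recover, so report 0.0 by convention here.
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical candidate list produced by a model, best first.
ranked = ["ad_7", "ad_2", "ad_9", "ad_4", "ad_1"]
# Hypothetical ads the user actually interacted with.
relevant = {"ad_2", "ad_4", "ad_8"}

print(recall_at_k(ranked, relevant, k=3))  # one of three relevant ads is in the top 3
print(recall_at_k(ranked, relevant, k=5))  # two of three relevant ads are in the top 5
```

Raising K can only keep recall the same or increase it, which is why the metric is usually reported at a few fixed cutoffs.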

Forge app implementation patterns

2026-01-28 19:43
🚀 In the latest post on Forge app implementation patterns, key coding strategies are shared for building efficient apps. 🔄 **Code Sharing**: A simple method is suggested for synchronizing code between different areas, using a shell script to manage shared utility code. ⚙️ **Configuration Distribution**: The article emphasizes the importance of loading configuration information at the start of each activity, ensuring efficiency in API calls with Jira. For developers looking to enhance their...
Dugald Morrow

Diagnosing instability in production-scale agent reinforcement learning

2026-01-28 18:07
🚀 Hugging Face has integrated the Post-Training Toolkit into TRL, enhancing diagnostics for production-scale reinforcement learning (RL) systems. This integration allows for better monitoring and control in long-running agent systems. It helps identify late-phase instability that can develop gradually, often going unnoticed until recovery options are limited. 📈 The article discusses a specific issue with tool-using agents trained using on-policy methods, revealing that traditional metrics can...
Aditya Challapally

Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash

2026-01-28 18:00
🚀 Engineering VP Josh Clemm recently shared insights on Dropbox's use of knowledge graphs, MCP, and DSPy during a session in Jason Liu’s RAG course on Maven. He discussed how Dropbox Dash integrates with third-party apps, allowing users to easily search and manage their content. For a deeper understanding, check out the full video on Maven! #DropboxDash #KnowledgeGraphs #TechInnovation #MavenCourse #AI
Josh Clemm

From pixels to characters: The engineering behind GitHub Copilot CLI’s animated ASCII banner

2026-01-28 17:00
GitHub recently shared insights on the engineering behind the animated ASCII banner for Copilot CLI. 🖥️ The team faced unique challenges due to varied terminal behaviors and accessibility standards. This complicated the project, requiring over 6,000 lines of TypeScript to ensure compatibility across different environments. The result? A playful animated mascot that showcases the complexity of engineering in the CLI space. 🚀 Explore Copilot CLI’s agentic workflows, which enhance productivity...
Aaron Winston

7-Eleven’s Journey to Unity Catalog Success

2026-01-28 16:30
7-Eleven, Inc. has more than 85,000 stores globally, catering to millions of customers. The article details their successful migration to Unity Catalog, highlighting the challenges faced and strategies implemented. The focus on reorienting complex systems played a key role in this transition. Learn more about their journey! 🌍📈 #7Eleven #UnityCatalog #BusinessSuccess #DataMigration #RetailInnovation

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

2026-01-28 16:28
🚀 Introducing Dynamic Context Parallelism (Dynamic-CP) in NVIDIA Megatron Core! This innovative scheduling method enhances LLM post-training and DiT pre-training by adapting CP size per microbatch. It efficiently addresses the challenge of variable-length sequences, achieving up to 1.48x speedup on real-world datasets. 📈 Large-scale model training often struggles with sequence-length variability, impacting resource use. Dynamic-CP optimizes performance by managing these variations...
Kunlun Li

Agoda’s secret to 50x scale: Getting the database basics right

2026-01-28 15:00
Agoda, part of Booking Holdings, is experiencing rapid growth, with server traffic increasing 50 times from January 2023 to February 2025. 📈 To manage this scale, the engineering team faced challenges in maintaining low latency on their ScyllaDB-backed feature store. They addressed unpredictable traffic patterns and potential database flooding. Worakarn Isaratham shared insights at the Monster Scale Summit on how they navigated these engineering hurdles. The feature store plays a crucial role...
Cynthia Dunlop

How we turned OpenShift installation into a smart chatbot-driven experience

2026-01-28 13:56
Red Hat has transformed OpenShift installation by integrating AI into a conversational agent, simplifying the deployment process. This initiative began as an internal experiment, evolving into a tool that understands user intent and automates cluster installations. Key challenges included maintaining persistent memory and ensuring security during the interaction. Users can now engage directly with the assistant for a smoother experience, making complex installations more accessible. 💻🤖🔧✨...
Rom Freiman, Eran Cohen

Towards a science of scaling agent systems: When and why agent systems work

2026-01-28 11:00
New research explores scaling principles for AI agent systems through a study of 180 configurations. Key findings show that while multi-agent coordination boosts performance in parallel tasks, it can hinder outcomes in sequential ones. A predictive model was also developed to identify optimal architectures for 87% of unseen tasks. This challenges the common belief that "more agents are better," highlighting the need for careful design in agent systems. #AI #AgentSystems #GenerativeAI...

Performance and load testing in Identity Management (IdM) systems using encrypted DNS (eDNS)

2026-01-28 03:01
🔍 The first part of a two-part series explores performance testing in Identity Management (IdM) systems using encrypted DNS (eDNS). The study examines how IdM functions under high load with over 1,000 DNS requests per second, comparing traditional DNS over UDP and TCP with DNS over TLS (DoT). Key findings detail metrics like queries per second (QPS), latency, and resource use, with results visualized using Prometheus and Grafana. The article emphasizes the impact of encryption on performance...
Josep Andreu Font, Ramon Gordillo Gutierrez

Accelerating Diffusion Models with an Open, Plug-and-Play Offering

2026-01-27 19:00
🚀 Advances in large-scale diffusion models are transforming generative AI, impacting image synthesis, audio generation, and more. However, sampling inefficiency poses significant challenges, especially in video generation, where the process can take minutes to hours. ⏱️ NVIDIA has introduced FastGen, an open-source library that accelerates diffusion models, achieving 10x to 100x speedups without sacrificing quality. This tool aims to streamline real-time video generation and interactive...
Weili Nie

Inside Salesforce Edge: Automating Global Rollback for 1.5 Trillion Requests in 10 Minutes

2026-01-27 14:22
🚀 In the latest article, Salesforce highlights the work of Sanjeev Chhabria and his team in automating global rollback processes. 🌐 Their efforts have cut rollback time from hours to just 10 minutes, allowing Salesforce Edge to manage 1.5 trillion requests monthly with improved reliability. 🔧 Key innovations include re-architecting Kubernetes deployments and automating traffic cutover to maintain high availability. #Salesforce #Engineering #Innovation #Automation #CloudComputing
Scott Nyberg

Building a serverless, post-quantum Matrix homeserver

2026-01-27 14:00
🚀 Exciting news in decentralized communication! A complete Matrix homeserver has been successfully ported to Cloudflare Workers. This innovation delivers encrypted messaging at the edge, featuring automatic post-quantum cryptography. Traditionally, running a Matrix homeserver involved complex system administration and high operational costs. Now, this serverless architecture eliminates those burdens, making deployment easier and more efficient. For developers, this means lower costs, low...
Nick Kuntz

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

2026-01-27 01:53
Unlocking agentic reinforcement learning (RL) enhances traditional LLM training by focusing on multi-step decision-making. This method enables models to optimize performance through direct interaction with their environment, rather than relying on static data. It supports continuous learning by adjusting to outcomes and refining actions over time. Key processes include collecting on-policy data, computing rewards, and updating policies for improved future interactions. #ReinforcementLearning...

How Perk cut recovery time to 31 minutes with centralized rollbacks in CircleCI

2026-01-26 20:40
🚀 Perk's DevOps team has successfully reduced incident recovery time to just 31 minutes using CircleCI's centralized rollback feature. By leveraging the Platform Team Toolkit, they've scaled these benefits across multiple services. This partnership is crucial for maintaining CI/CD pipelines for the Builders department. Perk utilizes a cloud-native architecture, ensuring scalability and reliability. Their modern services also embrace serverless solutions to enhance efficiency. Curious how this...
James Butherway

The AI Evolution of Graph Search at Netflix

2026-01-26 19:01
🔍 Netflix is evolving its Graph Search platform by integrating AI to enhance search capabilities. Natural language processing is now being used, allowing users to query in everyday language instead of complex structured queries. This shift aims to improve user experience and reduce friction in retrieving information. The first part of a three-part series details how Netflix is implementing and refining this AI-driven approach. Stay tuned for more updates! 🚀 #Netflix #AI #GraphSearch...
Netflix Technology Blog

Async Rust: Pinning demystified

2026-01-26 19:00
Async Rust is a complex topic, and the latest article, "Pinning Demystified," dives into its intricacies. It highlights how Rust's async engine uses a unique "pull-based" model, transforming async functions into lazy state machines. A key focus is on the concept of Pin, which prevents issues with moving self-referential structs during execution. Understanding Pin is crucial for maintaining safety in async Rust, especially with references that could become invalid if moved. This article is...
Anshul Gupta

How Rovo solves search challenges through entity linking

2026-01-26 18:00
Atlassian's article reveals how Rovo enhances search efficiency through entity linking. By transforming unstructured text into structured knowledge, Rovo improves tools like Rovo Chat and Search. This is achieved through Entity Linking, which connects textual mentions to specific records in the Knowledge Graph. The article discusses the challenges of linking team names, highlighting issues like inconsistency in naming conventions and operational noise. These challenges are crucial for...
Christopher Cheung

How resilient is HCP Vault during real AWS regional outages?

2026-01-26 17:00
On October 20, 2025, AWS us-east-1 faced significant disruptions. However, HCP Vault Dedicated showcased its resilience, maintaining 100% uptime for customer clusters during this event. 🌐🔒 While the control plane experienced some issues, the data plane continued to operate seamlessly. This incident validated our architectural design principles and highlighted the effectiveness of our operational procedures. We aim to share insights on building resilient cloud-native services. 💡📊...
Harini Murugan

Building AI Agents in Kotlin – Part 5: Teaching Agents to Forget

2026-01-26 16:09
In Part 5 of "Building AI Agents in Kotlin," the focus shifts to teaching AI agents to forget. Agents can run out of context, leading to task failures and data loss. The article discusses the performance of GPT-5 Codex and Claude Sonnet 4.5, noting their strengths and limitations in handling complex tasks. 🧠 A key solution proposed is smart compression, which retains essential context while dropping unnecessary details. This approach mimics how developers hand off tasks without overwhelming...
Fatimazahra El Akkary

Nomad on OpenShift: The case for the control plane

2026-01-26 15:46
🌐 Managing workloads at the edge has posed challenges for organizations, often forcing them to choose between Kubernetes or separate infrastructure. 🔍 Red Hat's recent OpenStack release introduces a new approach by deploying the control plane as Operator-managed containers on OpenShift while the data plane stays on external RHEL nodes. ⚙️ HashiCorp Nomad complements this by effectively managing lightweight edge devices, ensuring seamless operation even in intermittent connectivity scenarios....
Benjamin Holmes

Scaling Small LLMs with NVIDIA MPS

2026-01-26 15:30
Small language models are becoming increasingly effective for various enterprise applications. However, many GPUs remain underutilized during high-demand tasks. NVIDIA's Multi-Process Service (MPS) allows multiple inference processes to share a GPU, optimizing memory and compute operations. Testing has shown that MPS can significantly improve throughput for smaller models, especially those with short context lengths. This development highlights the importance of efficient GPU utilization in...

A simulation and evaluation flywheel to develop LLM chatbots at scale

2026-01-26 14:52
🚀 Exciting developments at DoorDash Support! To enhance customer and Dasher experience, we're transitioning from traditional decision trees to large language models (LLMs) for issue resolution. LLMs offer more flexibility and human-like interactions but introduce challenges in testing due to their non-deterministic nature. To address this, we've created a simulation and evaluation flywheel. This system includes an offline simulator that mimics real customer interactions, allowing us to test...
Lewis Warne

Understanding the recommender system's two-tower model

2026-01-26 14:23
🚀 Dive into the architecture of the recommender system's two-tower model with Red Hat OpenShift AI! This model simplifies the training pipeline by integrating with Kubeflow Pipelines (KFP) and workflow managers like Argo Workflows. Key tasks include loading data, training models, and generating recommendations efficiently. Learn more about how engineers can optimize data sharing and pipeline performance. #AI #MachineLearning #RedHat #RecommenderSystems #OpenShiftAI
Hadar Cohen, Ori Fridman, Itay Katav, Ganesh Murthy, Peter Samouelian, Matan Talvi, Manna Kong