The Split Path of SDN and RDMA Networks
Over the past decade, Software-Defined Networking (SDN) emerged as a game-changer, becoming the go-to standard for data center networking. Its rise aligned with the needs of cloud-scale operations, offering substantial gains in efficiency. SDN delivered two key benefits: enhanced control and greater flexibility. By separating network control from data forwarding, it allowed operators to program the network dynamically while abstracting the underlying hardware from applications and services. This programmability sharpened control, and the abstraction unlocked flexible implementation, driving faster feature rollouts and more predictable management in data centers.
Meanwhile, Remote Direct Memory Access (RDMA) networks carved out a niche in smaller-scale deployments, primarily in storage and high-performance computing (HPC). These setups were modest compared to the sprawling infrastructures where SDN thrived. Early attempts to scale RDMA for cloud environments—seen in pioneers like Microsoft Azure and Amazon Web Services (AWS)—hit roadblocks tied to control and flexibility, though each tackled the hurdles differently, as we’ll explore later.
Now, in the current decade, the explosive growth of AI—particularly GPU-driven backend or scale-out networking—has thrust RDMA into the spotlight. The surge in Generative AI (GenAI) and Large Language Models (LLMs) has unleashed new demands: higher performance, scalability, and unique traffic patterns. LLM training, for instance, generates sporadic data bursts with low entropy and intense network use, maxing out modern RDMA NICs at 400 Gbps. These patterns throw off traditional Equal-Cost Multi-Path (ECMP) load balancing, a staple of SDN-based data centers: with only a handful of large, long-lived flows, the per-flow hash can land several of them on the same link while others sit idle, pushing operators to rethink their designs.
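The ECMP problem above can be illustrated with a minimal sketch. The hash function, addresses, and four-path fabric below are all illustrative assumptions (real switches use vendor-specific hashes over the packet 5-tuple); the point is only that a low-entropy workload—a few elephant flows, as in LLM training—gives the hash too few distinct inputs to spread load evenly.

```python
import hashlib
from collections import Counter

def ecmp_path(flow, num_paths):
    """Pick an egress path by hashing the flow 5-tuple, as a typical
    ECMP switch does. MD5 here is an illustrative stand-in for a
    vendor-specific hardware hash."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# Hypothetical scenario: four long-lived RDMA "elephant" flows
# between GPU servers, over a fabric with four equal-cost paths.
# (RoCEv2 traffic runs over UDP destination port 4791.)
flows = [
    ("10.0.0.1", "10.0.1.1", 49152, 4791, "UDP"),
    ("10.0.0.2", "10.0.1.2", 49153, 4791, "UDP"),
    ("10.0.0.3", "10.0.1.3", 49154, 4791, "UDP"),
    ("10.0.0.4", "10.0.1.4", 49155, 4791, "UDP"),
]
loads = Counter(ecmp_path(f, num_paths=4) for f in flows)
print(dict(loads))
```

With thousands of short flows, the hash statistically evens out; with four flows at 400 Gbps each, a single collision halves the effective bandwidth of one link, which is why these traffic patterns force a rethink of hash-based load balancing.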