Enfabrica Unveils Ethernet-Based Memory Fabric, a Potential Game-Changer for Large-Scale AI Inference
In the world of AI, memory has become the star of the show. Generative AI models are growing more complex, context-aware, and persistent, requiring vast amounts of memory per user session. To address this challenge, Silicon Valley startup Enfabrica has unveiled a groundbreaking product called the Elastic Memory Fabric System (EMFASYS).
What is a Memory Fabric?
A Memory Fabric is an architecture that enables elastic, scalable, and high-bandwidth memory access across multiple compute nodes. It achieves this by combining two powerful technologies: RDMA over Ethernet and Compute Express Link (CXL). This approach transforms memory into a shared, distributed resource, allowing AI inference workloads to scale more efficiently without being shackled by the physical memory limits of any single node.
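To make the pooling idea concrete, here is a minimal, purely illustrative Python sketch of elastic allocation across nodes. Every name in it (Node, MemoryFabric, allocate) is hypothetical; the real data path runs over RDMA and CXL in hardware, and this toy models only the capacity-aggregation logic.

```python
# Illustrative toy model of a pooled memory fabric. All class and method
# names are hypothetical; the real data path (RDMA/CXL) is hardware, not Python.

class Node:
    """A compute node contributing DRAM to a shared pool."""
    def __init__(self, name: str, dram_gb: int):
        self.name = name
        self.free_gb = dram_gb

class MemoryFabric:
    """Treats the DRAM of many nodes as one elastic pool."""
    def __init__(self, nodes):
        self.nodes = nodes

    def total_free_gb(self) -> int:
        return sum(n.free_gb for n in self.nodes)

    def allocate(self, gb: int):
        """Satisfy a request by borrowing capacity across nodes,
        so no single node's limit caps the allocation."""
        grants = []
        for node in self.nodes:
            if gb == 0:
                break
            take = min(node.free_gb, gb)
            if take:
                node.free_gb -= take
                gb -= take
                grants.append((node.name, take))
        if gb:
            raise MemoryError("pool exhausted")
        return grants

fabric = MemoryFabric([Node("gpu-0", 64), Node("gpu-1", 64), Node("mem-pool-0", 512)])
# A 300 GB request far exceeds any single node's DRAM,
# but the fabric can satisfy it from the pool:
print(fabric.allocate(300))  # [('gpu-0', 64), ('gpu-1', 64), ('mem-pool-0', 172)]
```

The point of the sketch is the failure mode it removes: a request larger than any one node's DRAM no longer fails, because capacity is granted across the whole pool.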
The Benefits of EMFASYS
EMFASYS sets the stage for more resilient AI clouds, where workloads can be distributed elastically across a rack or an entire data center without rigid memory limitations. Inference can scale without compromise, as resources are no longer stranded, and the economics of deploying large language models finally begin to make sense.
Here are some key benefits of EMFASYS:
- Expanding available memory capacity: Large AI models and workloads demand more memory than GPU High-Bandwidth Memory (HBM) can economically provide. EMFASYS supplies additional off-node DDR5 memory reachable at low latency, accommodating these larger demands.
- Improving compute resource utilization: By offloading memory from GPU HBM to an external memory pool, GPUs stall less often waiting for memory, and expensive HBM is no longer wasted holding cold data. This leads to better GPU utilization and overall system efficiency.
- Enabling elastic bandwidth and capacity: Memory fabric systems like EMFASYS adjust memory bandwidth and capacity dynamically across racks and servers, letting AI inference workloads scale elastically and absorb demand spikes without overprovisioning.
- Reducing costs per AI token generated: By balancing memory demands and optimizing token generation workload distribution, memory fabrics can cut infrastructure costs significantly, by up to 50% per token in some reported cases, improving the economics of running large AI models at scale (a back-of-envelope model of this effect follows the list).
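The cost claim above is easiest to see with arithmetic. The following sketch is a back-of-envelope model, not vendor data: the hourly price, token rate, and utilization figures are all assumptions chosen only to show the shape of the effect.

```python
# Back-of-envelope cost-per-token model. Every number below is an
# assumption chosen for illustration, not a measured or vendor figure.

def cost_per_million_tokens(gpu_hourly_usd, tokens_per_sec, utilization):
    """GPU cost attributable to generating one million tokens."""
    effective_tps = tokens_per_sec * utilization
    seconds = 1_000_000 / effective_tps
    return gpu_hourly_usd * seconds / 3600

# Baseline: GPU frequently stalled because the KV cache overflows HBM.
baseline = cost_per_million_tokens(gpu_hourly_usd=4.0, tokens_per_sec=500, utilization=0.40)

# With pooled DDR5 absorbing cold cache, assume stalls drop and
# utilization roughly doubles (a hypothetical improvement).
pooled = cost_per_million_tokens(gpu_hourly_usd=4.0, tokens_per_sec=500, utilization=0.80)

print(f"baseline: ${baseline:.2f}/M tokens, pooled: ${pooled:.2f}/M tokens")
# -> baseline: $5.56/M tokens, pooled: $2.78/M tokens
# Doubling utilization halves GPU cost per token -- the mechanism behind
# the "up to 50% per token" figure reported for memory pooling.
```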
A Novel Approach to Decoupling Memory from Compute
The heart of EMFASYS is the ACF-S chip, a 3.2 terabits-per-second (Tbps) "SuperNIC" that fuses networking and memory control into a single device, allowing servers to interface with massive pools of commodity DDR5 DRAM. EMFASYS uses standard Ethernet ports, allowing operators to leverage their existing data center infrastructure without investing in proprietary interconnects.
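A quick sanity check puts the headline number in context. The 3.2 Tbps figure comes from the announcement; the DDR5-5600 speed grade used for comparison is an assumed, representative example, and the calculation ignores protocol overhead.

```python
# Quick sanity check on the headline bandwidth. The 3.2 Tbps figure is
# from the announcement; the DDR5-5600 speed grade is an assumed example.

fabric_bits_per_sec = 3.2e12
fabric_gb_per_sec = fabric_bits_per_sec / 8 / 1e9  # 400 GB/s, before overhead

# One DDR5-5600 channel moves 5600 MT/s * 8 bytes = 44.8 GB/s peak.
ddr5_channel_gb_per_sec = 5600e6 * 8 / 1e9

print(f"fabric: {fabric_gb_per_sec:.0f} GB/s "
      f"= {fabric_gb_per_sec / ddr5_channel_gb_per_sec:.1f} DDR5-5600 channels")
# -> fabric: 400 GB/s = 8.9 DDR5-5600 channels
```

In other words, a single ACF-S port budget is on the order of what roughly nine local DDR5 channels can deliver, which is why remote pooled DRAM can plausibly serve as a usable memory tier rather than a slow spillover.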
EMFASYS: A Key Enabler in the Next Generation of AI Infrastructure
EMFASYS is currently sampling with select customers, and major AI cloud providers are already piloting the system. Reuters reports that these cloud providers are positioning Enfabrica as a key enabler in the next generation of AI infrastructure.
In essence, EMFASYS is a commercially available Ethernet-based memory fabric system designed to address the core bottleneck of generative AI inference: memory access. It delivers a novel approach to decoupling memory from compute, allowing AI data centers to improve performance, lower costs, and increase GPU utilization. With EMFASYS, memory is no longer a supporting actor in AI systems; it is the stage, and Enfabrica is betting that whoever builds the best stage will define the performance of AI for years to come.