Majestic Labs Raises $100M for Memory Pooling AI Server

Server architecture will offer up to 100 TB of DRAM per accelerator.
LOS ALTOS, Calif. — AI chip startup Majestic Labs is working on a memory-pooled server design for AI inference that will offer as much as 100 TB of DRAM per accelerator, far beyond what can be achieved with HBM today. The company has raised $100 million in Series A funding for its chip and system design, which can pack the memory capacity and bandwidth of 10 racks of state-of-the-art GPUs into a single server, Majestic Labs co-founder and president Sha Rabii told EE Times.
Majestic Labs was founded in 2023 by longtime colleagues Masumi Reynders, Ofer Shacham, and Sha Rabii after a long history of working together in the silicon divisions of Google and, more recently, Meta.
“We spent a lot of time thinking about the opportunities—not just where AI is today, but extrapolating, looking at where it was, where it is, where it’s likely to be, and trying to shoot ahead,” Rabii said. “Because we knew from the get-go that it was foolish to try and play Nvidia’s game and just out-execute them.”
From left to right: Majestic Labs co-founders Sha Rabii, Ofer Shacham, and Masumi Reynders (Source: Majestic Labs)
The founders’ observation was that compute was growing faster than memory bandwidth, while the inference of most large models was memory bandwidth-limited. They anticipated that models would continue to grow in size and that longer context lengths would be required, Rabii said.
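A rough back-of-the-envelope sketch shows why decode tends to be limited by memory bandwidth rather than compute. It assumes each generated token streams all model weights from memory once and costs about two floating-point operations per parameter. The model size, bandwidth, and peak-compute figures are hypothetical illustrations, not Majestic or GPU vendor specifications.

```python
def memory_time_s(weight_bytes: float, mem_bandwidth_bytes_per_s: float) -> float:
    """Time to stream all weights from memory once (one decode step)."""
    return weight_bytes / mem_bandwidth_bytes_per_s


def compute_time_s(flops_per_token: float, peak_flops_per_s: float) -> float:
    """Time to perform the arithmetic for one token at peak throughput."""
    return flops_per_token / peak_flops_per_s


def is_memory_bound(weight_bytes, mem_bandwidth, flops_per_token, peak_flops) -> bool:
    """True when moving the weights takes longer than the math on them."""
    return memory_time_s(weight_bytes, mem_bandwidth) > compute_time_s(
        flops_per_token, peak_flops
    )


# Hypothetical example values (assumptions, not from the article):
PARAMS = 70e9                    # 70B-parameter model
WEIGHT_BYTES = PARAMS * 2        # 16-bit weights
FLOPS_PER_TOKEN = 2 * PARAMS     # about 2 FLOPs per parameter per token
MEM_BANDWIDTH = 3e12             # 3 TB/s
PEAK_FLOPS = 1e15                # 1 PFLOP/s

# Upper bound on single-stream decode rate if memory is the bottleneck.
tokens_per_s = 1.0 / memory_time_s(WEIGHT_BYTES, MEM_BANDWIDTH)
memory_bound = is_memory_bound(WEIGHT_BYTES, MEM_BANDWIDTH, FLOPS_PER_TOKEN, PEAK_FLOPS)
```

Under these assumed numbers, the memory transfer takes several hundred times longer than the arithmetic, which is the imbalance the founders describe. Batching changes the picture, since weights are reused across requests, so this is only a single-stream bound.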
“[We knew] people would want the quality of results you get from top-of-the-line models, but the economics would be very challenging,” he said.
The economics of compute-first architectures—starting with a highly performant compute element and fitting as much HBM as possible around it—are based on sub-optimal compute-to-memory ratios, Rabii said.
“We decided to come up with a technology that disaggregates memory from compute, so you can scale memory independently of compute,” he said. “The big challenge is connecting the memory and compute through an extremely high bandwidth and low latency interface that can compete with HBM.”
HBM connects a relatively small amount of memory to compute at high bandwidth, and CXL can connect large memories with low bandwidth, but neither fully meets the requirements of AI, Rabii said.
Majestic’s memory-first architecture tackles the issue with memory pooling. Since this requires extremely high-bandwidth I/O, the team started working on ways to implement the physical layer, protocols, and software layer immediately, as well as considering how to manage reliability and robustness against failures.
Two dies
Majestic is working on two pieces of silicon: a memory interface chiplet that will sit next to both compute and memory, and a many-core AI acceleration chip.
Majestic’s memory pool design uses over 100 TB of standard LPDDR in a server alongside up to 12 of Majestic’s AI accelerator chips. The memory pool will use loose coherency with proprietary mechanisms for flow control and atomic operations, and striping schemes will be used to fully utilize the available bandwidth.
The entire memory space is accessed by an AI compute chip as a single contiguous flat memory space, with each compute chip having the same bandwidth and latency to every location in that memory space.
“That really simplifies programming,” Rabii said.
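Majestic has not published its striping scheme, but the general idea of striping a flat address space across many memory channels can be sketched as simple round-robin interleaving. The function names, channel count, and stripe size below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def locate(addr: int, num_channels: int, stripe_bytes: int) -> tuple[int, int]:
    """Map a flat byte address to (channel, byte offset within that channel)."""
    stripe_index = addr // stripe_bytes
    channel = stripe_index % num_channels      # round-robin across channels
    row = stripe_index // num_channels         # how many stripes deep in the channel
    offset = row * stripe_bytes + addr % stripe_bytes
    return channel, offset


def bytes_per_channel(start: int, length: int, num_channels: int, stripe_bytes: int) -> Counter:
    """Count how many bytes of a contiguous read each channel serves."""
    counts = Counter()
    addr, end = start, start + length
    while addr < end:
        channel, _ = locate(addr, num_channels, stripe_bytes)
        # Advance to the end of the current stripe or the end of the read.
        chunk = min(stripe_bytes - addr % stripe_bytes, end - addr)
        counts[channel] += chunk
        addr += chunk
    return counts


# Hypothetical layout: 4 channels, 256-byte stripes.
spread = bytes_per_channel(start=0, length=4096, num_channels=4, stripe_bytes=256)
```

With this layout, a contiguous 4,096-byte read is split evenly, 1,024 bytes per channel, so all channels contribute bandwidth at once while software still sees a single flat address range.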
GPU-based servers tend to have many tiers of memories (local HBM, HBM on other GPUs, host LPDDR, etc.), which makes optimizing performance a complex software task.
“There are entire companies whose reason for existence is to help other companies map their workloads more effectively and efficiently to GPU clusters,” Rabii said. “We think that is an unnecessary task… [we need to] build the infrastructure in a way that is usable.”
Developers don’t want to have to learn anything about how hardware works, Rabii said, though they inadvertently carry implicit assumptions about how compute and memory operate into their code. For this reason, Majestic wants to preserve the familiar compute-and-memory programming model rather than adopt a novel accelerator design.
Majestic’s accelerator is fully programmable with a large array of CPU cores and matrix multiplication accelerators. The startup has licensed accelerator IP from a third party, which is building a custom version of its core for Majestic’s accelerator. Crucially, this third party is also able to supply the compiler and low-level software for its IP.
Since AI workloads are not compute-bound, Rabii said, the accelerator design is less critical than the memory interface and system architecture.
“It isn’t really about the hardware—the success of servers has as much to do with how quickly people can ramp up and use them and how robust your toolchain is, as how close it gets to the limits of [performance],” Rabii said. “I have Google and Meta to thank for really making me internalize that.”
Today, Majestic’s software stack can take Hugging Face models and lower them to executable code that can run on a software simulation of its server. The company is leaning toward open-source software projects such as Triton and vLLM, Rabii said.
High bandwidth
The success of Majestic’s server will depend on the bandwidth it can achieve between its accelerators and off-package DRAM.
The actual memory in HBM is the same as what’s in LPDDR, Rabii pointed out, but HBM gets the bandwidth by stacking lots of memory dies and aggregating their bandwidths through a proprietary interface.
“We do something analogous to that but on a different scale,” he said. “We take groups of LPDDR chips, aggregate their bandwidths, and connect them through a proprietary very high-speed interface to our compute chips.”
The LPDDR is mounted on boards using off-the-shelf technology, Rabii said, in order to bring the server to market quickly.
Not using HBM helps both economics and the supply chain for Majestic, he added.
Flexibility
One of the key advantages of Majestic’s architecture is that it can offer flexible compute-to-memory ratios. The Majestic server can be equipped with between one and 12 compute chips, and memory can scale from 8 to 128 TB. More compute cards can be added after deployment to change the ratio, Rabii said.
“If you wanted to build a data center out of entirely Majestic servers, you could configure some for prefill with high compute and modest memory, and do the converse for decode,” he said. “But we’re agnostic—if a customer wants to use Nvidia for prefill and Majestic for decode, [they can].”
Target customers include hyperscalers, neoclouds, and large enterprises, especially those in high-frequency trading.
Multiple customers have already placed significant orders for the company’s servers, Rabii said.
“Everyone, whether they admit it or not, is focused on the cost of running AI models,” he said. “That’s where we have a significant advantage.”
Today’s GPU-based systems often over-specify the number of GPUs purely to increase the amount of memory available, which results in low GPU utilization.
“This would mean buying a lot of very expensive silicon you don’t really need, which also uses a lot more power,” he said. “One of our servers supports many more users than a GPU server; that’s a big cost advantage.”
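The over-provisioning argument can be illustrated with simple arithmetic. The sketch below assumes a deployment sized by whichever is larger, the GPU count needed to hold the model state in HBM or the GPU count needed for compute. All figures are hypothetical and are not drawn from Majestic or any GPU vendor.

```python
import math


def gpus_to_buy(model_state_bytes: float, hbm_bytes_per_gpu: float,
                gpus_needed_for_compute: int) -> tuple[int, float]:
    """Return (GPUs purchased, fraction of them the compute load requires)."""
    for_capacity = math.ceil(model_state_bytes / hbm_bytes_per_gpu)
    total = max(for_capacity, gpus_needed_for_compute)
    return total, gpus_needed_for_compute / total


# Hypothetical workload: 4 TB of weights plus KV cache, 80 GB of HBM per GPU,
# and a compute load that 8 GPUs could handle on their own.
total_gpus, compute_fraction = gpus_to_buy(4e12, 80e9, 8)
```

In this assumed case, memory capacity alone forces the purchase of 50 GPUs for a job whose compute needs would fit on 8, so only about 16% of the purchased compute is needed.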
The company currently has a team of 40 split between Los Altos, Calif., and Tel Aviv, Israel. Both Majestic’s AI compute chip and its memory interface chiplet will tape out this year. Servers will begin shipping to lead customers in 2027.
_Sally Ward-Foxton covers AI for EETimes.com and EETimes Europe magazine. Sally has spent the last 18 years writing about the electronics industry from London. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more news publications. She holds a master's degree in Electrical and Electronic Engineering from the University of Cambridge._
