Lexar has long experimented with technologies that give consumers faster data throughput and more reliable storage. Now, as the PC evolves from a conventional personal computer into a platform for local, AI-enhanced workloads, the company is pursuing something entirely different. We interviewed Lexar's Chief Technical Officer (CTO), Daniel Guo, about the technology Lexar is developing to shift part of the demand for DRAM onto much cheaper NAND Flash. According to Guo, DRAM costs about six times more to manufacture than NAND Flash, which gives AI SSDs an opportunity to reduce the DRAM needed to run AI models on local hardware. This is where the Lexar AI Storage Core SSD comes in: the company is building storage solutions that let consumers deploy AI locally with much less DRAM by offloading large language models (LLMs) to the SSD. This approach lets larger and more capable LLMs run on a given PC build, reducing the memory footprint by at least 40%.
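To see why offloading matters, here is a back-of-the-envelope sketch of how a model's weights might split between DRAM and SSD. It assumes weight storage is roughly parameters × bits per weight ÷ 8 and ignores activations, the KV cache, and runtime overhead. The 4-bit quantization, the 24 GB weight budget, and the function names are illustrative assumptions, not figures or tools from Lexar.

```python
# Back-of-the-envelope split of LLM weights between DRAM and SSD.
# Hypothetical helper names; 4-bit weights and a 24 GB DRAM budget are assumptions.

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits / 8 bits-per-byte = GB directly
    return params_billion * bits_per_weight / 8

def split_weights(params_billion: float, bits_per_weight: float,
                  dram_budget_gb: float) -> tuple[float, float]:
    """Return (GB kept in DRAM, GB that would have to live on the SSD)."""
    total = weight_footprint_gb(params_billion, bits_per_weight)
    in_dram = min(total, dram_budget_gb)
    return in_dram, total - in_dram

# Hypothetical: a 122B-parameter model at 4 bits per weight,
# with 24 GB of a 32 GB system set aside for weights.
in_dram, on_ssd = split_weights(122, 4, 24)
print(f"in DRAM: {in_dram:.1f} GB, on SSD: {on_ssd:.1f} GB")
```

Under these assumptions the 122B model needs about 61 GB for weights alone, so roughly 37 GB would have to be served from the SSD rather than DRAM.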
In internal testing, Lexar ran the Qwen 3.5 122B model on a local PC. Traditionally, running this model would mean spending about $4,500 on a PC with a capable CPU and 128 GB of DRAM. Through hardware and software optimization, the Lexar AI suite paired with the Lexar AI Storage Core SSD can reduce the DRAM requirement to 32 GB and run a 35-billion-parameter model at 15.6 tokens per second, compared with only 5.2 tokens per second using traditional frameworks. When loading the 122B model on 32 GB of DRAM, a standard llama.cpp setup fails to load and crashes, while Lexar's SSD offloading delivers about 4.4 tokens per second.
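The ratios behind these reported numbers are easy to check. This short sketch uses only the figures quoted in the paragraph above; the variable names are illustrative.

```python
# Figures as reported by Lexar; the ratios are plain arithmetic.
baseline_dram_gb = 128   # conventional build for the 122B model
lexar_dram_gb = 32       # DRAM with Lexar's SSD offloading
lexar_tps = 15.6         # tokens/s, 35B-parameter model, Lexar stack
baseline_tps = 5.2       # tokens/s, traditional frameworks

dram_reduction = 1 - lexar_dram_gb / baseline_dram_gb  # 0.75
speedup = lexar_tps / baseline_tps                     # ~3.0
print(f"DRAM reduced by {dram_reduction:.0%}, throughput {speedup:.1f}x")
```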
With a more robust configuration featuring 64 GB of DRAM, running the 122B model with a large context window is possible only with SSD offloading. With about 4,000 tokens in context, both the traditional configuration and the Lexar AI stack can run the model, at slightly higher speeds. For larger contexts, however, such as the 256K tokens that are often needed, only the Lexar AI suite can launch the model, producing about 19.3 tokens per second. Of course, the setup is not perfect, and not every model size can be offloaded to the SSD. With larger LLMs, latency rises significantly: the time between submitting a prompt and receiving a response grows considerably.
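Long contexts are costly largely because of the key-value (KV) cache, which for standard attention layers grows linearly with the number of tokens in context. The sketch below uses hypothetical model dimensions (48 layers, 4 KV heads, head dimension 128, 16-bit values), not the architecture of the Qwen model Lexar tested, so treat the absolute numbers as illustrative only.

```python
# KV-cache size for standard attention, using hypothetical model dimensions.

def kv_cache_gb(context_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes)."""
    # Factor of 2: each layer stores one key and one value vector per head per token.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_elem)
    return total_bytes / 1e9

for ctx in (4_096, 262_144):  # ~4K tokens vs. 256K tokens
    size = kv_cache_gb(ctx, n_layers=48, n_kv_heads=4, head_dim=128)
    print(f"{ctx:>7} tokens -> {size:.2f} GB")
```

With these placeholder dimensions, going from 4,096 to 262,144 tokens multiplies the cache 64-fold, from about 0.4 GB to about 25.8 GB, memory that competes with the model weights for DRAM.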
Time to first token (TTFT) was measured at about two seconds with a 2K context window. With a 4K context, the delay rises to between 6 and 8 seconds. Technically, users could offload models of about 400 billion parameters, but both the token rate and the TTFT would be very slow. That trade-off may suit some users, while for others, buying more DRAM is the better solution. Either way, this is an intriguing concept from Lexar.
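One way to read these TTFT figures is as an implied prompt-processing (prefill) rate, assuming nearly all of the TTFT is spent processing the prompt. Treating "2K" and "4K" as 2,048 and 4,096 tokens is an assumption, and the function name is hypothetical.

```python
# Implied prefill throughput from the reported TTFT figures (assumes TTFT ~ prefill time).

def implied_prefill_rate(context_tokens: int, ttft_s: float) -> float:
    """Tokens processed per second if the whole TTFT were spent on prefill."""
    return context_tokens / ttft_s

print(implied_prefill_rate(2_048, 2.0))  # 1024.0  (2K context, ~2 s)
print(implied_prefill_rate(4_096, 8.0))  # 512.0   (4K context, 8 s)
print(implied_prefill_rate(4_096, 6.0))  # about 682.7 (4K context, 6 s)
```

Under that assumption, doubling the context cut the effective prefill rate from about 1,024 tokens/s to roughly 512 to 683 tokens/s, which is why TTFT grew faster than the context length did.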
Example from Computex 2026.
The company also developed a concept for mini PCs and desktops featuring an M.2 slot designed for repeated insertions. An M.2 SSD is encased in a metal jacket (not a full enclosure) and inserted into a 25 mm-wide slot on the front panel of a mini PC, where it connects directly to an M.2 slot wired to the processor or chipset. This direct connection avoids the overhead of an intermediate interface. The hot-swappable SSD, which offloads AI models onto NAND Flash, reduces dependence on DRAM and helps run larger models. It is available in PCIe Gen 5 and Gen 4 versions, with the Gen 5 version offering more bandwidth. The SSD uses Lexar's custom DRAM-less Storage Processing Unit (SPU) controller, giving the company complete control over data movement.
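The bandwidth gap between the Gen 4 and Gen 5 versions can be estimated from the PCIe signaling rates: 16 GT/s per lane for PCIe 4.0 and 32 GT/s for PCIe 5.0, both using 128b/130b encoding. The sketch assumes a x4 link, as is typical for M.2 NVMe drives, and adds a naive tokens-per-second ceiling for a hypothetical model that must stream 10 GB of weights from the SSD for every token. That 10 GB figure and the function names are assumptions, not Lexar measurements.

```python
# Theoretical PCIe x4 bandwidth and a naive weight-streaming ceiling (assumptions noted).

def pcie_link_gb_s(gt_per_s: float, lanes: int = 4) -> float:
    """One-direction link bandwidth in GB/s after 128b/130b encoding,
    before protocol overhead."""
    return gt_per_s * lanes * (128 / 130) / 8

def naive_tokens_per_s_bound(link_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound if every generated token had to read this much data from the SSD."""
    return link_gb_s / gb_read_per_token

gen4 = pcie_link_gb_s(16)  # ~7.88 GB/s
gen5 = pcie_link_gb_s(32)  # ~15.75 GB/s

# Hypothetical: 10 GB of weights streamed from the SSD per token.
for name, bw in (("Gen 4", gen4), ("Gen 5", gen5)):
    bound = naive_tokens_per_s_bound(bw, 10)
    print(f"{name}: {bw:.2f} GB/s link, <= {bound:.2f} tokens/s")
```

Real drives sustain less than the theoretical link rate, while caching or sparse (mixture-of-experts) models may read far less data per token, so this ceiling only describes the naive streaming case; it does show why doubling the link rate matters for SSD offloading.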