






















Tensormesh Inc. has hit upon a way to make artificial intelligence inference more efficient by eliminating the need for redundant computations, and its technology is so convincing that several of AI infrastructure giants are backing it with $20 million in funding.
Today’s round saw the participation of Nvidia Corp., Advanced Micro Devices Inc. and CoreWeave Inc., as well as the venture capital firms Valley Capital Partners and Laude Ventures. It brings Tensormesh’s total amount raised so far to $24.5 million, and it coincides with the launch of its flagship software-as-a-service offering, Tensormesh Inference.
Tensormesh’s technology is designed to tackle one of the most glaring inefficiencies of graphics processing units, which have to reprocess the same data over and over again given their limited memory caches. It’s a design challenge that stems from the way large language models work. Typically, LLM deployments treat each new request or prompt they receive as a brand new task. So even if an AI chatbot is engaged in a long-winded conversation with someone, or analyzing a document it has seen before, the GPU will need to reprocess the entire context window from scratch.
The startup aims to fix this by using a technique it calls key-value or KV caching. What this does is store the intermediate data generated by LLMs while processing a prompt.
Because it helps them to remember these computations, Tensormesh makes it possible to skip the reprocessing each time a new prompt arrives, enabling it to respond more quickly. For developers building agentic models that need to crunch their way through multiple steps to perform a task or solve a problem, it can result in a 10-fold reduction in latency and GPU spending.
The Tensormesh Inference, based on the open-source LMCache project, includes a cost savings dashboard that allows developers to track cache hit rates and convert them into tangible dollar figures. Moreover, it gives developers direct control over how much storage they allocate to the cache, so they can fine-tune their infrastructure to maximize efficiency based on the size of their LLM deployment and usage rates. According to the startup, some customers have achieved cache hit rates of more than 70%, meaning that more than two-thirds of all prompts are retrieved from the cache instead of recomputed.
Deployment is flexible, with three options available. Developers can use a serverless application programming interface that’s fully compatible with OpenAI Group PBC’s standards, enabling it to be dropped into existing workflows. Alternatively, for customers running more intensive workloads, the company offers on-demand deployment on dedicated GPU resources, or reserved deployments for enterprises that need custom service-level agreements.
Founder and Chief Executive Junchen Jiang said he’s not surprised that Nvidia, AMD and CoreWeave were among the first to understand the implications of his company’s technology. “Tensormesh offers a new vision on the significance of the intermediate data that LLMs generate when processing a prompt,” he said. “Behind the term KV cache is a whole concept of AI interpretation of the question it is asked. It’s a whole new class of data.”
Therein lies the potential of Tensormesh’s technology. It’s transforming “intermediate AI data” into an entirely new asset class, and this could become extremely valuable as AI agents become more complex. The more capable an AI agent is, the greater the context window required. By extending those context windows, Tensormesh could well emerge as a key piece of the agentic AI stack.
The money from today’s round will be used to expand Tensormesh’s hardware integrations with AMD’s, Nvidia’s and CoreWeave’s infrastructure and accelerate product development. The company also remains committed to the underlying, open-source LMCache project, which will be the main beneficiary of many of its planned upcoming innovations.
Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.
About SiliconANGLE Media
SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.
Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。