Tether is shipping TurboQuant KV-cache quantization with Vulkan support into its QVAC SDK
By tether · 2026-06-17

The latest release of qvac-fabric-llm.cpp, the inference engine of the QVAC Fabric LLM, integrates TurboQuant to manage memory in long-running inference sessions. Tether is adopting the technology as a path to better efficiency when running large language models on devices with limited compute resources.

TurboQuant is Google’s answer to the growth of the Key-Value (KV) Cache during routine inference. The cache can reach up to 8GB for a 262,000-token context session with a 4B-parameter large language model.

Tether is the first AI research team to ship the KV Cache compression algorithm to a publicly available local AI model. The TurboQuant integration will be included in the latest version of the QVAC SDK (v0.12.0) and will be available through Fabric, the inference and fine-tuning engine of the SDK.

This lets developers serve models through qvac-fabric-llm.cpp and run inference with negligible loss in precision, while using up to 5x less VRAM for the KV Cache regardless of context size.

Why does this matter?

When you use an AI assistant, the model records the results from your previous prompts in temporary memory on your device. This record is known as the Key-Value Cache. The KV Cache is akin to a notepad for jotting down key points in a discussion or while reading a book.

For an AI model, it serves as a reference point each time you ask a follow-up question. This makes it easy for the model to follow up on your conversation without having to re-run the whole thread, which would waste a lot of time.

Transformer-based AI models build their KV Cache by storing the “key points” and their identifiers token by token (roughly “word for word”) in square grids. When you ask a follow-up question, the model looks up the key points by their precise location in the grid and computes new inferences based on your new input (prompt/question).
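As a rough illustration of the idea above, the toy cache below stores one key vector and one value vector per token and answers a new query by scoring it against every stored key. The class name `ToyKVCache` and its methods are invented for this sketch and are not part of the QVAC SDK. Real inference engines keep this data as per-layer tensors, typically in GPU memory, rather than Python lists.

```python
import math


class ToyKVCache:
    """Keeps one key vector and one value vector per token seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        # Called once per new token; earlier entries are never recomputed.
        self.keys.append(list(key))
        self.values.append(list(value))

    def attend(self, query):
        """Softmax-weighted average of cached values, scored by query-key dot products."""
        if not self.keys:
            raise ValueError("cache is empty")
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]
```

The point of the sketch is that `append` grows the lists by one entry per token, which is exactly why memory use climbs with conversation length.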

The KV Cache is a speed optimization that trades memory for time, and it keeps each usage session running smoothly. However, as you continue your conversation with the AI assistant, the KV Cache grows larger and consumes more memory on your device.

For instance, a few hours of conversation that grows into a 262,000-token session could consume up to 8GB of VRAM, which few consumer-grade devices can spare. Although the cache is temporary, KV Cache overload can limit how an AI application is used, especially when running models locally on devices with limited compute capacity, which is the case for the majority of users. KV Cache bloat is a major bottleneck for local AI, pushing users to cloud-based AI.
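A back-of-the-envelope calculation shows where a figure like 8GB can come from. The model shape below (32 layers, 2 KV heads, head dimension 128, 16-bit values) is a hypothetical configuration chosen only because it reproduces the number quoted above; the article does not say which 4B-parameter model or precision it has in mind.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits_per_value=16):
    """Bytes needed to cache keys and values for `tokens` tokens."""
    values_per_token = 2 * layers * kv_heads * head_dim  # 2 = one key + one value
    return tokens * values_per_token * bits_per_value // 8


# Hypothetical model shape, picked so the result matches the article's figure.
LAYERS, KV_HEADS, HEAD_DIM = 32, 2, 128
TOKENS = 262_144  # the "262,000-token" session, taken here as 2**18

size = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM)
print(f"{size / 2**30:.1f} GiB")  # prints "8.0 GiB"
```

Under these assumptions each token costs 32 KiB of cache, so the total scales linearly with context length.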

As a solution, TurboQuant strategically reduces KV Cache memory bloat by converting high-precision data vectors into lower-bit integers. Similar to reducing the size of the handwriting on the notepad or using signs instead of plain words, it shrinks the space the KV cache occupies.

How TurboQuant compresses KV Cache memory

TurboQuant drastically reduces the memory consumed per cached token by combining Polar quantization (PolarQuant) with the Quantized Johnson-Lindenstrauss (QJL) technique. Together they avoid the overhead of traditional quantization methods, which must store full-precision constants for each small block of data.

It pairs PolarQuant’s structural efficiency with QJL’s zero-overhead error correction to compress caches down to about 3 bits per value, delivering up to a 5x reduction in KV Cache memory.
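The arithmetic behind these figures can be sketched as follows. The block layout (blocks of 32 values, one 16-bit scale and one 16-bit offset per block) and the 16-bit baseline are assumptions made for illustration; the article does not specify either.

```python
def effective_bits(payload_bits, block_size=None, constant_bits=0):
    """Average bits per value, including any per-block constants."""
    if block_size is None:
        return float(payload_bits)
    return payload_bits + constant_bits / block_size


def compression_ratio(bits_per_value, baseline_bits=16):
    """How many times smaller the cache gets versus the baseline precision."""
    return baseline_bits / bits_per_value


# Hypothetical block scheme: 3-bit values plus 32 bits of constants per 32 values.
blockwise = effective_bits(3, block_size=32, constant_bits=32)  # 4.0 bits per value
overhead_free = effective_bits(3)                               # 3.0 bits per value

print(compression_ratio(blockwise))      # 4.0
print(compression_ratio(overhead_free))  # about 5.33
print(8 / 5)                             # 1.6, i.e. 8GB shrinks to 1.6GB at 5x
```

Read this way, a 5x reduction from a 16-bit baseline corresponds to an average of 3.2 bits per value, which is in line with the "about 3 bits" quoted above. The sketch also shows why per-block constants matter: in this assumed layout they add a full extra bit per value.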

PolarQuant is what reduces the handwriting on the notepad or converts it into signs that consume less memory. It maps the KV Cache data onto a fixed circular grid and uses polar coordinates rather than the standard Cartesian (X, Y, Z) coordinates to locate key points.

This limits the details required to locate data to Angle (meaning of the data) and Radius (weight or importance of the data), rather than the full locational layout. It avoids the expensive data-normalization steps by replacing square grids with circular grids, simplifying vector representation and data location. This is similar to rewriting “Add 7 apples, then add 4 apples” as “Add 11 apples total.”
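The change of coordinates can be illustrated on a single pair of numbers. This is a deliberately simplified sketch, not the PolarQuant algorithm itself: it converts one (x, y) pair to a radius and an angle, snaps the angle to a fixed grid of 2^bits directions, and keeps the radius at full precision. The function names and the 3-bit default are assumptions for the example.

```python
import math


def quantize_angle(x, y, bits=3):
    """Return (radius, angle_index) for a 2-D pair; the angle snaps to a fixed grid."""
    levels = 1 << bits
    radius = math.hypot(x, y)
    angle = math.atan2(y, x) % (2 * math.pi)  # map to [0, 2*pi]
    # The final modulo wraps angles just below 2*pi back to grid index 0.
    index = int(round(angle / (2 * math.pi) * levels)) % levels
    return radius, index


def dequantize_angle(radius, index, bits=3):
    """Rebuild an approximate (x, y) pair from the radius and the grid index."""
    levels = 1 << bits
    angle = 2 * math.pi * index / levels
    return radius * math.cos(angle), radius * math.sin(angle)


radius, index = quantize_angle(3.0, 4.0)
print(radius, index)
print(dequantize_angle(radius, index))
```

Because the grid of directions is fixed in advance, the only per-pair data in this sketch is the index and the radius; no per-block scale or offset has to be stored alongside. The reconstructed angle is off by at most half a grid step.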

When compressing KV Cache data with PolarQuant, there is a risk of reducing the data’s weight score (importance rating). This is where QJL comes in: it acts as a mathematical error-checker that corrects for a possible loss in attention score (the importance given to the data) during quantization. QJL uses sign bits (+1 or -1) to balance quantization errors. This keeps the attention score accurate, or very nearly so, by pairing low-precision data with high-precision queries.
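The sign-bit idea can be sketched in isolation. In the example below a key is reduced to one sign bit per row of a random Gaussian projection plus its norm, while the query stays at full precision; the scaled sum then estimates their dot product. This is a minimal sketch of a sign-based Johnson-Lindenstrauss estimator under those assumptions, with invented function names, and it does not attempt to reproduce how TurboQuant combines the step with PolarQuant.

```python
import math
import random


def make_projection(rows, dim, seed=0):
    """Random Gaussian matrix shared by the encoder and the estimator."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encode_key(projection, key):
    """Store one sign (+1 or -1) per projection row, plus the key's norm."""
    signs = [1 if dot(row, key) >= 0 else -1 for row in projection]
    return signs, math.sqrt(dot(key, key))


def estimate_dot(projection, query, signs, key_norm):
    """Estimate <query, key> from the sign bits and a full-precision query."""
    total = sum(dot(row, query) * s for row, s in zip(projection, signs))
    return math.sqrt(math.pi / 2) * key_norm * total / len(projection)


key = [0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.75, 1.0]
query = [1.0, 0.5, -0.5, 1.0, 2.0, 0.0, 0.25, -1.0]
projection = make_projection(rows=4000, dim=len(key), seed=1)
signs, key_norm = encode_key(projection, key)
print("exact:   ", dot(query, key))
print("estimate:", estimate_dot(projection, query, signs, key_norm))
```

The estimate is unbiased for Gaussian projections, and its spread shrinks as the number of rows grows. The 4,000 rows used here are far more than a practical cache would store per key; they are chosen only to make the agreement easy to see.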

TurboQuant on QVAC SDK: More possibilities for Local AI

TurboQuant is a major breakthrough for local and cloud AI, but especially for local AI, where computing overhead is a major bottleneck for routine use. Tether recognizes the technological brilliance of the algorithm and its potential for models built to operate on tight resources. By compressing what would normally consume 8GB of VRAM down to 1.6GB, TurboQuant frees up resources for your inference machine and expands both your bandwidth and your imagination of what can be done with a local superintelligent setup.

The TurboQuant integration in qvac-fabric-llm.cpp runs on the Vulkan backend. This offers important compatibility and performance advantages, which come from Vulkan’s vendor-agnostic design and TurboQuant’s direct GPU execution.

Vulkan support extends the advantages of TurboQuant to a wider range of consumer-grade devices and to vendors outside the NVIDIA ecosystem (AMD and NVIDIA are currently supported, with mobile GPUs planned). It enables users and developers to run highly optimized, compressed local inference on a wide range of platforms, including personal computers and mobile device GPUs.

TurboQuant’s KV Cache compression happens directly on the device’s GPU and aligns with how a computer naturally handles operations. This means the maths is done on the GPU’s fastest, closest memory, ensuring that models served with Fabric achieve the full 5x reduction in KV cache size while maintaining performance and precision. This lets users run much longer contexts (over 262,000 tokens) without running out of VRAM capacity.

TurboQuant lets you do more with fewer resources in everyday environments. From simple follow-up queries to reviewing files that run to multiple gigabytes on your personal computer or mobile phone, it expands the scale of what can be done with an AI application. In the QVAC SDK, it complements other optimization techniques inherent in Tether’s AI framework to power native intelligent systems that support an infinite number of users and autonomous agents. In a ten-billion-strong society, such systems will form a secure, viable, and unstoppable foundation for building the most complex superintelligent units for everyday use, biotechnology, and more.

From a macro perspective, compression techniques that reduce the operational resources required by AI models are the industry standard. The ability to develop and integrate such techniques will significantly impact the success of local AI models and infrastructure.

Tether is committed to building AI solutions that run on any setup and let users choose their own biases. Follow the QVAC revolution and contribute to Tether’s drive for open source AI.