Quant Picker: Which GGUF File Should You Download?
Thomas Newkirk · 2026-06-13 · via Show HN

Pick your model and your machine — get the exact quant to download, the file size, and how much context you'll have left.

How to read the table

Every GGUF model ships in multiple quantization levels — same model, different precision, different file size. The trade is simple: more bits = better quality = bigger file = less room left for context. This tool does the arithmetic for your exact machine: file size per quant, then whatever memory remains becomes your context budget (the KV cache eats it per token).
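The arithmetic described above can be sketched in a few lines. This is a rough illustration, not the tool's actual code: the function names, the 1 GB runtime overhead, the fp16 KV cache, and the example model shape (8B parameters, 32 layers, 8 KV heads, head dimension 128, about 4.85 bits per weight) are all assumptions chosen for the example.

```python
def file_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB (1 GB = 10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV-cache cost of one token: a K and a V vector per layer per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(memory_gb: float, size_gb: float, per_token_bytes: int,
                       overhead_gb: float = 1.0) -> int:
    """Tokens of context that fit in whatever memory the weights leave free."""
    free_bytes = (memory_gb - size_gb - overhead_gb) * 1e9
    return max(0, int(free_bytes // per_token_bytes))


# Hypothetical 8B model at ~4.85 bits per weight on a 16 GB machine.
size = file_size_gb(8, 4.85)
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
print(f"file ~{size:.2f} GB, KV cache {per_token} bytes/token, "
      f"context budget ~{max_context_tokens(16, size, per_token)} tokens")
```

Raising the bits per weight grows the file and shrinks the context budget, which is the trade the table makes visible.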

The recommendation logic follows the community consensus from our quantization guide: take the highest quant that still leaves ≥8k tokens of context. Q6/Q5 are near-lossless, Q4_K_M is the sweet spot, and below Q3 quality falls off fast. If you're forced down there, you usually want a smaller model instead: a bigger model at Q4 beats a smaller one at Q8, but a Q2 of any model beats very little.
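The "highest quant that still leaves ≥8k of context" rule amounts to a short loop. The sketch below is an assumption-laden illustration rather than the tool's implementation: the quant list, its bits-per-weight figures, the reading of "8k" as 8192 tokens, the 1 GB overhead, and the name `pick_quant` are all hypothetical.

```python
# (name, approximate bits per weight), highest quality first.
# The bits-per-weight values are illustrative assumptions, not measured figures.
QUANTS = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.85),
    ("Q3_K_M", 3.9),
    ("Q2_K", 3.0),
]
MIN_CONTEXT_TOKENS = 8192  # assumed meaning of "8k"


def pick_quant(params_billions, memory_gb, kv_bytes_per_token, overhead_gb=1.0):
    """Return the highest quant that leaves room for MIN_CONTEXT_TOKENS, else None."""
    needed_bytes = MIN_CONTEXT_TOKENS * kv_bytes_per_token
    for name, bpw in QUANTS:
        size_gb = params_billions * bpw / 8
        free_bytes = (memory_gb - size_gb - overhead_gb) * 1e9
        if free_bytes >= needed_bytes:
            return name
    return None  # nothing fits: consider a smaller model


# Hypothetical 8B model whose KV cache costs 131072 bytes per token.
print(pick_quant(8, 16, 131072))
print(pick_quant(8, 8, 131072))
print(pick_quant(70, 8, 131072))
```

A result of `None`, or one below Q3, is the signal to drop to a smaller model rather than squeeze the larger one further.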

Honest limits

File sizes are computed from bits-per-weight, not scraped from Hugging Face — real files vary a little by quantizer version (K-quants vs I-quants, imatrix variants). The KV-cache math assumes a GQA-typical architecture; exotic models differ. And max context here is what fits — models also have their own context limits, and quality at extreme context is its own story. Treat the numbers as a reliable guide, not a contract.
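Two of these caveats are easy to quantify. The sketch below compares the per-token KV cost of a grouped-query (GQA) layout with a full multi-head layout for the same hypothetical model shape, then caps the context that fits at the model's own limit. The layer and head counts, the fp16 cache, and the 32768-token limit are assumptions for illustration only.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """K and V vectors for every layer and KV head, at fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def usable_context(fit_tokens, model_limit_tokens):
    """What fits in memory is only usable up to the model's own context limit."""
    return min(fit_tokens, model_limit_tokens)


# Same hypothetical shape (32 layers, head_dim 128), two attention layouts.
gqa = kv_bytes_per_token(32, n_kv_heads=8, head_dim=128)   # grouped-query attention
mha = kv_bytes_per_token(32, n_kv_heads=32, head_dim=128)  # one KV head per query head

print(f"GQA: {gqa} bytes/token, MHA: {mha} bytes/token ({mha // gqa}x)")
print(usable_context(fit_tokens=77438, model_limit_tokens=32768))
```

With these example numbers, a model without GQA would fit a quarter as many tokens in the same memory, which is why a GQA-based estimate overstates the context for such architectures.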

Shopping rather than downloading? "Can I run it?" finds hardware that fits a model. Wondering if you should buy hardware at all? The cost calculator compares buying vs renting vs the API.