Story of How I'm Running an Unlimited $6/Month AI Provider on 4x RTX 3090s
yolo-auto · 2026-06-14 · via HN's home page

This submission is a tale about how I launched an unlimited LLM provider to about 60 hyped people on the waitlist, then immediately served them a fully dysfunctional death-loop model. Most people, very reasonably, disappeared, but thanks to a few extremely nice people who stuck around anyway, we kept the project alive. It's still pretty chaotic, but it's gaining traction.

To back up a little bit: I believe that the whole point of AI agents is that they should keep working. They should read files, retry, search, code, summarize, run tools, and loop until the job is done. When your employer is paying for it, who cares about cost? But when it comes to my personal money and hobbies, if every loop feels like a tiny financial event, you start babysitting the agent instead of using it, and it's not fun.

Metered pricing makes me worry about using too much. Usage subscriptions make me feel like I need to use every last magical % or I'm "wasting it". If only an unlimited provider existed...

Then I joined the AMD developer program. I got some credits to spin up my own MI300x and started tinkering with vllm/sglang inference serving on AMD.

After learning about the AMD MI300x, I did some napkin math:

Renting an MI300x at $2.00 an hour = ~$1500 a month. It can probably support about 150 users on a small MoE model like qwen-35b-3a, maybe more.

$1500 / 150 users = $10.00 per user per month, and we all get to play with agents for a small price.

You can oversubscribe a bit, so I landed on $6 per month, per user, for 2x generation slots, 128k context, no token limits, no rate limits.
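The napkin math above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, the 30-day month is an assumption, and the implied subscriber count and oversubscription ratio are simply what the stated numbers work out to, not figures given in the post.

```python
import math

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_gpu_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of renting one GPU around the clock for a month."""
    return hourly_rate * hours


def cost_per_user(monthly_cost: float, users: int) -> float:
    """Price each user would pay if the bill were split evenly."""
    return monthly_cost / users


def users_to_break_even(monthly_cost: float, price: float) -> int:
    """Smallest number of subscribers whose fees cover the monthly bill."""
    return math.ceil(monthly_cost / price)


exact = monthly_gpu_cost(2.00)   # $2.00/hour, as in the post
rounded = 1500.0                 # the post rounds the monthly bill up to ~$1500
capacity = 150                   # the post's guess at users one MI300x can serve

print(f"exact monthly cost:     ${exact:.2f}")
print(f"even split at capacity: ${cost_per_user(rounded, capacity):.2f}")

needed = users_to_break_even(rounded, 6.0)
print(f"subscribers at $6:      {needed}")
print(f"implied oversubscribe:  {needed / capacity:.2f}x")
```

Under these assumptions, a $6 price only covers the rental if more subscribers sign up than the GPU's estimated capacity, which works as long as they are not all generating at the same time.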

I built the site and the router, made a waitlist, and then over-optimized the MI300x to the point where vllm bench had like 3k+ output and 40k+ throughput... But I didn't test the final config/serve commands, and that's how I ended up with a disaster launch. You couldn't prompt the thing without it looping or bugging out. It was cursed. And that's where we lost a lot of people.

Luckily, my buddy had a few 3090s, so he threw me a lifeboat and began hosting qwen for us on 2x 3090s. We finally had an operational model that wasn't costing $2.00 an hour for our whopping 3 users.

We started gaining more users, so we moved up to 4x 3090s. We have plenty of room for more users, but even so, since then:

- we've configured vllm wrong like 15 times
- a GPU died
- we lost power
- I made a bunch of one-click starts for openclaw, hermes, and pi-mono, and none of them really work right, which probably drives people away. Those are still on our site right now.

...but people who know what they are doing seem to really like the price point. All in all, we have about 98% uptime. It's been about a month. We've both learned a ton. Even with backgrounds in SWE/SE/AI, being on the hook for a couple of paying users forced us to really focus on delivering them a good product. And now I think we might be close to covering the power/hosting bill, so we're not operating at a loss (if you include the 3090 capex, we're still at a loss).

Our break-even point is moving to the cloud to max out an MI300x, which is now tuned and ready to go once we get the users.

And I'm finding that in some areas, subscribing to our service is cheaper than running the model locally (but as someone who loves local models, I totally get it).

Since then, I've been working on a desktop agent that actually works with small models like qwen. That's going to replace the broken one-click starts. It's barebones, but it's something that just works out of the box. We're at yolo-auto.com, where we have an abysmal free tier to prove it works, and the agent is open source, so you can see what I'm talking about here: https://github.com/yolo-auto-org/yolo-auto-desktop

Anyway, hope you got a laugh or found it interesting! Drop a question if you have any.