Introducing 1-bit and Ternary Bonsai Image 4B: Image Generation for Local Devices
PrismML · 2026-05-26 · via Hacker News

Images generated from Ternary Bonsai Image 4B

Today we’re releasing Bonsai Image 4B, a family of compact image-generation models designed to run high-quality diffusion inference on local hardware: from laptops to phones.

Bonsai Image 4B comes in two variants:

  • 1-bit Bonsai Image 4B uses binary {−1, +1} transformer weights with an FP16 group-wise scaling factor, giving 1.125 effective bits per weight. It targets maximum compression and is the right fit when memory pressure, bandwidth, and the deployment footprint are the primary constraints.
  • Ternary Bonsai Image 4B uses {−1, 0, +1} transformer weights with an FP16 group-wise scaling factor, giving 1.71 effective bits per weight. The additional zero state gives the model more representational flexibility, improving visual quality and prompt fidelity while remaining extremely compact.
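The effective bits-per-weight figures above can be reproduced with a short calculation. This is a sketch under one assumption that the post does not state: a group size of 128 weights per FP16 scale, chosen here because it makes both quoted numbers come out. The ternary figure also assumes an ideal log2(3) bits per weight; a practical packing scheme may need slightly more.

```python
import math


def effective_bits(states: int, scale_bits: int = 16, group_size: int = 128) -> float:
    """Bits per weight: an ideal code for `states` values plus the amortized scale.

    group_size=128 is an assumption, not a value stated in the post.
    """
    return math.log2(states) + scale_bits / group_size


binary_bits = effective_bits(2)   # 1 + 16/128
ternary_bits = effective_bits(3)  # log2(3) + 16/128

print(f"binary:  {binary_bits:.3f} bits/weight")
print(f"ternary: {ternary_bits:.2f} bits/weight")
```

Under that assumption the binary case gives exactly 1.125 and the ternary case rounds to 1.71, matching the two figures quoted for the variants.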

The result is a new deployment regime for image generation: capable outputs, open weights, and practical local inference on devices that were previously out of reach for this class of model. To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.

Built for local generation

Images generated from 1-bit Bonsai Image 4B

Local image generation starts with a hard constraint: the model has to fit within the device’s memory budget.

For a 4B-class image model, the diffusion transformer is the largest part of the model and the part that runs repeatedly during generation. Each denoising step invokes the transformer again, so transformer size directly shapes memory pressure, bandwidth demand, and local inference speed.

Bonsai Image 4B is built from FLUX.2 Klein 4B. It keeps the architecture intact but changes how the transformer weights are represented. By moving those weights into binary and ternary form, Bonsai shrinks the part of the image pipeline that matters most for local deployment.
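To make "binary and ternary form with a group-wise scale" concrete, here is a generic illustration of that weight representation. The post does not describe how Bonsai's weights are actually produced, so everything below is an assumption: the absolute-mean scale, the rounding rule, the function names, and the four-element group are all hypothetical and chosen only for readability.

```python
def quantize_group(weights, ternary=False):
    """Return (scale, codes) for one group of float weights.

    Generic sign / abs-mean scheme for illustration only; not PrismML's method.
    """
    scale = sum(abs(w) for w in weights) / len(weights)  # one scale per group
    if scale == 0.0:
        return 0.0, [0 if ternary else 1 for _ in weights]
    codes = []
    for w in weights:
        if ternary:
            # Round w/scale to the nearest integer, then clamp to {-1, 0, +1}.
            codes.append(max(-1, min(1, round(w / scale))))
        else:
            # Binary keeps only the sign, so there is no zero state.
            codes.append(1 if w >= 0 else -1)
    return scale, codes


def dequantize_group(scale, codes):
    """Reconstruct approximate weights from the shared scale and the codes."""
    return [scale * c for c in codes]


group = [0.8, -0.1, 0.05, -0.9]  # hypothetical weights
scale_b, codes_b = quantize_group(group)
scale_t, codes_t = quantize_group(group, ternary=True)

print("binary: ", codes_b, dequantize_group(scale_b, codes_b))
print("ternary:", codes_t, dequantize_group(scale_t, codes_t))
```

In this toy group the ternary codes map the two small weights to zero, while the binary codes must push them to plus or minus the scale. That is the extra representational flexibility the zero state provides.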

| Model | Diffusion Transformer | Reduction vs FP16 |
|---|---|---|
| FLUX.2 Klein 4B | 7.75 GB | 1.0x |
| 1-bit Bonsai Image 4B | 0.93 GB | 8.3x |
| Ternary Bonsai Image 4B | 1.21 GB | 6.4x |
Table I: Diffusion transformer footprint for models.

The binary layers provide roughly a 14x reduction relative to full-precision transformer weights. A small set of precision-sensitive supporting tensors (~5%), called the projection layers, remains in FP16, so the final 1-bit Bonsai Image 4B transformer is 0.93 GB: an 8.3x reduction from the 7.75 GB full-precision FLUX.2 Klein 4B.

The ternary variant follows the same structure. Its ternary layers provide roughly a 10x reduction and the final Ternary Bonsai Image 4B transformer is 1.21 GB, a 6.4x reduction from the full-precision transformer. It is slightly larger than the 1-bit model, but the additional zero state improves visual quality and prompt fidelity.
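A back-of-the-envelope estimate shows how the per-layer reductions and the FP16 remainder combine into the final transformer sizes. It is a rough sketch that assumes the "~5%" refers to the share of parameters kept in FP16 and uses the effective bits per weight quoted for each variant. The function name and constants are illustrative, and the estimate is not expected to match the reported sizes exactly.

```python
FP16_BITS = 16
FULL_GB = 7.75        # FP16 diffusion transformer size from Table I
FP16_FRACTION = 0.05  # "~5%" kept in FP16; treated as a parameter share (assumption)


def estimated_gb(bits_per_weight: float) -> float:
    """Estimate transformer size when most weights drop to `bits_per_weight`."""
    low_bit = FULL_GB * (1 - FP16_FRACTION) * bits_per_weight / FP16_BITS
    kept_fp16 = FULL_GB * FP16_FRACTION
    return low_bit + kept_fp16


for name, bits, reported in [("1-bit", 1.125, 0.93), ("ternary", 1.71, 1.21)]:
    est = estimated_gb(bits)
    print(f"{name}: layer reduction {FP16_BITS / bits:.1f}x, "
          f"estimated {est:.2f} GB, reported {reported} GB")
```

The estimates land within about 0.05 GB of the reported 0.93 GB and 1.21 GB. They also show why the end-to-end reduction (8.3x, 6.4x) is smaller than the per-layer reduction: the FP16 remainder becomes a large share of the compressed model.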

Including the compressed text encoder and FP16 VAE, the Apple Silicon deployment payload is 3.42 GB for 1-bit Bonsai Image 4B and 3.88 GB for Ternary Bonsai Image 4B. For comparison, the full-precision FLUX.2 Klein 4B requires a deployment payload of 15.97 GB. Because the text encoder is offloaded after prompt encoding, mean memory usage at runtime is smaller than the total payload. When generating a 512x512 image, mean active memory is 1.5 GB for the binary model and 1.96 GB for the ternary model, compared to 11.74 GB for the original FLUX.2 Klein 4B (reductions of 7.8x and 6.0x, respectively). For a 1024x1024 image, mean active memory is 1.95 GB and 2.38 GB, compared to 14.39 GB for the original FLUX.2 Klein 4B (reductions of 7.4x and 6.0x, respectively).
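The reduction factors in this paragraph follow directly from the reported sizes. The short check below recomputes them from the figures quoted above and also derives the payload ratios, which the post does not state explicitly. The dictionary layout and names are illustrative.

```python
# Reported sizes in GB, copied from the paragraph above.
reported = {
    "payload":          {"flux": 15.97, "1-bit": 3.42, "ternary": 3.88},
    "active 512x512":   {"flux": 11.74, "1-bit": 1.50, "ternary": 1.96},
    "active 1024x1024": {"flux": 14.39, "1-bit": 1.95, "ternary": 2.38},
}


def reduction(baseline_gb: float, model_gb: float) -> float:
    """Reduction factor relative to the full-precision baseline, to one decimal."""
    return round(baseline_gb / model_gb, 1)


ratios = {
    metric: {m: reduction(sizes["flux"], sizes[m]) for m in ("1-bit", "ternary")}
    for metric, sizes in reported.items()
}

for metric, by_model in ratios.items():
    print(f"{metric}: 1-bit {by_model['1-bit']}x, ternary {by_model['ternary']}x")
```

The active-memory ratios match the stated 7.8x / 6.0x and 7.4x / 6.0x. The payload ratios are smaller, roughly 4.7x and 4.1x, because the payload also includes the text encoder and the FP16 VAE.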

This reduction in memory footprint changes where the model can run. Our deployment stack supports Apple Silicon devices (iPhones, iPads, and Macs) and CUDA GPUs, using MLX low-bit paths on Apple hardware and Gemlite low-bit GEMM kernels on CUDA. On iPhone 17 Pro Max, the full-precision FLUX.2 Klein 4B pipeline does not fit within the device memory budget, while both Bonsai Image variants run on-device.

Video I: Image generation on Bonsai Studio

In practice, Bonsai Image 4B generates a 512x512 image in 9.4 seconds on an iPhone 17 Pro Max and about 6 seconds on Mac M4 Pro. On Mac M4 Pro, Bonsai Image 4B is up to 5.6x faster than the stock full-precision MFLUX pipeline.

Benchmarking performance

Compression only matters if the model remains useful. We evaluated Bonsai Image 4B across three complementary benchmarks: GenEval for object composition and attribute binding; HPSv3 for human preference and aesthetic quality; and DPG-Bench for dense prompt following and semantic faithfulness.

Qualitative comparison across Bonsai Image and FLUX.2 Klein 4B models.
| Model | Diffusion Transformer Footprint (GB) | GenEval | HPSv3 | DPG-Bench | Size reduction relative to FLUX.2 Klein 4B | Performance relative to FLUX.2 Klein 4B |
|---|---|---|---|---|---|---|
| 1-bit Bonsai Image 4B | 0.93 | 0.671 | 11.15 | 0.822 | 8.3x | 88% |
| Ternary Bonsai Image 4B | 1.21 | 0.723 | 12.22 | 0.851 | 6.4x | 95% |
| FLUX.2 Klein 4B | 7.75 | 0.819 | 12.84 | 0.853 | 1x | 100% |
| SDXL | 5.14 | 0.3 | 10.05 | 0.74 | 1.5x | 67% |
| BK-SDM-Small | 0.98 | 0.297 | 3.05 | 0.559 | 7.9x | 42% |
| Stable Diffusion 1.5 | 1.72 | 0.396 | 4.2 | 0.601 | 4.5x | 51% |
| PixArt-Σ XL 2 | 1.2 | 0.541 | 11.93 | 0.769 | 6.4x | 83% |
Table II: Image quality benchmark comparison across Ternary Bonsai Image 4B and other models.
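The post does not say how the "performance relative to FLUX.2 Klein 4B" column in Table II is computed. One plausible reading, sketched here as an assumption, is the unweighted mean of each model's per-benchmark score divided by the FLUX.2 Klein 4B score. That reading reproduces every row of the column to within about one percentage point. The function name and data layout are illustrative.

```python
# (GenEval, HPSv3, DPG-Bench) scores and the reported relative performance (%).
BASELINE = (0.819, 12.84, 0.853)  # FLUX.2 Klein 4B
TABLE = {
    "1-bit Bonsai Image 4B":   ((0.671, 11.15, 0.822), 88),
    "Ternary Bonsai Image 4B": ((0.723, 12.22, 0.851), 95),
    "SDXL":                    ((0.300, 10.05, 0.740), 67),
    "BK-SDM-Small":            ((0.297, 3.05, 0.559), 42),
    "Stable Diffusion 1.5":    ((0.396, 4.20, 0.601), 51),
    "PixArt-Σ XL 2":           ((0.541, 11.93, 0.769), 83),
}


def relative_performance(scores) -> float:
    """Mean of per-benchmark ratios to the baseline, as a percentage (assumed formula)."""
    ratios = [s / b for s, b in zip(scores, BASELINE)]
    return 100 * sum(ratios) / len(ratios)


computed = {name: relative_performance(scores) for name, (scores, _) in TABLE.items()}

for name, (_, reported_pct) in TABLE.items():
    print(f"{name}: computed {computed[name]:.1f}%, reported {reported_pct}%")
```

The small differences that remain are consistent with the published scores having been rounded before they were tabulated, though a different aggregation formula cannot be ruled out.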

Ternary Bonsai Image 4B is the quality-oriented variant. At 1.21 GB, it retains 95% of FLUX.2 Klein 4B's benchmark performance across GenEval, HPSv3, and DPG-Bench, while reducing the diffusion transformer footprint by 6.4x.

1-bit Bonsai Image 4B is the footprint-oriented variant. It brings the diffusion transformer below 1 GB, an 8.3x reduction, while retaining 88% of FLUX.2 Klein 4B's benchmark performance across the same three evaluations.

Together, the two variants move the quality–footprint frontier. Bonsai Image remains competitive with modern 4B-class image models while using a fraction of their diffusion-transformer footprint. At the same time, it substantially outperforms smaller models with similar memory footprints. That is the same Pareto shift we have seen in our prior Bonsai language models. Bonsai Image brings modern diffusion-transformer behavior into a memory range that previously belonged to much smaller, lower-capability models.

Why this is important

Image generation is not only a model-quality problem. It is also a deployment problem.

Cloud APIs will continue to be the right choice for many products. But cloud-only generation imposes certain product constraints: every prompt is a remote request, every iteration carries marginal serving cost, and every interaction adds round-trip latency.

That matters because image generation is naturally iterative. Users rarely stop at one image. They revise prompts, compare outputs, generate variations, discard failures, and try again. When each attempt is a server-side job, the creative loop becomes something users have to meter and wait for.

Local inference changes that. Once the model fits on the device, generation can sit directly inside the product experience. It becomes cheaper to run, faster to iterate on, and easier to use in environments where prompts and generated assets should remain private.

Bonsai Image 4B is a step toward that deployment regime: capable image generation running closer to the user, on hardware they already own.

Images generated from Ternary Bonsai Image 4B

Availability

Both 1-bit and Ternary Bonsai Image 4B will be released with open weights and code under the Apache 2.0 license.

Alongside this release, we are also launching Bonsai Studio, our iOS app for trying Bonsai Image 4B directly on iPhone.

Join Us

PrismML emerged from a team of Caltech researchers and was founded with support from Khosla Ventures, Cerberus and Google. We’ve spent years tackling one of the field’s hardest problems: compressing neural networks without sacrificing their reasoning ability.

If you want to help build the next generation of state-of-the-art AI, we’d love to hear from you. Check out our careers page.

Resources