Banning noise will be a disaster for statistical data products - Ted is writing things
Damien Desfontaines · 2026-06-13 · via Hacker News: Front Page

Last week, the United States Department of Commerce issued an order declaring that "noise infusion" will be banned from all statistical products published by the Census Bureau and the Bureau of Economic Analysis.

A screenshot of the order mentioned in the article. It reads:

a. The Department shall, as a primary objective, aim to fulfill its statistical obligations by providing the public with accurate and objective information.
b. The Department is firmly committed to striking a balance of accuracy, confidentiality, objectivity, and relevance for each statistical product that is consistent with its statistical obligations and the applicable legal requirements.
c. Any use of noise infusion is inconsistent with the Department’s policies.

02 The Census Bureau and the Bureau of Economic Analysis shall adhere to the following order of priority when considering and applying Disclosure Avoidance:

a. Coarsening shall be the preferred category of Disclosure Avoidance methods for all statistical products.
b. Suppression shall be permitted as a last resort, only to be used when coarsening is prohibited by law or would substantially defeat the accuracy or usability of a statistical product.
c. Noise infusion shall not be used for any statistical product.

What does it mean, and why should you care?

Context

Statistical products are a bunch of numbers published from a secret dataset. Often, that dataset contains confidential information, and it is important that the numbers don't reveal that information. The U.S. Census is a well-known example: the statistics are made public, but the contents of each form filled out by individual U.S. residents must stay secret.

Scientists have developed a number of techniques that can be used to publish useful statistics while protecting the privacy of the original data. This field is called disclosure avoidance in statistical communities. Here are a few of these techniques.

  • Suppression: removing data that doesn't pass certain thresholds (e.g. if a count of people is below 5, we don't publish it).
  • Coarsening (or generalization): making data attributes less precise (e.g. transform a county into its state, a date of birth into an age range, etc.).
  • Sampling: randomly removing some records from the dataset.
  • Swapping: taking attributes from different records and exchanging them randomly.
  • Contribution bounding: making sure that a single individual cannot contribute "too much" to a statistic by limiting their maximum impact.
  • Noise addition: adding a random number to statistics to hide their true value.
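To make three of these techniques concrete, here is a minimal Python sketch on made-up data. The records, the threshold, the age-range width, and the noise scale are all illustrative assumptions, not values used by any statistical agency.

```python
import random
from collections import Counter

# Hypothetical toy microdata: (county, state, age) for each person.
RECORDS = [
    ("Alameda", "CA", 34), ("Alameda", "CA", 36), ("Alameda", "CA", 71),
    ("Alpine", "CA", 52),
    ("Travis", "TX", 29), ("Travis", "TX", 33),
]

def count_by(records, key):
    """Exact counts, grouped by whatever `key` extracts from a record."""
    return Counter(key(r) for r in records)

def suppress(counts, threshold):
    """Suppression: drop every cell whose count is below the threshold."""
    return {cell: n for cell, n in counts.items() if n >= threshold}

def coarsen_age(age, width=20):
    """Coarsening: replace an exact age by a range such as '20-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def add_noise(counts, scale, rng):
    """Noise addition: perturb each count with Laplace noise of the given scale."""
    def laplace():
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return {cell: n + laplace() for cell, n in counts.items()}

by_county = count_by(RECORDS, key=lambda r: r[0])
by_state = count_by(RECORDS, key=lambda r: r[1])  # county coarsened to state
by_age_range = count_by(RECORDS, key=lambda r: coarsen_age(r[2]))
published = suppress(by_county, threshold=2)      # the single-person county disappears
noisy = add_noise(by_county, scale=1.0, rng=random.Random(0))
```

Suppression and coarsening return exact numbers for whatever they keep, while noise addition keeps every cell but makes each value uncertain.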

Some of these techniques, when combined, achieve a definition called differential privacy. This definition has a lot of nice fundamental properties and is widely considered the gold standard of privacy protection among scientists. To achieve it, scientists typically rely on a combination of contribution bounding and carefully calibrated noise addition.
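As a rough illustration of how contribution bounding and calibrated noise fit together, here is a minimal sketch of a noisy sum using the textbook Laplace mechanism. It assumes each person contributes exactly one value and that neighboring datasets differ by adding or removing one person. The function name `dp_sum` and its parameters are hypothetical, and this is not the algorithm used for the 2020 Census. Real deployments also need care that this sketch ignores, such as floating-point subtleties.

```python
import random

def dp_sum(values_per_person, max_contribution, epsilon, rng):
    """Noisy sum of one value per person, under the assumptions stated above."""
    # Contribution bounding: clamp every value into [0, max_contribution],
    # so no single person can move the sum by more than max_contribution.
    clamped = [min(max(v, 0.0), max_contribution) for v in values_per_person]
    # Calibrated noise: the Laplace scale is the maximum impact of one person
    # divided by the privacy parameter epsilon.
    scale = max_contribution / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clamped) + noise

# Made-up incomes: the outlier is clamped to 100,000 before summing.
incomes = [30_000, 45_000, 5_000_000]
print(dp_sum(incomes, max_contribution=100_000, epsilon=1.0, rng=random.Random()))
```

The two knobs pull against each other: a smaller `max_contribution` means less noise but more distortion from clamping, and a smaller `epsilon` means stronger privacy but noisier output.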

From 1990 to 2010, the U.S. Census Bureau primarily relied on swapping for the decennial census. Then, they realized that this technique was actually very unsafe, and that it was pretty easy to reconstruct individual records using the published statistics. This is bad, because the Bureau is required by federal law to keep these records confidential. So they tried a few alternative approaches, and decided to adopt differential privacy for the 2020 Census: this was the one that kept the statistics most useful, while preventing these attacks.

It bears repeating: differential privacy wasn't chosen because the math was nice and compelling. It was selected because among the different options that mitigated the attack, it was the one that preserved the most utility. Its exact privacy parameters were chosen not because they provided rock-solid provable guarantees, but because they squeezed most usefulness out of the data while reaching an acceptable level of privacy protection.

Sadly, "preserved the most utility under newly-discovered privacy constraints" did not mean "preserved as much utility as the 2010 Census": the numbers got less accurate, and the inaccuracies got a lot more transparent, and therefore impossible to ignore. This made a number of people very angry.

  • Demographers and social scientists could no longer ignore that the data they were working with was noisy data. This required a major shift in how they conceptualized and worked with this data.
  • People who were using Census data to actually reconstruct records could no longer do so. Demographers admitted that this was common practice. It's also an open secret that this was done by political operatives as part of gerrymandering efforts.

Phew, that was a lot of context.

What does the order say?

The administration has now decided that noise infusion is no longer an acceptable disclosure avoidance technique.

The order clearly targets differential privacy, but also seems to impact other techniques that involve randomness: the text explicitly mentions that coarsening should always be preferred, falling back to suppression as a "last resort". I have no idea why the order is so specific. Maybe they wanted to make sure the scientists working at the U.S. Census couldn't still use similar techniques without calling them differential privacy?

The order also carefully says it "shall not be interpreted to conflict with any constitutional, statutory, regulatory, or other legal provision". So the confidentiality obligations surrounding these statistical products still apply.

What will it mean in practice?

The consequences will be dire for utility or for privacy, and possibly both. It's hard to overstate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe.

For starters, taking away useful tools from the disclosure avoidance toolbox will always lead to more painful privacy/utility trade-offs. The whole point of this research field is to better understand and quantify privacy risk, and develop better tools to mitigate this risk while preserving utility.

For statistical releases, differential privacy is simply the best tool we have right now. It provides a finer way of quantifying trade-offs, and allows us to get more utility out of the data than competing techniques at similar privacy levels. If you take it away, you're left with techniques that either have worse utility at similar levels of privacy, or worse privacy for the same utility.

But all competing techniques also rely on noise addition. The Cell Key method, used at other statistical agencies, adds noise to statistics. Swapping, used from 1990 to 2010 for the U.S. Census, also injects randomness into the process. Sampling is everywhere in statistical work. Hell, even imputation technically adds noise to the data!

By contrast, coarsening and suppression are very blunt instruments. They only work in situations where the statistics are already very coarse, and not too many of them are published. For complex data products with many statistics about small groups of people (like the U.S. Census), they either destroy all utility of the data (especially for minority populations), or are very vulnerable to privacy attacks.

It makes sense: privacy attacks on statistical releases are about solving a system of equations. That is a much easier task when you know for sure that the statistics are all perfectly accurate. Noise forces you to compute probabilities, quantify the uncertainty, carefully consider baselines, and so on. That's why randomness is such a useful tool for disclosure avoidance! Even without formal guarantees, it makes attacks a lot harder. Take it away and attacks become trivial.
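Here is a toy version of that "system of equations" view, as a hedged sketch rather than a real attack. The three-person block, the four published statistics, and the `tolerance` parameter are all made up for illustration. The tolerance is only a crude stand-in for noise: it models an attacker who knows each statistic to within plus or minus that amount.

```python
from itertools import combinations_with_replacement

def statistics(ages):
    """The hypothetical table published about one small block."""
    ages = sorted(ages)
    adults = [a for a in ages if a >= 18]
    return (
        sum(ages),                # total of all ages
        ages[len(ages) // 2],     # median age (odd block size assumed)
        len(ages) - len(adults),  # number of minors
        sum(adults),              # total of adult ages
    )

def consistent_blocks(published, size, tolerance=0):
    """Every possible block whose statistics match `published` within `tolerance`."""
    return [
        ages
        for ages in combinations_with_replacement(range(100), size)
        if all(abs(s - p) <= tolerance for s, p in zip(statistics(ages), published))
    ]

secret_block = (10, 35, 45)
published = statistics(secret_block)

# Exact statistics: brute force finds a single solution, the secret block itself.
exact = consistent_blocks(published, size=3)
# Uncertain statistics: more than one candidate block fits the published table.
fuzzy = consistent_blocks(published, size=3, tolerance=2)
print(len(exact), len(fuzzy))
```

With exact numbers, the four statistics pin down every age in the block. Once each number is only approximately known, the attacker has to reason about which of the remaining candidates is most likely, which is the extra work described above.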

Why is it happening?

I mean, who knows.

Maybe the goal is to force the U.S. Census to publish statistics that actually enable re-identification, to help with future gerrymandering efforts? Or on the contrary, maybe the idea is to stop the publication of useful demographic data, to prevent researchers from showing unfair disparities among the population?

Hanlon's razor provides an alternative explanation. The fundamental privacy/utility trade-off inherent to statistical data releases is annoying. It would be a lot easier if publishing many statistics didn't automatically come with a high privacy risk. Differential privacy makes this trade-off explicit, and thus impossible to ignore. Maybe banning it is a way of pretending that the problem doesn't exist, in the hope that it will go away?


Thanks to Adam Sealfon, Aloni Cohen, Ben Jacobsen, and Gautam Kamath for helpful comments on earlier drafts of this post.