Vericoding: The End of "Trust Me Bro, The AI Wrote It".

Hacker News - Newest: "AI"

Bone Keeper AI Assisted Feature Film – Barrett Sonntag Nuance in all things. A dive into (Anti-) “AI” Myths AgentGate — Trust Authorization for Autonomous AI Agents AI is learning to fly airplanes – and aviation is starting to embrace it GitHub - oldrich-research/gravitational-constant-relation: A high-precision phenomenological relation for Newton's gravitational constant: G = (4/3)(hbar c / m_e^2) alpha^21 exp(-5 alpha/2). Companion to Zenodo DOI 10.5281/zenodo.20120946. Research performed by AI agents under named author's direction. AI agents just got their own web browser via a Firefox fork AI poses "urgent threat" to student learning and the HSC The AI Bifurcation of Tech The largest study of AI use by undergrads is in, revealing disparities in access — and in cheating NZ at wild frontier of AI superhacking The Race Is On Google CEO Sundar Pichai says booing graduates will shape AI's future Show HN: TalkTimer, a micro-SaaS run by an AI agent team Trickster's Table Venture Capitalist John Doerr Says AI Is the Biggest Tech 'Tsunami' AI Can’t Care – Dan Moore! GitHub - peterxcli/ccost: Turn local AI coding session logs into a searchable terminal UI with a cost lens. Ask HN: What is your daily AI stack? GitHub - PanzerPeter/Neuro: A programing language for AI Resyl: AI Memory for People - Apps on Google Play AI Chip Component Costs: Memory at 63% | Epoch AI Ask HN: Why do people seem to generally hate AI? Resonance, randomness, and negotiated meaning for AI-assisted tarot divination GitHub - Kind-Computers/quinlight-audio: Audophile-quality MOD music with AI remastering at 32-bit 96 kHz! The Case Against the AI Job Apocalypse AI and the Rise of Just-In-Time Knowledge Work Careers After AI There Is No AI (It's Just People), with Jaron Lanier [video] wolfram-fb0 — AI writes x86_64 asm + eBPF for fractals, in a real VM in your browser Bursting the AI Bubble: Fed Could Take Away the "Who Could Have Known?" Defense AI proves mathematicians wrong I built a free AI travel planner for budget Europe trips Our AI just got even better Integral Intelligence: a Catholic view of the AI debate How to Tame AI’s Voracious Appetite for Energy GitHub - atveit/pi-mojo: A mojo port of the PI AI Agent Toolkit Autotrader – paper trading AI agent for Indian equities The invisible fabric of AI: chips are not a war between two, but a global fabric - zoopa.es Responsible Work with AI The AI Existential Crisis: Western AI Agents Will Win Commerce Legal Ontologies for AI This AI Stock Is the Ultimate Set-It-and-Forget-It Buy for Long-Term Investors AI wealth must benefit the public, South Korea's deputy PM says amid Samsung labor tensions Forget electrons, this breakthrough uses light-matter particles to power AI State Explosion Security Problem in AI-Era Software Supply Chains ShannonBase: The Lightweight Semantic Layer for Enterprise AI SQL AI Content Got Too Real. Now OpenAI and Nvidia Are Using Google’s Watermarking System. - Firethering Karen Hao: AI creating a DESPERATE BASE OF WORKERS with no full-time employment GitHub - barvhaim/llm-learning-path: 🎓 Structured LLM Learning Path — From Zero to Researcher. 8-phase curriculum covering Transformers, pre-training, fine-tuning, alignment, agents, and advanced research. Letting Agents Write Code Without Ratcheting Up Risk Why Every Electronic Product May Need To Be Rebuilt For On-Device AI: The Chip Layer Will Decide The Next Hardware Wave – Easelink Tech Ask HN: I mapped 6,494 AI engines into a taxonomy – anyone else tried this? China behind in LLM race but it can still win in AI, ex-Tencent AI lead says Newsom signs order aimed at tackling AI job displacement How AI is redefining Software Engineering Hiro, AI job matching with real visa sponsorship data (550K jobs) For developers without design skills, how do you leverage AI for front end dev? The Anatomy of AI Power in 2026 | Wayne Research arxiv ‘AI washing’: firms are scrambling to rebrand themselves as tech-focused Clawd Cursor v0.9.7 SpaceX, OpenAI and Anthropic IPOs set to test limits of AI boom Export chats from 11 AI platforms to PDF or Markdown locally From Vibe Coding to AI-Assisted Engineering: Lessons from Real Projects Shannon Got AI This Far. Kolmogorov Shows Where It Stops GitHub - machineswillrise/jagent: AI coding agent in Java GitHub - anatomia-dev/anatomia: Verified AI development. Ship with proof. Joe Rogan accidentally exposed AI in four words [video][12 mins] AI Headshot Generator for Work | Preview before you pay $4.99 one-time, no subscription MAXTOKEN A Unified Framework for Unbounded Output Generation and Repository-Scale Code Understanding The unlikely Vatican-Anthropic relationship that's reshaping AI ethics debate Fashion designer Jeremy Scott gets a huge cheer after ripping up his AI-written commencement speech Sycophantic AI decreases prosocial intentions and promotes dependence GitHub - anasmohiuddinsyed-bit/ai-fix: When a command fails, one word fixes it. AI-powered error fixer for your terminal. AI Governance 2026: I Almost Quit over This Shit (and Why You Might Too) GitHub - sabir-gbs/the-polyglot-protocol: A senior-engineer protocol for polyglot code generation, architecture, testing, security, performance, and agent validation. CodeShot — Beautiful Code Screenshots via API Apple Preparing New 'Gen AI' Website Ahead of WWDC Duolingo's CEO says he backtracked on evaluating AI use in performance reviews AgentLens — Know if your AI features are actually working How Much AI Compute Do Frontier Labs Use? AIBTC Jimmy Carr on Why Everyone Is Wrong About AI [video] Ask HN: Why disparage AI while attributing ideas to AI when written with it? Mercor AI post-training for generalist experts – referral Reid AI - Silicon Valley Video Summit AI Translate All Formats 2026-05-23 — Why reviewing AI-generated code is devilishly hard Verytis AI Model Idle · 인공지능 키우기 GitHub - narekmalk/safedb-mcp: Secure MCP server for safe, read-only DB access by AI agents, with SQL guardrails, table allowlists, PII masking, and audit logs The 5% who operate AI are pulling ahead. Find out where you stand. | PeraByte Labs ORBIT AI Chess Coach — Know Why Every Move Matters AgenticVBench GitHub - ninjahawk/singleton-attractor: Why one dominant intelligence is the inevitable long-run outcome in any competitive recursive-improvement environment. Ask HN: How can you have fun doing corporate dev work in the age of AI tools? AI Mistakes Are Infuriating Gamers as Developers Seek Savings Polish Nobel literature laureate Tokarczuk sparks controversy after admitting using AI The AI Slot Machine is Draining My Creativity

_doctor_love · 2026-05-25 · via Hacker News - Newest: "AI"

Vibe coding won. That debate is over. GitHub says 46% of all new code is AI-generated. Among Y Combinator's Winter 2025 cohort, 21% of startups have codebases that are 91% or more AI-generated. Gartner forecasts 60% of all new software code will be AI-generated by end of 2026. The machines are writing the code. Congratulations.

Now the bad news.

AI co-authored code has 2.74x more security vulnerabilities than human-written code. 45% of it fails security tests. Vibe-coded projects accumulate technical debt 3x faster. METR ran a randomized controlled trial with experienced developers and found they were 19% slower with AI tools, while believing they were 24% faster. 95% of developers report feeling productive while measurably producing lower-quality code.

AWS themselves have said it plainly: review capacity, not developer output, is the bottleneck in 2026. We can generate code at machine speed. We cannot verify it at machine speed. That asymmetry is the whole problem.

And it's already exploding. The Tea App breach exposed 72,000 government IDs because a database was just… open. The Orchids platform had a zero-click vulnerability that sat unfixed for months. Security experts are predicting "catastrophic explosions" as vibe-coded applications hit production at scale. Not might happen. Are happening.

The software industry automated code generation but left verification in 2019. That's like building a car factory that runs at 10,000 units per hour and staffing the quality inspection line with two guys and a clipboard.

Vibe Coding vs. Vericoding

The term vericoding was introduced by researchers at the Beneficial AI Foundation in late 2025. Where vibe coding generates potentially buggy code from natural language descriptions, vericoding generates formally verified code from formal specifications. Code that is mathematically proven correct. Not tested. Not probably correct. Proven.

The difference is not incremental. It is categorical.

A test suite says: "I tried 1,000 inputs and none of them broke." Formal verification says: "I have a mathematical proof that no input, out of the infinite space of all possible inputs, can break this." Tests demonstrate the presence of bugs. Proofs demonstrate their absence.

The vericoding benchmark tested 12,504 formal specifications across three verification languages: Dafny, Verus/Rust, and Lean. Using off-the-shelf LLMs with no special training or fine-tuning, vericoding success rates hit 82% in Dafny. Pure Dafny verification improved from 68% to 96% over the past year. The curve is steep and accelerating.

Dafny is a verification-aware programming language created by K. Rustan M. Leino, now at AWS's Automated Reasoning Group. It compiles through Boogie into SMT-LIB2 queries that are checked by Z3, an SMT solver. If Z3 says UNSAT, your code is proven correct against its specification. Not probably. Proven.

AWS uses this exact stack to verify their authorization engine, the one handling over a billion API calls per second. They verified Cedar, their authorization policy language, in Dafny and proved it satisfies security properties with mathematical certainty. Then they used differential testing against quadrillions of production authorizations to confirm the Rust implementation matches. The result? A 65% performance improvement and provable security.

This is not academic technology looking for a problem. This is production infrastructure running at planetary scale.

The Gap Everyone Sees, Nobody Has Closed

Every vericoding system today, including the research from AWS's own DafnyPro and ATLAS teams, assumes you already have a formal specification. Someone has to write:

requires amount <= balance
ensures balance == old(balance) - amount
ensures balance >= 0

That someone is currently a verification expert. There are maybe a few thousand people on Earth who can write Dafny specifications fluently. That's not a market, that's a mailing list!

The pipeline academia is building looks like this:

Formal Spec → LLM → Verified Code → Production

DafnyPro hits 86% on this. ATLAS generates thousands of verified programs. The vericoding benchmark shows 82% success. All impressive. All starting from a formal spec that a human expert already wrote.

The pipeline we have starts earlier, where the actual problem is:

Natural Language → Formal Spec → Human Review → LLM → Verified Code → Production

That first step, natural language to formal spec, is the reason vericoding hasn't become a product. It's the hardest translation, the one the researchers explicitly scope out of their papers. The vericoding benchmark authors said it directly: "We acknowledge that spec generation is an important problem, but focus on the task of generating implementations and formal proofs in this work."

We're building the part they left out. And we have an unfair advantage.

We Already Do This. It's Called PreFlight.

If you've been following this blog, you know ICME's PreFlight. You write a guardrail policy in plain English. PreFlight compiles it to formal logic (SMT-LIB). An SMT solver checks every agent action against that logic in under a second. Every decision produces a cryptographic proof: tamper-proof, independently verifiable, shareable with any third party. SAT means allowed. UNSAT means blocked.

That's the same fundamental pipeline. Natural language in. Formal logic out. Verified by an SMT solver. Cryptographic receipt attached. The only difference is what you're verifying. PreFlight verifies agent actions against a policy. Vericoding verifies code against a specification. The core machinery is identical: translate human intent into math, then let the solver do its job.

PreFlight already handles the hard part that every academic vericoding paper punts on. The NL-to-formal-logic translation. The ambiguity detection. The sub-second verification. The cryptographic proof that the check actually ran. These aren't theoretical capabilities. They're in production, running against real agent actions, right now.

Why PreFlight Is a Vericoding Engine in Disguise

Here's the part that makes this more than a research direction for us.

Automated Reasoning's core job is translating natural language into formal logic. That translation is the same fundamental problem whether you're compiling a guardrail policy or generating a code specification. "Never approve transactions over $10,000 without a second confirmation" compiles to guardrail logic. "No withdrawal can exceed the account balance" compiles to a Dafny precondition. The domain shifts, but the translation pattern is identical: vague human intent into precise formal constraint.

The system we've built handles every stage of this translation. Parsing ambiguous natural language. Cross-checking across multiple model passes. Catching gaps where the intent is underspecified. Returning a formal result that an SMT solver can verify. These are hard engineering problems, and we've already solved them for guardrails. The architecture generalizes.

This is the cold start problem that academic vericoding teams keep running into. There's almost no paired NL-to-Dafny data in the wild. The ATLAS team had to synthetically generate their training data. Meanwhile, PreFlight has been operating in production, processing real natural language policies across financial, access control, and compliance domains. The patterns our system has learned for translating intent into formal logic apply directly to translating intent into code specifications. We don't need to start from zero. We're extending a system that already works.

The Vericoding Pipeline

1. Natural Language Intent The developer writes what they want in English: "No user can withdraw more than their balance. Failed withdrawals don't change state. Daily withdrawal limits are enforced."

Same interface as writing a PreFlight policy. Same experience. If you've used PreFlight, you already know how to write a vericoding spec.

2. LLM → Dafny Specification (Multi-Model Cross-Check) Multiple LLMs independently translate the intent into Dafny pre/postconditions. Only translations where a majority agree are accepted. PreFlight already uses this cross-checking pattern for policy compilation. Applying it to code specs is a direct extension of the same architecture.

3. SMT Spec Scoring Before any code is generated, the spec itself is analyzed by Z3. The spec is just logic. You don't need an implementation to audit it. The solver checks: Is the spec consistent? What inputs have undefined behavior? How many distinct implementations satisfy it? The output is a confidence score and a gap analysis the human can actually read:

Spec Confidence: 72%

✓ Balance non-negativity: fully specified
✓ Success case: fully specified
⚠ Failure case: underspecified
⚠ Daily limit: not mentioned
✗ Concurrent access: not addressed

PreFlight already catches these kinds of ambiguities in guardrail policies. A travel expense policy that technically allows $1,001 purchases without approval? The solver finds it. A withdrawal spec that doesn't constrain failure behavior? Same solver, same technique.

4. Human Battle Testing The human reviews ambiguous parts of the spec, not 500 lines of generated code. "Does ensures balance == old(balance) - amount capture what I mean by withdrawal?" That is a question anyone with domain knowledge can answer.

5. LLM → Verified Implementation DafnyPro-style annotation and code generation. The LLM writes the implementation, Z3 verifies it satisfies the spec. If it fails, iterate. This loop already works at 86% first-pass success in academic benchmarks.

6. Compile to Production Language Dafny compiles to Python, Java, C#, Go, JavaScript. Verified code, in the language your stack actually uses.

7. Archive with Cryptographic Proof The SMT-LIB2 file is a portable, standard-format proof artifact. Any third party can feed it to any compliant solver (Z3, CVC5, Yices) and independently confirm the verification result. And because this is ICME, the proof doesn't stop at the solver output. We wrap it in a cryptographic proof (ZKP), the same way PreFlight wraps every guardrail decision. Verifiable in under a second. On any device. You can also fold these proofs together to verify the entire program end to end!

Guardrails & Verified Code

PreFlight proves agent actions are safe. Vericoding proves agent-written code is correct. These are not completely separate. They are the same trust infrastructure applied at different layers of the stack.

Think about what the fully integrated picture looks like. An AI agent writes code (vibe coding). The code is verified against a formal spec derived from human intent (vericoding). Every verification produces a cryptographic receipt (PreFlight). The agent then operates under guardrail policies that are themselves formally verified (PreFlight again). From code generation to runtime behavior, every step is provably correct and independently auditable. One trust layer, top to bottom.

For regulated industries & agentic commerce, this is a major unlock. Healthcare companies aren't vibe coding because they can't prove correctness to auditors. Financial services aren't vibe coding because compliance requires evidence, not vibes. 72% of healthcare and 66% of financial services companies are sitting out the AI coding revolution entirely.

We're building the tool that lets them in:

Translate compliance rules from English to formal specs
Score specs for completeness before code is generated
Prove implementations correct against the spec
Archive standard-format proof artifacts for regulators
Wrap every proof in a cryptographic receipt, verifiable by any third party in under a second

That's not a developer tool. That's compliance infrastructure for the age of AI-generated code. And the TAM is every regulated line of code that will ever be written by an AI.

The Math Works. We're Shipping It.

The vericoding benchmark shows 82% success with off-the-shelf models. DafnyPro shows 86% with inference-time techniques. Dafny verification hit 96% in the latest results. The SMT-LIB standard has been stable for decades. Z3 powers a billion checks per day at AWS.

The academic pieces are proven. The solver infrastructure is battle-tested. What's been missing is a product that starts from natural language, ends with verified code, and produces a cryptographic proof of every step. We've already built the NL-to-formal-logic engine for guardrails. We've already shipped the cryptographic proof layer. The architecture generalizes directly to code verification.

Vericoding is an additional step. And we're already most of the way there!

The vibe coding trust crisis is here. 46% of code is AI-generated. Trust is falling. Breaches are climbing. Regulated industries are locked out. The bottleneck is verification, not generation.

LLM automated coding. Now we're using cryptography to automate verification.

With math, not vibes.

I build software and write about where AI meets cryptography.

My Twitter

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

Hacker News - Newest: "AI"

Vibe Coding vs. Vericoding

The Gap Everyone Sees, Nobody Has Closed

We Already Do This. It's Called PreFlight.

Why PreFlight Is a Vericoding Engine in Disguise

The Vericoding Pipeline

Guardrails & Verified Code

The Math Works. We're Shipping It.