The contract is the interface: agent-driven Steampipe→Stave in one command
Bala Paranj · 2026-05-23 · via DEV Community

Consider a typical cloud-security tool's onboarding flow. A customer installs the tool. The tool's collector tries to authenticate to AWS, fails because the role isn't there yet, the customer follows three pages of setup docs, the role gets created, the collector authenticates, the collector runs, the collector finds nothing because the tool only knows about S3 and IAM and the customer's workload is on EKS. End of week one.

We don't ship a collector. Stave evaluates obs.v0.1 JSON snapshots — whatever produces them. That decision sounds extreme until you've watched the same "the collector doesn't see our environment" conversation play out three times. So instead of a collector, Stave ships a contract: per-asset JSON Schemas, per-asset Steampipe→Stave column mappings, and one command (stave contract show) that emits everything an agent needs to author its own ingest. The customer's preferred source (Steampipe, AWS Config, Terraform state, an internal inventory API) plugs in by satisfying the contract.

This post walks through the steps that close the pipeline.

What the customer sees

$ stave contract show --asset-type aws_s3_bucket
Contract: aws_s3_bucket
Schema:   schemas/observation/v1/asset-types/aws_s3_bucket.schema.json
Controls: 102 | Chains: 15

Property paths (catalog reads these — sorted by chain unlock, then control unlock):

  PATH                                                          CONTROLS  CHAINS  SEVERITY  NOTE
  ────                                                          ────────  ──────  ────────  ────
  storage.kind                                                  91        15      critical
  storage.tags.data-classification                              14        2       critical  intent
  storage.access.public_read                                    8         2       critical
  storage.controls.public_access_fully_blocked                  3         1       critical
  ...

Steampipe mapping: contracts/steampipe/aws_s3_bucket.yaml

That output names everything the customer's ingest agent needs:

  • The schema — the JSON Schema the agent's output must satisfy
  • The property paths — what fields the catalog actually reads on this asset type, ranked by how many controls and chains they unlock
  • The mapping — a ready-to-run YAML telling the agent which Steampipe column maps to which Stave property path

For the 17 most catalog-impactful asset types, the mapping is committed. For the rest, the customer's agent has the schema; it can author its own.

The YAML mapping format

The Steampipe→Stave mapping is one ordered list of operations per asset type. Four operation kinds cover most transform shapes:

  • field — direct column → property mapping with optional coerce/default
  • static — a fixed value (e.g. properties.storage.kind: bucket)
  • extract — pull a nested JSON value from a JSON-shaped column
  • computed — derive from already-set property paths (all / any reduction)

Operations run in YAML order; later ops can read paths written by earlier ones. The first mapping we wrote — contracts/steampipe/aws_s3_bucket.yaml — replaced a Python function with a declarative file. The loader changes are 100 lines; the resulting observation is byte-identical to what the imperative function produced.

operations:
  - kind: static
    path: properties.storage.kind
    value: bucket

  - kind: field
    path: properties.storage.tags
    column: tags
    default: {}
    type: dict

  - kind: extract
    path: properties.storage.encryption.algorithm
    column: server_side_encryption_configuration
    json_path: "Rules.0.ApplyServerSideEncryptionByDefault.SSEAlgorithm"
    key_variants:
      Rules: rules
      SSEAlgorithm: sse_algorithm
    default: "none"

  - kind: computed
    path: properties.storage.controls.public_access_fully_blocked
    op: all
    inputs:
      - properties.storage.controls.public_access_block.block_public_acls
      - properties.storage.controls.public_access_block.block_public_policy
      - properties.storage.controls.public_access_block.ignore_public_acls
      - properties.storage.controls.public_access_block.restrict_public_buckets

The format is the contract. Any agent in any language can parse the YAML and produce conforming observations.
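To make the four operation kinds concrete, here is a minimal Python sketch of an interpreter for the operations list. It is not the reference loader from examples/agents/stave_transform.py. The helper names are hypothetical, type coercion is omitted, and the reading of key_variants as "alternate spelling of the same key" is an assumption. The two block_public_* field operations are also added for illustration so the computed step has inputs to read.

```python
import copy
from typing import Any


def get_path(doc: dict, dotted: str, default: Any = None) -> Any:
    """Read a dotted path such as 'properties.storage.kind' from nested dicts."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def set_path(doc: dict, dotted: str, value: Any) -> None:
    """Write a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = dotted.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def extract(value: Any, json_path: str, variants: dict, default: Any) -> Any:
    """Walk json_path through dicts and lists; on a missing key, try its variant."""
    node = value
    for step in json_path.split("."):
        if isinstance(node, list) and step.isdigit() and int(step) < len(node):
            node = node[int(step)]
        elif isinstance(node, dict) and step in node:
            node = node[step]
        elif isinstance(node, dict) and variants.get(step) in node:
            node = node[variants[step]]
        else:
            return default
    return node


def apply_operations(operations: list, row: dict) -> dict:
    """Run operations in order; later ops may read paths set by earlier ones."""
    obs: dict = {}
    for op in operations:
        kind = op["kind"]
        if kind == "static":
            value = op["value"]
        elif kind == "field":
            value = row.get(op["column"])
            if value is None:
                value = copy.deepcopy(op.get("default"))
        elif kind == "extract":
            value = extract(row.get(op["column"]), op["json_path"],
                            op.get("key_variants", {}), op.get("default"))
        elif kind == "computed":
            reducer = all if op["op"] == "all" else any
            value = reducer(bool(get_path(obs, p, False)) for p in op["inputs"])
        else:
            raise ValueError(f"unknown operation kind: {kind}")
        set_path(obs, op["path"], value)
    return obs


PAB = "properties.storage.controls.public_access_block"
OPERATIONS = [
    {"kind": "static", "path": "properties.storage.kind", "value": "bucket"},
    {"kind": "field", "path": "properties.storage.tags", "column": "tags", "default": {}},
    {"kind": "extract", "path": "properties.storage.encryption.algorithm",
     "column": "server_side_encryption_configuration",
     "json_path": "Rules.0.ApplyServerSideEncryptionByDefault.SSEAlgorithm",
     "key_variants": {"Rules": "rules", "SSEAlgorithm": "sse_algorithm"},
     "default": "none"},
    # Illustrative field ops (assumed column names) feeding the computed step.
    {"kind": "field", "path": f"{PAB}.block_public_acls", "column": "block_public_acls", "default": False},
    {"kind": "field", "path": f"{PAB}.block_public_policy", "column": "block_public_policy", "default": False},
    {"kind": "computed", "path": "properties.storage.controls.public_access_fully_blocked",
     "op": "all", "inputs": [f"{PAB}.block_public_acls", f"{PAB}.block_public_policy"]},
]

ROW = {
    "tags": None,
    "server_side_encryption_configuration": {
        "rules": [{"ApplyServerSideEncryptionByDefault": {"sse_algorithm": "AES256"}}]
    },
    "block_public_acls": True,
    "block_public_policy": False,
}

obs = apply_operations(OPERATIONS, ROW)
print(obs)
```

The ordering rule is the part worth noticing: the computed operation only works because the field operations that populate its inputs appear earlier in the list.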

Per-asset JSON Schemas

The catalog ships 3,957 controls; together they declare applicable_asset_types for 109 distinct asset types. To validate that a mapping's target paths are real, we needed a JSON Schema per asset type. Hand-authoring 109 schemas is a Tuesday lost. The schema generator already existed (it walks every control's predicate AST and infers the property paths and types), but it defaulted to the top-3 most-used types. Raising the limit past the number of asset types generates all of them:

go run ./internal/tools/genassetschemas/... -top 200
make sync-schemas

Output: 109 per-asset schemas under schemas/observation/v1/asset-types/. Every level is additionalProperties: true — the schemas are discoverability artifacts, not restrictive gates. A schema that lists one property (security_hub.enabled on aws_securityhub_account, for example) tells an agent "this asset type matters to the catalog; here is the one property to populate." Thin schemas are still useful.
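As a rough illustration of "discoverability artifact", the sketch below walks a thin schema and lists the property paths it declares. The schema literal is an assumed shape for aws_securityhub_account, not a copy of the generated file, and property_paths is a hypothetical helper.

```python
def property_paths(schema: dict, prefix: str = "") -> list:
    """Return the dotted path of every leaf declared under nested 'properties'."""
    paths = []
    for name, sub in schema.get("properties", {}).items():
        dotted = f"{prefix}.{name}" if prefix else name
        if isinstance(sub, dict) and sub.get("properties"):
            paths.extend(property_paths(sub, dotted))
        else:
            paths.append(dotted)
    return sorted(paths)


# Assumed shape: every level is open (additionalProperties: true), so the
# schema names what the catalog reads without rejecting anything extra.
THIN_SCHEMA = {
    "type": "object",
    "additionalProperties": True,
    "properties": {
        "properties": {
            "type": "object",
            "additionalProperties": True,
            "properties": {
                "security_hub": {
                    "type": "object",
                    "additionalProperties": True,
                    "properties": {"enabled": {"type": "boolean"}},
                }
            },
        }
    },
}

paths = property_paths(THIN_SCHEMA)
print(paths)
```

An agent that reads only this list still learns the one thing it needs: which field to populate for the asset type to be evaluated at all.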

Ten hand-authored mappings

The next 10 asset types by control coverage — aws_iam_role, aws_lambda_function, aws_cognito_user_pool, aws_cloudtrail_trail, aws_kms_key, aws_ec2_instance, aws_sqs_queue, aws_iam_user, aws_opensearch_domain, aws_stepfunctions_state_machine — got hand-authored mappings. They served two purposes: actual coverage for the most-asked-for types, and a ground-truth corpus to validate Iter 5's auto-generator against.

Every mapping carries a derived_properties: block listing the catalog-read properties that cannot come from a single Steampipe column. Example from aws_iam_role.yaml:

derived_properties:
  - path: properties.identity.role.cross_account_trust_without_external_id
    source: "Parse trust_policy, then detect external Account in Principal without sts:ExternalId condition"
  - path: properties.identity.permission_categories.has_incompatible_categories
    source: Policy analysis against controldata/taxonomy/permission_categories.yaml
  - path: properties.identity.access_advisor.available
    source: iam:GenerateServiceLastAccessedDetails + iam:GetServiceLastAccessedDetails (separate API call per role)

That block is the agent's TODO list. Silently producing an observation without those derived properties is the failure mode the derived_properties: section prevents — Stave's controls don't see the property, the catalog finds nothing wrong, the breach happens anyway.
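One way an agent could treat the block as a checklist is to refuse to emit an observation while any derived path is still unset. The sketch below assumes the mapping has already been parsed into a dict. The function names and the sample observation values are hypothetical.

```python
def has_path(doc: dict, dotted: str) -> bool:
    """True when every segment of the dotted path exists in the nested dicts."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def missing_derived(mapping: dict, observation: dict) -> list:
    """List derived_properties paths the observation has not populated."""
    return [entry["path"] for entry in mapping.get("derived_properties", [])
            if not has_path(observation, entry["path"])]


def require_derived(mapping: dict, observation: dict) -> None:
    """Fail loudly instead of shipping an observation with silent gaps."""
    missing = missing_derived(mapping, observation)
    if missing:
        raise ValueError("derived properties not populated: " + ", ".join(missing))


MAPPING = {
    "derived_properties": [
        {"path": "properties.identity.role.cross_account_trust_without_external_id"},
        {"path": "properties.identity.permission_categories.has_incompatible_categories"},
        {"path": "properties.identity.access_advisor.available"},
    ]
}

# Hypothetical observation in which only the access-advisor lookup was done.
OBSERVATION = {"properties": {"identity": {"access_advisor": {"available": True}}}}

missing = missing_derived(MAPPING, OBSERVATION)
print(missing)
```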

The contract show command

The three sources — schema, predicate index, mapping file — already existed. Joining them required three separate file reads. The new command joins them once:

stave contract show --asset-type aws_iam_role --format json

{
  "asset_type": "aws_iam_role",
  "has_schema": true,
  "schema_path": "schemas/observation/v1/asset-types/aws_iam_role.schema.json",
  "controls_count": 198,
  "chains_count": 38,
  "property_paths": [
    {
      "path": "properties.identity.kind",
      "controls_count": 196,
      "chains_count": 35,
      "max_severity": "critical",
      "is_intent_property": false
    },
    ...
  ],
  "steampipe_mapping": "contracts/steampipe/aws_iam_role.yaml"
}

Or:

stave contract show --list

Asset types with controls: 109 (schema: 109, steampipe mapping: 17)

  TYPE                              SCHEMA  CONTROLS  CHAINS  MAPPING
  ────                              ──────  ────────  ──────  ───────
  aws_iam_role                      yes     198       38      steampipe
  aws_s3_bucket                     yes     102       15      steampipe
  aws_lambda_function               yes     169       12      steampipe
  aws_bedrock_agent                 yes     24        5       -
  ...

The implementation reuses everything already in the codebase: compose.LoadControlsFrom, compose.LoadChainDefinitions, predindex.Build (the same index the stave gaps command uses), and a 50-line helper in internal/contracts/schema/load.go to access the embedded per-asset schemas. The command is ~330 lines; it introduces no new data, only a projection over existing data.
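On the consuming side, an agent might turn the --format json output into a work order. The sketch below is one possible shape, not part of Stave: load_contract shells out to the command shown earlier, and populate_order ranks paths by chains unlocked and then controls unlocked. The second property_paths entry in SAMPLE, including its severity value, is invented for illustration.

```python
import json
import subprocess


def load_contract(asset_type: str) -> dict:
    """Run `stave contract show` for one asset type and parse its JSON output."""
    result = subprocess.run(
        ["stave", "contract", "show", "--asset-type", asset_type, "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def populate_order(contract: dict) -> list:
    """Order property paths by chains unlocked, then controls unlocked."""
    entries = sorted(
        contract["property_paths"],
        key=lambda e: (-e["chains_count"], -e["controls_count"], e["path"]),
    )
    return [e["path"] for e in entries]


# Trimmed sample of the output; the example_flag entry is hypothetical.
SAMPLE = """
{
  "asset_type": "aws_iam_role",
  "has_schema": true,
  "schema_path": "schemas/observation/v1/asset-types/aws_iam_role.schema.json",
  "controls_count": 198,
  "chains_count": 38,
  "property_paths": [
    {"path": "properties.identity.example_flag", "controls_count": 3,
     "chains_count": 1, "max_severity": "high", "is_intent_property": false},
    {"path": "properties.identity.kind", "controls_count": 196,
     "chains_count": 35, "max_severity": "critical", "is_intent_property": false}
  ],
  "steampipe_mapping": "contracts/steampipe/aws_iam_role.yaml"
}
"""

contract = json.loads(SAMPLE)
order = populate_order(contract)
has_mapping = bool(contract.get("steampipe_mapping"))
print(order, has_mapping)
```

With a real Stave install, load_contract("aws_iam_role") would replace the embedded sample.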

Auto-generator

The remaining ~98 asset types could be hand-authored or auto-generated. We tried auto. The generator joins the cached Steampipe column catalog with each per-asset schema's property paths, applies a four-rule matching priority (per-asset overrides, schema-path lookup with multi-token scoring, tags convention, fallback to properties.<ns>.<col>), and emits a YAML in the same operations-list format Iter 1 established.

make gen-steampipe-mappings           # generate, skip existing
make gen-steampipe-mappings-validate  # measure accuracy
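The four-rule priority can be sketched as follows. This is a simplified stand-in for scripts/gen-steampipe-mappings.py, not its actual heuristic. In particular, scoring by token overlap with the last path segment, counting fallback matches as _review_required, and the sample override are all assumptions made for illustration.

```python
import re


def tokens(text: str) -> set:
    """Split a column or path segment into lowercase word tokens."""
    return {t for t in re.split(r"[._\-]+", text.lower()) if t}


def match_column(column: str, namespace: str, schema_paths: list, overrides: dict):
    """Return (path, rule) using the four rules in priority order."""
    # Rule 1: per-asset overrides win outright.
    if column in overrides:
        return overrides[column], "override"
    # Rule 2: best token overlap between the column and a schema path's leaf.
    best_path, best_score = None, 0
    for path in sorted(schema_paths):
        score = len(tokens(column) & tokens(path.rsplit(".", 1)[-1]))
        if score > best_score:
            best_path, best_score = path, score
    if best_path is not None:
        return best_path, "schema"
    # Rule 3: tags convention.
    if column == "tags":
        return f"properties.{namespace}.tags", "tags"
    # Rule 4: fallback to properties.<ns>.<col>.
    return f"properties.{namespace}.{column}", "fallback"


def generate_mapping(columns: list, namespace: str, schema_paths: list, overrides: dict) -> dict:
    """Build an operations list plus markers that bound the reviewer's work."""
    operations, matched, needs_review = [], set(), 0
    for column in columns:
        path, rule = match_column(column, namespace, schema_paths, overrides)
        operations.append({"kind": "field", "path": path, "column": column})
        matched.add(path)
        if rule == "fallback":
            needs_review += 1
    return {
        "_auto_generated": True,
        "_review_required": needs_review,
        "_unmatched_paths": sorted(set(schema_paths) - matched),
        "operations": operations,
    }


PAB = "properties.storage.controls.public_access_block"
SCHEMA_PATHS = [
    f"{PAB}.block_public_acls",
    f"{PAB}.block_public_policy",
    "properties.storage.tags.data-classification",
]
OVERRIDES = {"name": "properties.storage.name"}  # hypothetical override

mapping = generate_mapping(["name", "block_public_acls", "tags", "region"],
                           "storage", SCHEMA_PATHS, OVERRIDES)
print(mapping)
```

Schema paths that no column reached end up in _unmatched_paths, which is what keeps the review bounded: the reviewer checks the listed gaps rather than rereading every operation.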

Validation runs the generator against the 11 hand-authored YAMLs (Iter 1 + Iter 3) and compares the auto-generated (column, path) tuples against the ground truth:

Overall: 149/177 = 84% accuracy across 17 type(s)

84% — past the 80% target. The remaining 16% are the multi-target JSON-path extracts the brief flagged as inherently manual (one column → two property paths is not something a name-similarity heuristic can synthesise). Auto-generated YAMLs carry _auto_generated: true + _review_required: N + _unmatched_paths: [...] so the reviewer's surface is bounded.
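The accuracy figure is a set comparison over (column, path) tuples. A minimal sketch of that measurement, assuming mappings are already parsed into dicts and that only operations with a column key contribute a tuple (the helper names and the sample data are hypothetical):

```python
def pairs(mapping: dict) -> set:
    """Collect (column, path) tuples from operations that read a column."""
    return {(op["column"], op["path"])
            for op in mapping.get("operations", []) if "column" in op}


def accuracy(truth: set, generated: set):
    """Return (hits, total, ratio): ground-truth tuples the generator reproduced."""
    hits = len(truth & generated)
    total = len(truth)
    return hits, total, (hits / total if total else 0.0)


HAND_AUTHORED = {
    "operations": [
        {"kind": "static", "path": "properties.storage.kind", "value": "bucket"},
        {"kind": "field", "path": "properties.storage.tags", "column": "tags"},
        {"kind": "extract", "path": "properties.storage.encryption.algorithm",
         "column": "server_side_encryption_configuration"},
    ]
}
AUTO_GENERATED = {
    "operations": [
        {"kind": "field", "path": "properties.storage.tags", "column": "tags"},
        # A fallback guess that misses the hand-authored target path.
        {"kind": "field",
         "path": "properties.storage.server_side_encryption_configuration",
         "column": "server_side_encryption_configuration"},
    ]
}

hits, total, ratio = accuracy(pairs(HAND_AUTHORED), pairs(AUTO_GENERATED))
print(f"{hits}/{total} = {ratio:.0%}")

# The reported figure: 149 matching tuples out of 177.
reported = round(100 * 149 / 177)
print(reported)
```

The miss in the sample is the kind of case the paragraph above describes: an extract into a nested path that name similarity alone does not find.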

The detailed story of the heuristic — and how it went from 8% accuracy on the first pass to 84% on the fourth — is its own post. The point here is what's committed: 17 total mappings (11 hand-authored, 6 auto-generated), every one of them an artifact a customer's agent can read in any language.

Why the contract sits where it does

The architecture choice that makes this work: extractors are client-owned. Stave does not ship a collector. The contracts/steampipe/ directory contains instructions, not code. An agent reads the schema and the mapping; the agent produces the observation; Stave evaluates the observation. The collector boundary is a file, not a process.

This decision has been in our architecture docs since the project started, but until now there was no single command that surfaced the contract to an agent. An agent that wanted to author a Steampipe ingest for a new asset type had to:

  1. Find the per-asset schema (one of several embedded directories)
  2. Decide what property paths to populate (no canonical list — derive from controls)
  3. Map Steampipe columns to those paths (no template — invent it)

Now the agent runs one command, stave contract show, and gets all three. It runs make gen-steampipe-mappings and gets a starting-point YAML it can refine. What used to be three rounds of digging through the repository becomes two commands.

What stayed out of Stave

Nothing in the Stave Go binary changed across the five iterations except the new cmd/contract/ directory (one file, ~330 LOC). The agent infrastructure is:

  • examples/agents/stave_transform.py — reference loader (Python)
  • contracts/steampipe/*.yaml — 17 mappings (committed)
  • scripts/gen-steampipe-mappings.py — auto-generator (Python, ~280 LOC)
  • scripts/steampipe-columns.json — cached column catalog (refreshable from a live Steampipe install)

The deterministic policy engine is unchanged. The contract evolves; the engine doesn't.

The generic pipeline shape

Replace Steampipe with any external data source — AWS Config, Terraform state, your internal inventory, Salesforce, OpenAPI specs — and the pipeline shape is the same:

  1. Define the canonical target contract. For Stave it's obs.v0.1 JSON with per-asset-type sub-schemas. For your tool, it's whatever shape your engine reads.

  2. Author one mapping per source per asset type. YAML is fine. Operations list with field/static/extract/computed semantics covers most transform shapes.

  3. Ship a discovery command. One CLI that joins the schema + the path list + the mapping into a single agent-readable output. The agent stops needing your team's docs.

  4. Auto-generate the boring half. Most column→path mappings are name-similarity. The exceptions are rare enough to hand-author. Use the hand-authored set as a ground-truth corpus to measure your generator's accuracy.

  5. Mark uncertainty explicitly. _review_required, _unmatched_paths, derived_properties:. Silent gaps are worse than loud ones.

Five points, one functioning pipeline. The customer who needed three pages of collector setup now needs make gen-steampipe-mappings and an agent that can read a YAML.