GitHub - foxtrotcommunications/foxtrotcommunications-forge-core: Automatically decompose nested JSON in your data warehouse into normalized dbt models. Supports BigQuery, Snowflake, Databricks, and Redshift.
brady_bastia · 2026-04-23 · via Hacker News: Show HN

Automatically decompose nested JSON in your data warehouse into normalized dbt models.

Forge Core is a deterministic breadth-first search (BFS) engine that reads a single JSON column (or a multi-column table), discovers all nested structures, and generates:

  • dbt SQL models — one per nested object/array
  • Rollup view — reassembles the full document from normalized tables
  • schema.yml — structural column inventory
  • JSON Schema — standard draft-07 schema of the discovered structure
  • Mermaid ER diagram — table relationship visualization
  • dbt docs — browseable documentation site

Supported Warehouses

| Warehouse  | Install Extra                                | Status        |
|------------|----------------------------------------------|---------------|
| BigQuery   | foxtrotcommunications-forge-core[bigquery]   | ✅ Production |
| Snowflake  | foxtrotcommunications-forge-core[snowflake]  | ✅ Production |
| Databricks | foxtrotcommunications-forge-core[databricks] | ✅ Production |
| Redshift   | foxtrotcommunications-forge-core[redshift]   | 🚧 Beta       |

Quickstart

# Quotes keep shells such as zsh from interpreting the square brackets
pip install "foxtrotcommunications-forge-core[bigquery]"

forge-core build \
  --source-type bigquery \
  --source-project my-gcp-project \
  --source-database my_dataset \
  --source-table my_json_table \
  --target-dataset my_target

Or use the Python API:

from forge_core import build_core

result = build_core(
    source_type="bigquery",
    source_project="my-gcp-project",
    source_database="my_dataset",
    source_table_name="my_json_table",
    target_dataset="my_target",
)

print(f"Created {result.total_models_created} models")
print(f"Processed {result.total_rows_processed} rows")

Enabling progress output

Forge Core uses Python's standard logging module. By default nothing is printed — add this before your build_core() call to stream progress to the console:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s  %(message)s",
    datefmt="%H:%M:%S",
)
logging.getLogger("forge_core").setLevel(logging.INFO)

This works in Jupyter notebooks, plain scripts, Airflow (which routes log records through its own handler automatically), and any CI/CD environment that captures console output (logging.basicConfig() writes to stderr by default).
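For runners that collect only stdout, a handler can be attached explicitly, since logging.basicConfig() defaults to stderr. The following is a minimal sketch: the helper name `configure_progress_logging` is hypothetical, and the `forge_core` logger name is taken from the snippet above.

```python
import logging
import sys


def configure_progress_logging(stream=sys.stdout, level=logging.INFO):
    """Attach a stream handler to the "forge_core" logger (hypothetical helper)."""
    logger = logging.getLogger("forge_core")
    logger.setLevel(level)
    # StreamHandler writes to whichever stream it is given; stdout here.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s  %(message)s", datefmt="%H:%M:%S")
    )
    logger.addHandler(handler)
    return logger
```

Use this in place of the basicConfig() call rather than alongside it; with both configured, each message would be printed twice.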

How It Works

┌─────────────────────────────┐
│  Source Table (JSON column)  │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  1. Root Model (frg)        │  Parse JSON → root SELECT
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  2. BFS Discovery Loop      │  For each level:
│     - Discover keys          │    • get_keys() → field names
│     - Infer types            │    • get_types() → STRUCT/ARRAY/scalar
│     - Generate SQL model     │    • create_file_in_models()
│     - dbt build              │    • run_dbt_command()
│     - Tag as excluded        │    • tag_models_as_excluded()
│     - Queue children         │    • next_batch.extend()
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  3. Rollup View              │  JOIN all tables back into
│     (frg__rollup)            │  nested STRUCT/ARRAY form
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  4. Artifacts                │  schema.yml, JSON Schema,
│                              │  Mermaid diagram, dbt docs
└─────────────────────────────┘
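The discovery loop in step 2 can be illustrated in plain Python on a single in-memory document. This is only a sketch of a breadth-first walk over nested JSON, not Forge Core's implementation, which runs against the warehouse through dbt. The function name `discover_levels` and the sample document are made up for illustration.

```python
from collections import deque


def discover_levels(document):
    """Breadth-first walk: for each nested path, record the scalar fields
    found there and queue nested objects/arrays as the next level."""
    levels = {}
    queue = deque([("root", [document])])
    while queue:
        path, records = queue.popleft()
        scalars, children = set(), {}
        for record in records:
            for key, value in record.items():
                if isinstance(value, dict):
                    children.setdefault(key, []).append(value)
                elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
                    # An empty list counts as a (so far empty) array of objects.
                    children.setdefault(key, []).extend(value)
                else:
                    scalars.add(key)
        levels[path] = sorted(scalars)
        for key in sorted(children):
            queue.append((f"{path}__{key}", children[key]))
    return levels


doc = {
    "patient_id": "p-1",
    "experiments": [
        {"experiment_name": "a", "team": [{"team_name": "x"}]},
        {"experiment_name": "b", "team": []},
    ],
}
for path, fields in discover_levels(doc).items():
    print(path, fields)
```

Each level is fully processed before its children are visited, which is what makes the traversal breadth-first; the paths it produces follow the same `root__experiments__team` pattern as the table_path column.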

Authentication

Forge Core uses standard warehouse authentication:

  • BigQuery: Application Default Credentials (gcloud auth application-default login) or GOOGLE_APPLICATION_CREDENTIALS
  • Snowflake: SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PRIVATE_KEY_PATH, etc.
  • Databricks: DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_ACCESS_TOKEN
  • Redshift: REDSHIFT_HOST, REDSHIFT_USER, REDSHIFT_PASSWORD, REDSHIFT_DATABASE
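Before kicking off a build it can help to confirm that the variables above are actually set. The sketch below uses only the names listed in this section; the helper `missing_env_vars` is hypothetical and not part of Forge Core. BigQuery is left out because Application Default Credentials can work without any environment variable, and the Snowflake list is likely incomplete since the source ends it with "etc.".

```python
import os

# Names copied from the list above; the Snowflake entry is likely partial.
REQUIRED_ENV = {
    "snowflake": ["SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PRIVATE_KEY_PATH"],
    "databricks": [
        "DATABRICKS_SERVER_HOSTNAME",
        "DATABRICKS_HTTP_PATH",
        "DATABRICKS_ACCESS_TOKEN",
    ],
    "redshift": ["REDSHIFT_HOST", "REDSHIFT_USER", "REDSHIFT_PASSWORD", "REDSHIFT_DATABASE"],
}


def missing_env_vars(source_type, environ=None):
    """Return the listed variables that are unset or empty for a warehouse."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV.get(source_type, []) if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars("redshift", {"REDSHIFT_HOST": "example-host"}))
```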

Project Structure

After a build, your project directory looks like:

forge_project/
├── dbt_project.yml
├── profiles.yml          # Auto-generated
├── macros/
│   └── incremental_tmp_table_dropper.sql
├── models/
│   ├── frg.sql           # Root model
│   ├── frg__root__....sql # Unnested models (one per level)
│   ├── frg__rollup.sql   # Rollup view
│   └── schema.yml        # Column inventory
└── target/
    ├── schema.json        # JSON Schema
    ├── schema.mmd         # Mermaid diagram
    └── index.html         # dbt docs

Use in Airflow / Containers

# Airflow PythonOperator
from forge_core import build_core

def forge_task(**context):
    result = build_core(
        source_type="bigquery",
        source_project="my-project",
        source_database="raw",
        source_table_name="api_responses",
        target_dataset="normalized",
        project_dir="/tmp/forge_project",
    )
    return result.total_models_created

Understanding the Generated Schema

Key Columns

Every table generated by Forge Core contains these system columns:

| Column              | Type      | Description |
|---------------------|-----------|-------------|
| ingestion_hash      | STRING    | Hash of the source row. Groups all decomposed tables that came from the same original JSON document. |
| idx                 | STRING    | Composite positional key. Encodes the exact path through nested arrays to reach this row. |
| ingestion_timestamp | TIMESTAMP | When the row was ingested. |
| table_path          | STRING    | Hierarchical path describing the nesting lineage (e.g., root__experiments__team). |

How idx Works

The idx column is an underscore-delimited string that grows one segment per nesting level:

Depth 0 (root):        idx = "1"
Depth 1 (child):       idx = "1_2"        ← root row 1, child element 2
Depth 2 (grandchild):  idx = "1_2_3"      ← root row 1, child 2, grandchild 3
Depth 3 (great-grand): idx = "1_2_3_1"    ← root row 1, child 2, grandchild 3, great-grandchild 1

Each segment represents the array position at that nesting level. This means:

  • Every child row carries its full ancestry in idx.
  • To find a child's parent, strip the last segment.
  • To join parent ↔ child, match every segment of the parent's idx against the segment at the same position in the child's idx.
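These properties are easy to check outside the warehouse. The helpers below are an illustrative sketch (the function names are made up) that treats idx as a plain underscore-delimited string, as described above.

```python
def idx_depth(idx):
    """Nesting depth: the root has one segment and depth 0."""
    return len(idx.split("_")) - 1


def parent_idx(idx):
    """Strip the last segment; the root has no parent."""
    head, sep, _ = idx.rpartition("_")
    return head if sep else None


def is_child_of(child, parent):
    """True when `child` extends `parent` by exactly one segment."""
    return parent_idx(child) == parent


print(idx_depth("1_2_3_1"))       # depth of the great-grandchild row
print(parent_idx("1_2_3"))        # idx of its parent row
print(is_child_of("1_2", "1"))
```

Matching idx values alone is not enough to relate two rows; they must also share the same ingestion_hash, since idx only encodes position within one source document.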

Joining Parent to Child Tables

The rule: for each segment in the parent's idx, add one equality condition comparing that segment position in both parent and child. A parent at depth N has N+1 segments, so you write N+1 index conditions (depth 0 needs 1 condition, depth 1 needs 2, and so on).

BigQuery

-- Depth 0 → 1: root (idx="1") → experiments (idx="1_2")
-- Parent has 1 segment → 1 index condition
SELECT
    r.*,
    e.experiment_name,
    e.experiment_status
FROM `project.dataset.frg__root` r
JOIN `project.dataset.frg__root__expe1` e
    ON  r.ingestion_hash = e.ingestion_hash
    AND SPLIT(r.idx, '_')[OFFSET(0)] = SPLIT(e.idx, '_')[OFFSET(0)]

-- Depth 1 → 2: experiments (idx="1_2") → team (idx="1_2_3")
-- Parent has 2 segments → 2 index conditions
SELECT
    e.*,
    t.team_name,
    t.team_role
FROM `project.dataset.frg__root__expe1` e
JOIN `project.dataset.frg__root__expe1__team1` t
    ON  e.ingestion_hash = t.ingestion_hash
    AND SPLIT(e.idx, '_')[OFFSET(0)] = SPLIT(t.idx, '_')[OFFSET(0)]
    AND SPLIT(e.idx, '_')[OFFSET(1)] = SPLIT(t.idx, '_')[OFFSET(1)]

-- Depth 2 → 3: team (idx="1_2_3") → lab_results (idx="1_2_3_1")
-- Parent has 3 segments → 3 index conditions
SELECT
    t.*,
    l.lab_name,
    l.result_value
FROM `project.dataset.frg__root__expe1__team1` t
JOIN `project.dataset.frg__root__expe1__team1__lab_1` l
    ON  t.ingestion_hash = l.ingestion_hash
    AND SPLIT(t.idx, '_')[OFFSET(0)] = SPLIT(l.idx, '_')[OFFSET(0)]
    AND SPLIT(t.idx, '_')[OFFSET(1)] = SPLIT(l.idx, '_')[OFFSET(1)]
    AND SPLIT(t.idx, '_')[OFFSET(2)] = SPLIT(l.idx, '_')[OFFSET(2)]

-- Three-level join: root → experiments → team
SELECT
    r.patient_id,
    e.experiment_name,
    t.team_name
FROM `project.dataset.frg__root` r
JOIN `project.dataset.frg__root__expe1` e
    ON  r.ingestion_hash = e.ingestion_hash
    AND SPLIT(r.idx, '_')[OFFSET(0)] = SPLIT(e.idx, '_')[OFFSET(0)]
JOIN `project.dataset.frg__root__expe1__team1` t
    ON  e.ingestion_hash = t.ingestion_hash
    AND SPLIT(e.idx, '_')[OFFSET(0)] = SPLIT(t.idx, '_')[OFFSET(0)]
    AND SPLIT(e.idx, '_')[OFFSET(1)] = SPLIT(t.idx, '_')[OFFSET(1)]

Snowflake

-- Depth 0 → 1: root → experiments (1 condition)
SELECT r.*, e."experiment_name"
FROM "DATASET"."FRG__ROOT" r
JOIN "DATASET"."FRG__ROOT__EXPE1" e
    ON  r."ingestion_hash" = e."ingestion_hash"
    AND SPLIT_PART(r."idx", '_', 1) = SPLIT_PART(e."idx", '_', 1)

-- Depth 1 → 2: experiments → team (2 conditions)
SELECT e.*, t."team_name"
FROM "DATASET"."FRG__ROOT__EXPE1" e
JOIN "DATASET"."FRG__ROOT__EXPE1__TEAM1" t
    ON  e."ingestion_hash" = t."ingestion_hash"
    AND SPLIT_PART(e."idx", '_', 1) = SPLIT_PART(t."idx", '_', 1)
    AND SPLIT_PART(e."idx", '_', 2) = SPLIT_PART(t."idx", '_', 2)

General Join Formula

For a parent at depth N joining to a child at depth N+1, expand N+1 index conditions, one per segment of the parent's idx:

parent.ingestion_hash = child.ingestion_hash
AND SPLIT(parent.idx, '_')[OFFSET(0)] = SPLIT(child.idx, '_')[OFFSET(0)]
AND SPLIT(parent.idx, '_')[OFFSET(1)] = SPLIT(child.idx, '_')[OFFSET(1)]
  ...
AND SPLIT(parent.idx, '_')[OFFSET(N)] = SPLIT(child.idx, '_')[OFFSET(N)]

The child always has one more segment than the parent — that final segment is the child's own position within the parent array.
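Because the pattern is mechanical, the ON clause can be generated instead of typed by hand. The sketch below builds the BigQuery form shown in the examples from the number of segments in the parent's idx; `bigquery_join_condition` is a hypothetical helper, not something Forge Core is known to ship.

```python
def bigquery_join_condition(parent_segments, parent_alias="parent", child_alias="child"):
    """Build the ON clause for a parent whose idx has `parent_segments` segments."""
    if parent_segments < 1:
        raise ValueError("the parent's idx always has at least one segment")
    conditions = [f"{parent_alias}.ingestion_hash = {child_alias}.ingestion_hash"]
    # One equality per parent segment, compared at the same offset in the child.
    for offset in range(parent_segments):
        conditions.append(
            f"SPLIT({parent_alias}.idx, '_')[OFFSET({offset})] = "
            f"SPLIT({child_alias}.idx, '_')[OFFSET({offset})]"
        )
    return "\n    AND ".join(conditions)


# experiments (idx like "1_2", two segments) joined to team
print(bigquery_join_condition(2, "e", "t"))
```

The aliases are interpolated directly into the SQL text, so they should come from your own code, never from user input.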

Table Naming Convention

Table names encode the nesting path with truncated field names:

frg__root                          ← root extraction
frg__root__expe1                   ← root.experiments (truncated to 4 chars + counter)
frg__root__expe1__team1            ← root.experiments[].team
frg__root__expe1__team1__lab_1     ← root.experiments[].team[].lab_results
frg__root__hosp1__staf1__nurs1     ← root.hospital[].staff[].nurses
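The names above can be reproduced with a few lines of Python. This is a sketch inferred from the examples only: it assumes each field is cut to its first four characters and suffixed with a counter of 1. How Forge Core picks higher counter values is not described here, so the `counter` parameter is left for the caller, and both function names are hypothetical.

```python
def short_name(field, counter=1, width=4):
    """First `width` characters of the field name plus a counter."""
    return f"{field[:width]}{counter}"


def table_name(path, prefix="frg"):
    """Join the prefix, the root marker, and one short name per nesting level."""
    parts = [prefix, "root"] + [short_name(field) for field in path]
    return "__".join(parts)


print(table_name([]))
print(table_name(["experiments", "team", "lab_results"]))
print(table_name(["hospital", "staff", "nurses"]))
```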

The Rollup View

The frg__rollup view automatically reassembles all normalized tables back into nested STRUCT/ARRAY form — reconstructing the original JSON shape as queryable warehouse-native types. Use it when you want the full document without manual joins.
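To make the reassembly concrete, here is a small in-memory sketch of the same idea: child rows are attached to the parent whose idx they extend by one segment. It is not the SQL that frg__rollup runs. It assumes all rows share one ingestion_hash, keys tables by table_path rather than by the truncated table name, and uses a hypothetical function name, `nest_rows`.

```python
def nest_rows(tables, path="root", parent_idx=None):
    """Rebuild nested records for `path`, keeping only rows whose idx
    extends `parent_idx` by exactly one segment."""
    nested = []
    for row in tables.get(path, []):
        idx = row["idx"]
        if parent_idx is not None and idx.rpartition("_")[0] != parent_idx:
            continue
        record = {key: value for key, value in row.items() if key != "idx"}
        # Direct children are the paths that add one "__field" to this path.
        for child_path in tables:
            head, sep, field = child_path.rpartition("__")
            if sep and head == path:
                record[field] = nest_rows(tables, child_path, idx)
        nested.append(record)
    return nested


tables = {
    "root": [{"idx": "1", "patient_id": "p-1"}],
    "root__experiments": [
        {"idx": "1_1", "experiment_name": "a"},
        {"idx": "1_2", "experiment_name": "b"},
    ],
    "root__experiments__team": [{"idx": "1_2_1", "team_name": "x"}],
}
print(nest_rows(tables))
```

In the warehouse the same grouping is expressed with joins and STRUCT/ARRAY aggregation, so the view stays queryable without pulling rows into application code.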

License

Apache 2.0