GitHub - verdverm/pge-jax: Jax implementation of the PGE algorithm (Prioritized Grammar Enumeration)
verdverm · 2026-05-21 · via Hacker News: Show HN

JAX implementation of the Prioritized Grammar Enumeration (PGE) algorithm for symbolic regression.

Overview

pge-jax is a complete symbolic regression system that automatically discovers mathematical formulas from data. It enumerates candidate expressions from a grammar, fits their coefficients using JAX-native Levenberg-Marquardt optimization, and selects the best models using a multi-objective Pareto front (NSGA-II).

The key advantage over prior implementations (pypge, go-pge) is a fully JAX-native evaluation pipeline, enabling:

  • GPU/TPU acceleration of model evaluation and Jacobian computation
  • JIT compilation via jax.jit and jax.vmap
  • Automatic differentiation for efficient gradient-based optimization
  • No external ML dependencies — no scikit-learn, lmfit, or DEAP required
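Here is a minimal, generic sketch of those JAX features. It is plain JAX, not pge-jax's internal code. One fixed candidate form, `C_0*x0 + C_1*sin(x1)`, chosen only for illustration, is scored for a whole batch of coefficient vectors at once by vectorizing with `jax.vmap` and compiling with `jax.jit`. The helper names `model` and `mse` are hypothetical.

```python
import jax
import jax.numpy as jnp

def model(coeffs, X):
    # Fixed candidate form C_0*x0 + C_1*sin(x1), evaluated on every row of X
    return coeffs[0] * X[:, 0] + coeffs[1] * jnp.sin(X[:, 1])

def mse(coeffs, X, y):
    # Mean squared error for a single coefficient vector
    return jnp.mean((model(coeffs, X) - y) ** 2)

# vmap maps mse over a batch of coefficient vectors (axis 0 of the first
# argument) while sharing X and y; jit compiles the batched function once
batched_mse = jax.jit(jax.vmap(mse, in_axes=(0, None, None)))

X = jnp.stack([jnp.linspace(-1.0, 1.0, 50), jnp.linspace(0.0, 2.0, 50)], axis=1)
y = 2.0 * X[:, 0] + 0.5 * jnp.sin(X[:, 1])

candidates = jnp.array([[2.0, 0.5], [1.0, 1.0], [0.0, 0.0]])
errors = batched_mse(candidates, X, y)  # one MSE per candidate row

# Automatic differentiation: gradient of the error w.r.t. the coefficients
grad = jax.grad(mse)(candidates[1], X, y)
```

The first candidate holds the coefficients used to generate `y`, so its error is (numerically) zero. The other rows score worse.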

Quick Start

import jax
jax.config.update("jax_enable_x64", True)

import numpy as np
from pge_jax import PGE

# Generate synthetic data
np.random.seed(42)
X = np.random.randn(100, 2)
Y = 3.0 * X[:, 0] + 1.5 * X[:, 1]**2 - 0.5 * np.sin(X[:, 0])

# Run PGE search
pge = PGE(
    usable_vars=["x0", "x1"],
    usable_funcs=["sin", "cos", "exp", "log"],
    max_iter=10,
    pop_count=3,
    peek_npts=16,       # subset size for fast partial evaluation
)
pge.fit(X, Y)

# Get results
best = pge.get_best_model()
print(best.pretty_expr())  # e.g. "3.0*x0 + 1.5*x1**2 - 0.5*sin(x0)"

paretos = pge.get_final_paretos()  # list of Pareto fronts

Algorithm

Symbolic Regression

Given data points $(x_1, y_1), \dots, (x_N, y_N)$, symbolic regression searches the space of all valid mathematical expressions for one that fits the data well — but unlike standard regression, it returns an interpretable formula rather than opaque coefficients.

The search space is combinatorial: with $V$ variables, $F$ functions, and a maximum tree depth $D$, the number of candidate expressions grows at least exponentially in $D$, so exhaustive search is infeasible. Instead, symbolic regression uses an iterative loop that cycles through three phases:

  data       ┌──────────┐    ┌──────────┐    ┌──────────┐
 ──────►     │ Explore  │───►│ Evaluate │───►│  Select  │
             └──────────┘    └──────────┘    └──────────┘
                   ▲                               │
                   │       ┌───────────────┐       │
                   └───────│    "best"     │◄──────┘
                           │ C_0*x0+C_1    │        
                           │ C_0*x0+C_1*x1 │      
                           │ C_0*sin(x0)   │        
                           └───────────────┘          

Explore generates candidate expressions from a grammar of valid forms. Evaluate fits coefficients and computes accuracy. Select keeps the best candidates and feeds them back into exploration, gradually improving the set of discovered formulas.
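To make the growth of the search space concrete, the toy count below assumes a hypothetical grammar. A leaf is one of $V$ variables or a single coefficient placeholder `C`, and an internal node is one of $F$ unary functions or $B$ binary operators. A tree of depth at most $d$ is then one of three things: a leaf, a function applied to a tree of depth at most $d-1$, or an operator joining two such trees. That gives $T_0 = V + 1$ and $T_d = (V+1) + F\,T_{d-1} + B\,T_{d-1}^2$. This is an illustration only, not the grammar pge-jax actually uses.

```python
def count_expressions(V: int, F: int, B: int, D: int) -> int:
    """Count expression trees of depth <= D in a hypothetical toy grammar.

    Leaves: one of V variables or a single coefficient placeholder C.
    Internal nodes: one of F unary functions or B binary operators.
    """
    leaves = V + 1
    t = leaves  # depth 0: a lone leaf
    for _ in range(D):
        # depth <= d: a leaf, a unary function of a shallower tree,
        # or a binary operator joining two shallower trees
        t = leaves + F * t + B * t * t
    return t

counts = [count_expressions(V=2, F=4, B=4, D=d) for d in range(4)]
print(counts)
```

With $V=2$, $F=4$ and $B=4$, the counts for $D = 0, 1, 2, 3$ are 3, 51, 10611 and 450415731. Because of the squared term, the count roughly squares with every extra level. Many of these trees are algebraically equivalent, which is what canonicalization and memoization exploit.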

PGE Algorithm

Prioritized Grammar Enumeration (PGE) is a deterministic symbolic regression algorithm that replaces genetic operators and randomness with grammar production rules and systematic enumeration. PGE enumerates expressions in order of increasing complexity and prunes the search through multi-objective selection. Before memoization, algebraic canonicalization normalizes operands using associativity and commutativity rules, so equivalent expressions are detected as duplicates. Additional optimizations include early termination on diverging fits and bounds checking on intermediate values.
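SymPy already performs part of this canonicalization when it builds an expression. It flattens nested sums and products (associativity) and puts their arguments in a canonical order (commutativity), so reordered forms hash and compare equal. The hypothetical `ToyMemoizer` below relies only on that behavior. pge-jax's own `Memoizer` may key expressions differently.

```python
import sympy

x, y, C = sympy.symbols("x y C")

class ToyMemoizer:
    """Hypothetical memoizer that remembers expressions already explored.

    SymPy flattens nested Add/Mul (associativity) and sorts their arguments
    (commutativity) on construction, so reordered forms compare and hash equal.
    """

    def __init__(self):
        self._seen = set()

    def insert_if_new(self, expr: sympy.Expr) -> bool:
        if expr in self._seen:
            return False  # duplicate: skip re-evaluation
        self._seen.add(expr)
        return True

memo = ToyMemoizer()
candidates = [
    C * x + C * y,
    C * y + C * x,          # same terms, different order (commutativity)
    (C * x + C) + C * y,    # nested sum (associativity)
    C * y + (C + C * x),    # same sum, regrouped and reordered
    C * sympy.sin(x),
]
new = [e for e in candidates if memo.insert_if_new(e)]
print(len(new))  # 3
```

SymPy does not treat `C*(x + y)` and `C*x + C*y` as the same expression. Equivalences like that are left to the algebraic manipulation step.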

PGE combines grammar-based expression generation with evolutionary multi-objective optimization:

  1. Grammar-based generation — A context-free grammar defines valid expressions from variables, functions, constants, and arithmetic operators. The Grower class enumerates candidates using five expansion operators: variable substitution, addition extension, multiplication extension, coefficient-scaled functions, and shrink.

  2. Filtering — Expressions are rejected for violating size limits, having integer coefficients, or exceeding power bounds.

  3. Memoization — Structural deduplication skips expressions already explored, avoiding redundant work.

  4. Algebraic manipulation — Valid expressions are expanded, factored, and simplified to discover equivalent forms that may have better coefficient fits.

  5. Progressive evaluation — Candidates are first evaluated on a small data subset (peek_npts). Only the most promising candidates, chosen by NSGA-II selection, advance to full evaluation on all training data.

  6. Multi-objective optimization — Each candidate is scored on multiple objectives (RMSE, complexity, AIC/BIC). NSGA-II maintains a Pareto front of non-dominated solutions across iterations.

  7. Coefficient fitting — Levenberg-Marquardt optimization tunes free coefficients for each expression, enabling fair comparison between structurally different candidates.
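The expansion step can be sketched with SymPy as follows. `expand_once` and the `VARS`/`FUNCS` lists are hypothetical. The operators are applied only at the top level of the expression, whereas the real `Grower` presumably also expands subexpressions. The shrink operator is omitted.

```python
import sympy

x0, x1, C = sympy.symbols("x0 x1 C")
VARS = [x0, x1]                   # hypothetical usable variables
FUNCS = [sympy.sin, sympy.exp]    # hypothetical usable functions

def expand_once(expr: sympy.Expr) -> set:
    """One round of hypothetical grammar expansions on the whole expression."""
    out = set()
    for v in VARS:
        out.add(expr + C * v)             # addition extension: + C*v
        out.add(expr * v)                 # multiplication extension: * v
        for f in FUNCS:
            out.add(expr + C * f(v))      # coefficient-scaled function: + C*f(v)
    for old in expr.free_symbols & set(VARS):
        for new in VARS:
            if new != old:
                out.add(expr.subs(old, new))  # variable substitution
    return out

children = expand_once(C * x0)
for child in sorted(children, key=sympy.count_ops):
    print(child)
```

Starting from `C*x0`, this produces nine distinct children. They include `C*x1` (variable substitution) and `C*x0 + C*x1` (addition extension). SymPy collects `C*x0 + C*x0` into `2*C*x0`, which is the kind of integer-coefficient form the filtering step is meant to reject.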

The result is a Pareto front of trade-off solutions — from simple approximate formulas to complex high-accuracy ones — from which the user can select the best interpretation for their problem.
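The Pareto front is simply the set of non-dominated candidates. Below is a minimal sketch with two objectives to minimize, expression size and error. The `(size, RMSE)` pairs are made up for illustration.

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (size, error) pairs; both objectives are minimized."""

    def dominates(a, b):
        # a dominates b: no worse in every objective and strictly better in one
        return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]

# hypothetical (expression size, RMSE) pairs
models = [(3, 0.90), (5, 0.40), (6, 0.45), (9, 0.05), (12, 0.05), (4, 0.90)]
front = sorted(pareto_front(models))
print(front)  # [(3, 0.9), (5, 0.4), (9, 0.05)]
```

`(12, 0.05)` drops out because `(9, 0.05)` is equally accurate but smaller. Every survivor trades some accuracy for simplicity, or the reverse.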

How It Works

Two-Layer Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        PGE Search Loop                          │
│  (search.py — orchestration, selection, expansion)              │
│                                                                 │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐   │
│  │ Generate │───▶│  Filter  │───▶│ Memoize  │───▶│  Algebra │   │
│  │ (Grower) │    │          │    │          │    │          │   │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘   │
│                                                        │        │
│                                                        ▼        │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐   │
│  │  Final   │◀───│  Full    │◀───│  Peek    │◀───│ Select   │   │
│  │  Push    │    │ Evaluate │    │ Evaluate │    │ (NSGA-II)│   │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘   │
└─────────────────────────────────────────────────────────────────┘
         │
         │ SearchModel (expression + state + fitness)
         │
┌────────┴────────────────────────────────────────────────────────┐
│                    Evaluation Layer                             │
│  (model.py + optimize.py + metrics.py)                          │
│                                                                 │
│  SearchModel ──▶ JAXModel ──▶ fit_model() ──▶ evaluate()        │
│                                    │              │             │
│                                    ▼              ▼             │
│                               FitResult      EvalResult         │
│                               (coeffs,       (RMSE, R²,         │
│                                cost)         AIC, BIC, ...)     │
└─────────────────────────────────────────────────────────────────┘

Progressive Evaluation

All candidate expressions from Grower
            │
            ▼
    ┌───────────────┐
    │  Peek Eval    │  ← Uses peek_npts (default 16) data points
    │  (fast subset)│
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │ NSGA-II       │  ← Multi-objective selection keeps
    │ _peek_pop()   │     only the most promising candidates
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │  Full Eval    │  ← Uses ALL training data
    │  (expensive)  │     only on selected candidates
    └───────┬───────┘
            │
            ▼
    ┌───────────────┐
    │  _final_push()│  ← Accumulate into Pareto front
    └───────────────┘
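Here is a rough NumPy sketch of this two-stage idea. It is deliberately simplified. The candidates are already-fitted functions, survivors are chosen by lowest peek RMSE rather than by NSGA-II, and `progressive_eval` is a hypothetical name. `peek_npts` and `peek_count` mirror the parameters described under Public API.

```python
import numpy as np

def progressive_eval(candidates, X, y, peek_npts=16, peek_count=2, seed=0):
    """Score every candidate on a small random subset, then only the survivors on all data.

    Single-objective stand-in: survivors are the peek_count lowest peek RMSEs.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=peek_npts, replace=False)

    def rmse(f, xs, ys):
        return float(np.sqrt(np.mean((f(xs) - ys) ** 2)))

    # peek evaluation: every candidate, only peek_npts rows
    ranked = sorted(candidates.items(), key=lambda kv: rmse(kv[1], X[idx], y[idx]))
    survivors = ranked[:peek_count]
    # full evaluation: survivors only, all rows
    return {name: rmse(f, X, y) for name, f in survivors}

X = np.linspace(-3.0, 3.0, 1000)
y = 2.0 * X + np.sin(X)
candidates = {
    "2*x + sin(x)": lambda x: 2.0 * x + np.sin(x),
    "2*x": lambda x: 2.0 * x,
    "x**2": lambda x: x ** 2,
    "exp(x)": lambda x: np.exp(x),
}
scores = progressive_eval(candidates, X, y)
print(scores)
```

The two poor candidates are discarded after only 16 of the 1000 points have been looked at. Only two models pay for a full evaluation.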

Evaluation Pipeline

sympy.Expr
    │
    ▼
JAXModel.__init__()
    │  - wraps sympy expression
    │  - compiles jax_fun + jac_fun via sympy.lambdify
    ▼
fit_levenberg_marquardt()
    │  - optimizes coefficients (C_0, C_1, ...)
    │  - uses JAX-computed Jacobian
    ▼
FitResult
    │  - coefficients: [2.0, 1.0]
    │  - predictions: model output at optimized coeffs
    │  - cost: sum of squared residuals
    ▼
evaluate()
    │  - computes regression metrics
    ▼
EvalResult
    │  - score (RMSE), r2, aic, bic, chisqr, ...

Public API

PGE — End-to-End Search

The primary entry point. Wraps the full search pipeline:

from pge_jax import PGE, ExpanderConfig  # ExpanderConfig import path assumed

pge = PGE(
    usable_vars=["x0", "x1"],   # or "x0 x1" or list of sympy.Symbol
    usable_funcs=["sin", "cos", "exp", "log", "tan", "sqrt"],
    max_iter=100,               # search iterations
    pop_count=3,                # models expanded per expander per iteration
    peek_count=6,               # models selected from peek heap for full eval
    peek_npts=16,               # data points for fast partial (peek) evaluation
    max_size=64,                # max expression tree size
    max_power=5,                # max power exponent
    algebra_methods=["expand", "factor"],
    err_method="mse",           # error metric for score
    random_seed=23,
    expanders=[                 # optional: multiple expanders with different configs
        ExpanderConfig(pop_count=3, max_size=32),
        ExpanderConfig(pop_count=2, max_size=64),
    ],
)

pge.fit(X_train, Y_train)          # sklearn-style: returns self
best = pge.get_best_model()        # SearchModel with lowest score
paretos = pge.get_final_paretos()  # list[list[SearchModel]]

SearchModel — Expression State

Wraps a sympy expression with lifecycle state, size metrics, and fitness:

from pge_jax import SearchModel
import sympy

x = sympy.Symbol("x")
c0 = sympy.Symbol("C_0")
expr = c0 * x + 1

model = SearchModel(expr, xs=[x], cs=[c0])
model.rewrite_coeff()   # builds JAXModel wrapper, converts bare C → C_0, C_1, ...

model.size()            # tree size
model.psz               # penalised size (+2 per function node)
model.jpsz              # penalised Jacobian size
model.score             # RMSE after fitting
model.r2                # R-squared
model.pretty_expr()     # expression with fitted coefficients substituted
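As a rough illustration of `size()` and `psz`, the sketch below counts nodes in SymPy's expression tree and adds the "+2 per function node" penalty quoted above. The helper names and the choice to count every SymPy node are assumptions. pge-jax's own tree representation and its `jpsz` metric may be defined differently.

```python
import sympy

def _nodes(expr: sympy.Basic):
    # Pre-order walk over SymPy's expression tree
    yield expr
    for arg in expr.args:
        yield from _nodes(arg)

def tree_size(expr: sympy.Basic) -> int:
    # Every node (operator, function, symbol, number) counts once
    return sum(1 for _ in _nodes(expr))

def penalized_size(expr: sympy.Basic) -> int:
    # Assumed rule: +2 for each applied function node such as sin or exp
    n_funcs = sum(1 for node in _nodes(expr) if isinstance(node, sympy.Function))
    return tree_size(expr) + 2 * n_funcs

x, C_0, C_1 = sympy.symbols("x C_0 C_1")
for e in [C_0 * x + C_1, C_0 * sympy.sin(x) + C_1]:
    print(e, tree_size(e), penalized_size(e))
```

Under these assumptions, `C_0*x + C_1` has size 5 and penalized size 5. Swapping `x` for `sin(x)` adds one node and a penalty of 2, giving 6 and 8.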

JAXModel — Pure JAX Evaluation

For users who want to evaluate a specific expression without the search loop:

from pge_jax import JAXModel, fit_model, evaluate
import jax.numpy as jnp
import sympy

x = sympy.Symbol("x")
c0, c1 = sympy.symbols("C_0 C_1")
expr = c0 * x + c1

model = JAXModel(expr)
x_data = jnp.array([1.0, 2.0, 3.0])
y_true = jnp.array([3.0, 5.0, 7.0])  # generated by y = 2*x + 1
result = fit_model(model, y_true, x_data)
print(result.coefficients)  # approximately [2.0, 1.0]

eval_result = evaluate(model, y_true, result.predictions)
print(f"R²: {eval_result.r2:.4f}")

Individual Components

All modules are importable independently:

from pge_jax import (
    # Filters
    filter_models, default_filters,

    # Algebra
    manip_model, do_simp,

    # Memoization
    Memoizer,

    # Selection
    selNSGA2, selSPEA2, sortLogNondominated, isDominated, assignCrowdingDist,

    # Fitness
    build_fitness_calc, build_fitness_weights, build_value_extractor,

    # Expansion
    Grower, map_names_to_funcs,

    # Metrics
    rmse, mae, mse, r2, explained_variance, aic, bic, chisqr, redchi, rmae,

    # Optimizers
    fit_levenberg_marquardt, fit_least_squares,
)

Design Decisions

Why sympy + JAX?

Sympy handles expression representation, tree manipulation, expansion, and simplification. JAX replaces the evaluation and optimization pipeline, providing GPU acceleration and automatic differentiation.

No DEAP Dependency

Fitness values are stored directly as tuples on SearchModel objects (fitness_values, wvalues, crowding_dist). Selection functions expect these attributes instead of DEAP's Fitness wrapper.
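Because fitness lives in plain attributes, the selection helpers only need objects that expose them. The sketch below computes the standard NSGA-II crowding distance over one front. It uses a hypothetical `ToyModel` carrying the `fitness_values` and `crowding_dist` attribute names from above. It is not pge-jax's `assignCrowdingDist`, and it ignores `wvalues`, which the crowding distance does not need.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ToyModel:
    # Hypothetical stand-in for SearchModel: fitness stored as a plain tuple
    fitness_values: Tuple[float, ...]
    crowding_dist: float = 0.0

def assign_crowding(front: List[ToyModel]) -> None:
    """Standard NSGA-II crowding distance over one non-dominated front."""
    for m in front:
        m.crowding_dist = 0.0
    n_obj = len(front[0].fitness_values)
    for k in range(n_obj):
        ordered = sorted(front, key=lambda m: m.fitness_values[k])
        lo, hi = ordered[0].fitness_values[k], ordered[-1].fitness_values[k]
        # Boundary solutions are always kept
        ordered[0].crowding_dist = ordered[-1].crowding_dist = math.inf
        if hi == lo:
            continue
        for i in range(1, len(ordered) - 1):
            # Normalized gap between the two neighbours along objective k
            gap = ordered[i + 1].fitness_values[k] - ordered[i - 1].fitness_values[k]
            ordered[i].crowding_dist += gap / (hi - lo)

front = [ToyModel((1, 0.9)), ToyModel((2, 0.5)), ToyModel((4, 0.3)), ToyModel((8, 0.1))]
assign_crowding(front)
print([m.crowding_dist for m in front])
```

Larger distances mark candidates in sparser regions of the front. NSGA-II prefers these when it must truncate a front, which keeps the surviving formulas diverse.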

Two Model Classes

  • JAXModel — Pure JAX evaluation wrapper (sympy → JAX, predict, jacobian)
  • SearchModel — Search-loop state machine (lifecycle flags, size metrics, fitness, selection compatibility)

Levenberg-Marquardt

Levenberg-Marquardt is the default optimizer because it is well suited to least-squares problems, makes efficient use of the JAX-computed Jacobian, and matches the original pypge approach.
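For reference, a minimal Levenberg-Marquardt loop looks like the sketch below. It takes Gauss-Newton steps on the damped normal equations, shrinks the damping λ after an accepted step and grows it after a rejected one, and gets the Jacobian from `jax.jacfwd`. It is a teaching sketch with a fixed iteration budget, not pge-jax's `fit_levenberg_marquardt`. The exponential test model is chosen only for illustration.

```python
import jax
import jax.numpy as jnp

def levenberg_marquardt(residual_fn, coeffs, n_iter=50, lam=1e-3):
    """Minimal Levenberg-Marquardt loop (sketch, not pge-jax's optimizer)."""
    jac_fn = jax.jacfwd(residual_fn)
    cost = jnp.sum(residual_fn(coeffs) ** 2)
    for _ in range(n_iter):
        r = residual_fn(coeffs)
        J = jac_fn(coeffs)                       # shape (N, n_coeffs)
        JtJ = J.T @ J
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r
        A = JtJ + lam * jnp.diag(jnp.diag(JtJ))
        step = jnp.linalg.solve(A, -J.T @ r)
        trial = coeffs + step
        trial_cost = jnp.sum(residual_fn(trial) ** 2)
        if trial_cost < cost:
            # Accept: trust the Gauss-Newton model more
            coeffs, cost, lam = trial, trial_cost, lam / 10
        else:
            # Reject: lean toward small gradient-descent-like steps
            lam = min(lam * 10, 1e10)
    return coeffs, float(cost)

xs = jnp.linspace(0.0, 2.0, 30)
ys = 2.5 * jnp.exp(-1.3 * xs)

def residuals(c):
    # Model C_0 * exp(C_1 * x) minus the data
    return c[0] * jnp.exp(c[1] * xs) - ys

coeffs, cost = levenberg_marquardt(residuals, jnp.array([1.0, -1.0]))
print(coeffs, cost)
```

The damping makes the method behave like Gauss-Newton near a good fit and like cautious gradient descent far from one. That is why it tolerates rough starting coefficients for nonlinear expressions.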

Progressive Evaluation

The search uses peek_npts (default 16) data points for fast partial evaluation of candidate expressions, then fully evaluates only the most promising ones (the peek_count models selected from the peek heap) on all training data. This sharply reduces the number of expensive full evaluations.

Installation

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install from source
pip install -e ".[dev]"

# Or with benchmark dependencies (for pypge compatibility)
pip install -e ".[dev,benchmark]"

Optional Dependencies

| Group | Packages | Purpose |
|---|---|---|
| dev | pytest, ruff, mypy, pre-commit | Development tooling |
| benchmark | scikit-learn, lmfit, pandas | Compatibility with pypge benchmarks |
| notebook | jupyter, matplotlib | Interactive exploration |

Development

# Run all tests
python -m pytest tests/ -v

# Lint
ruff check pge_jax/ tests/

# Format
ruff format pge_jax/ tests/

# Type check
mypy pge_jax/

Note from "Author"

Note

This project is an experiment in using agents to modernize my PhD work. The stack is OpenCode + Qwen-3.6 35B A3B MoE (unsloth@UD-Q8_K_XL) + DGX Spark. Things seem to be working as before so far, but it is largely a black-box reimplementation. You can see the history in the .sessions/ directory, but OpenCode doesn't include user messages in /export... ¯\_(ツ)_/¯ The intention is to continue this experiment into new research with the open-weight qwen36moe (unless we get a similarly sized 3.7 soon).

Tip

Little qwen36moe is an awesome model and my daily driver

Prior Art

Citation

"Prioritized Grammar Enumeration" — Best Paper, GECCO 2013 http://dl.acm.org/citation.cfm?id=2463486