Show HN: fenic – LLMs as dataframe operators, query meaning and structure
cpard · 2026-07-01 · via HN's home page

Hey friends. I'd like to share a project that's dear to me. fenic is a dataframe API with LLMs added as first-class citizens: a classic lazy dataframe API extended with new operators that are backed by LLMs.

What this gets you is the ability to work with structured and unstructured data in the same context. Most importantly, the LLMs aren't integrated as opaque UDF black boxes. They're exposed as "semantic" operators that the planner can reason about alongside the classic ones.

(There are examples and code snippets on the repo to see how everything works together)

Why build this? I'm a data infra / systems person. When LLMs showed up, what I saw was a new type of compute that changes the characteristics of the workloads we deal with. I wanted to experiment with how our current systems can absorb these new workloads and compute types, and what it would take to make the DX as seamless as possible. That's where the "UDF + arbitrary prompt" approach felt too problematic.

To support this properly, we had to introduce a few really cool things:

New plan operators. You don't just send prompts at an LLM. You use operators like semantic join, semantic map and reduce, and semantic filter, among others. They mix with the classic operators, and because the planner sees them as real operators rather than black boxes, it can reorder work around them.
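
To make the reordering point concrete, here is a minimal sketch in plain Python. It is not fenic's API: the Filter class, plan(), run() and fake_llm_is_complaint() are hypothetical names invented for illustration, and the "model" is a stub that only counts calls. It assumes both filters are pure, so their order can be swapped without changing the result.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict


@dataclass
class Filter:
    name: str
    predicate: Callable[[Row], bool]
    semantic: bool  # True when the predicate is backed by a model call


def plan(filters):
    # Stable sort: cheap classic filters first, semantic filters last.
    # Safe here only because pure filters commute with each other.
    return sorted(filters, key=lambda f: f.semantic)


def run(rows, filters):
    for f in filters:
        rows = [r for r in rows if f.predicate(r)]
    return rows


llm_calls = 0


def fake_llm_is_complaint(row):
    # Stand-in for a model call (hypothetical); counts invocations.
    global llm_calls
    llm_calls += 1
    return "refund" in row["text"].lower()


rows = [
    {"lang": "en", "text": "I want a refund now"},
    {"lang": "de", "text": "Refund bitte"},
    {"lang": "en", "text": "Great product"},
    {"lang": "fr", "text": "Merci"},
]

# Written by the user with the expensive operator first.
filters = [
    Filter("is_complaint", fake_llm_is_complaint, semantic=True),
    Filter("lang_en", lambda r: r["lang"] == "en", semantic=False),
]

result = run(rows, plan(filters))
print(result, "model calls:", llm_calls)
```

Run in the order written, the stub would be called once per row; after planning it only sees the rows that survive the classic filter. A UDF black box gives the planner no way to know that the swap is safe.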

Typed outputs. The output of a semantic operator can be turned straight into a typed dataframe column: a Pydantic schema for the LLM output becomes a typed struct column you can unnest, explode, and so on.
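
A rough sketch of the idea, again not fenic's API. To keep it dependency-free it swaps the Pydantic schema for a standard-library dataclass; Ticket, parse_typed() and unnest() are hypothetical names, and the raw JSON string stands in for a model response.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Ticket:
    # Schema the model output must conform to (illustrative fields).
    product: str
    severity: int


def parse_typed(raw: str) -> Ticket:
    # Validate the raw JSON against the schema before it enters the table.
    data = json.loads(raw)
    kwargs = {}
    for f in fields(Ticket):
        value = data[f.name]
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}")
        kwargs[f.name] = value
    return Ticket(**kwargs)


def unnest(rows, column):
    # Replace one struct-valued column with one column per struct field.
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != column}
        flat.update(asdict(row[column]))
        out.append(flat)
    return out


rows = [{"id": 1, "ticket": parse_typed('{"product": "router", "severity": 3}')}]
flat_rows = unnest(rows, "ticket")
print(flat_rows)
```

The point is that downstream operators see product and severity as ordinary typed columns rather than a blob of model text.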

New data types like a markdown data type. Markdown became an important way to share information with LLMs, even though it started life as a way to format text for presentation. It carries structure, and being able to access that structure the way you would a struct or JSON type adds to the developer experience I mentioned.
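
As an illustration of what "accessing the structure" can mean, the sketch below splits a markdown string into sections keyed by heading, using only the standard library. It is not how fenic implements its markdown type; sections() is a hypothetical helper, and it deliberately ignores edge cases such as '#' lines inside fenced code blocks or Setext-style headings.

```python
import re

# ATX headings only: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def sections(markdown: str):
    collected = []
    current = {"level": 0, "title": "", "body": []}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            collected.append(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2).strip(),
                "body": [],
            }
        else:
            current["body"].append(line)
    collected.append(current)
    # Drop the untitled preamble when it holds nothing but blank lines.
    return [
        {"level": s["level"], "title": s["title"], "body": "\n".join(s["body"]).strip()}
        for s in collected
        if s["title"] or any(l.strip() for l in s["body"])
    ]


doc = "# Intro\nHello\n\n## Setup\npip install fenic\n"
parsed = sections(doc)
print(parsed)
```

Once the document is a list of records like this, it can be filtered, exploded, or joined like any other nested column.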

Async UDFs. One of the more interesting shifts in workloads from the LLM explosion is the need to put heavily I/O-bound steps in your pipeline: fetching a response from an API, crawling a website, and so on. Async UDFs fill that gap, and the implementation handles the nuances for you: concurrency, retries, and the rest.
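
The sketch below shows the kind of plumbing an async UDF runtime has to take care of, written with plain asyncio. It is an assumption-laden illustration, not fenic's implementation: map_async(), flaky_fetch() and all parameter names and defaults are hypothetical, and the "fetch" is a stub that fails once per input.

```python
import asyncio


async def map_async(values, fn, max_concurrency=4, retries=3, base_delay=0.01):
    # Bound the number of in-flight calls.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def call(value):
        async with semaphore:
            for attempt in range(retries + 1):
                try:
                    return await fn(value)
                except Exception:
                    if attempt == retries:
                        return None  # give up: the cell becomes null
                    # Exponential backoff before the next attempt.
                    await asyncio.sleep(base_delay * 2 ** attempt)

    # gather() returns results in input order, so rows stay aligned.
    return await asyncio.gather(*(call(v) for v in values))


attempts = {}


async def flaky_fetch(url):
    # Stand-in for an I/O-bound step; fails on the first try for each input.
    attempts[url] = attempts.get(url, 0) + 1
    if attempts[url] == 1:
        raise ConnectionError("transient failure")
    await asyncio.sleep(0)
    return f"body of {url}"


urls = ["a", "b", "c"]
results = asyncio.run(map_async(urls, flaky_fetch))
print(results)
```

This version holds its concurrency slot while backing off, which keeps the code short; a production runtime would likely release it during the wait.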

An LLM-inference-aware planner and runtime. This is one of the parts I'm most excited about, and there's a lot still to do. Today: identical prompts within a batch collapse to a single model call, so duplicates cost zero tokens; requests are dispatched concurrently under per-provider rpm/tpm limits with retries and backoff; null and empty cells skip the model entirely; and you get token and cost metrics per operator. There's also an optional persistent response cache so re-runs skip the model.
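
Two of these behaviours, collapsing identical prompts and skipping null or empty cells, can be sketched in a few lines. This is a simplified illustration rather than fenic's runtime: run_batch() and fake_model() are hypothetical names, the model is a stub, and rate limiting, retries and the persistent cache are left out.

```python
def run_batch(prompts, call_model):
    # One model call per distinct prompt; null and empty cells never reach the model.
    cache = {}
    outputs = []
    for prompt in prompts:
        if prompt is None or not prompt.strip():
            outputs.append(None)
            continue
        if prompt not in cache:
            cache[prompt] = call_model(prompt)
        outputs.append(cache[prompt])
    return outputs


model_calls = 0


def fake_model(prompt):
    # Stand-in for a real model call (hypothetical); counts invocations.
    global model_calls
    model_calls += 1
    return prompt.upper()


prompts = ["summarize: a", None, "summarize: a", "", "summarize: b"]
outputs = run_batch(prompts, fake_model)
print(outputs, "model calls:", model_calls)
```

Five input cells produce two model calls here, and the output list stays aligned with the input rows.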

MCP as a new catalog primitive. Much like a registered view, you can register a dataframe pipeline as an MCP tool in the catalog. fenic then serves an MCP server with that pipeline as the tool's logic, executed over your data.

These are just some of the things that have gone into fenic while experimenting with how LLMs can become part of our compute infrastructure. There's more, and plenty left to polish in what's already there.

I've been using fenic for all sorts of things. On the small/personal end, I use it to take my podcast audio recordings and turn them into nicely structured tables of metadata I can research. On the heavier end, I use it as tooling for agents to analyze agent traces exported from Pydantic Logfire, to discover evals and turn them into reproducible artifacts in the form of dataframe pipelines.

  pip install fenic
  Repo: https://github.com/typedef-ai/fenic
  Docs: https://docs.fenic.ai

There's also a skill you can use with Claude Code, Codex, etc. to quickly get started with fenic in your favourite agentic coding environment.

I'd love to hear your thoughts, criticism, and anything else that comes to mind.

I'm here to answer questions.