Track YC Demo Day Companies in Real Time (with code)
NexGenData · 2026-05-23 · via DEV Community

Y Combinator Demo Day is the single most concentrated VC sourcing event of the year. Twice annually, ~250 companies present back-to-back over 1-2 days. Within 48 hours, the top 50 have term sheets. Within 7 days, the next 50 have term sheets. By day 14, the remaining 150 are either oversubscribed or starting to struggle.

The associate's job at a multi-stage fund during Demo Day is roughly:

  1. Within 6 hours: scrape every company's details from the YC site
  2. Within 12 hours: triage to the 30-50 worth investigating
  3. Within 24 hours: book first calls with the top 15
  4. Within 48 hours: close on the top 5

The bottleneck is step 1, and it is entirely solvable. YC's company directory updates in real time as Demo Day progresses. The Algolia-indexed search behind the YC site is publicly queryable. With 50 lines of Python you can pull the full active-batch roster in under 5 seconds, refresh every 90 seconds, and have a live feed during Demo Day itself.

This post is the working code, the join logic, and the prioritization framework. The NexGenData YC Companies Directory actor wraps this if you want a hosted version.

The YC Algolia Endpoint

YC's company list page (https://www.ycombinator.com/companies) is fully client-rendered. The page bundle includes a hardcoded Algolia application ID and a public read-only API key. Both are visible in the browser dev tools network tab.

import httpx

YC_ALGOLIA_URL = "https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query"
YC_ALGOLIA_HEADERS = {
    "X-Algolia-Application-Id": "45BWZJ1SGC",
    "X-Algolia-API-Key": "Y2VkOWQyMTJlYjZkZjE3MDRkY2YyNjBmYmIzMjVhMzA1ZmRlYTQ4OTUyZjEyZjRiNzc0OWQ4MjRmMzVlYmUxN3RhZ0ZpbHRlcnM9JTViJTIyJTVEJmZpbHRlcnM9aXNIaXJpbmclM0F0cnVl",
    "Content-Type": "application/json",
}

async def fetch_yc_batch(batch: str = "S26") -> list[dict]:
    """Fetch all companies in a specific YC batch."""
    payload = {
        "query": "",
        "hitsPerPage": 1000,
        "facetFilters": [[f"batch:{batch}"]],
    }
    async with httpx.AsyncClient(headers=YC_ALGOLIA_HEADERS, timeout=20) as client:
        r = await client.post(YC_ALGOLIA_URL, json=payload)
        r.raise_for_status()
        return r.json().get("hits", [])

A response hit looks like:

{
  "name": "ExampleCo",
  "slug": "exampleco",
  "batch": "S26",
  "industry": "B2B",
  "subindustry": "DevTools",
  "team_size": 4,
  "regions": ["United States of America"],
  "isHiring": true,
  "stage": "Active",
  "tags": ["api", "developer-tools"],
  "description": "ExampleCo lets developers...",
  "website": "https://exampleco.com",
  "long_description": "ExampleCo is the missing layer between..."
}

For a full active batch, expect 200-280 hits. Demo Day batches are gradually populated over the 90-day program — by Demo Day itself, all companies are publicly searchable.
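
Not every hit is guaranteed to carry every field, so it can help to normalize hits before scoring or printing them. A minimal sketch, assuming the field names from the sample hit above; `CompanyRecord` and `normalize_hit` are hypothetical names for illustration, not part of YC's or Algolia's API:

```python
from dataclasses import dataclass, field


@dataclass
class CompanyRecord:
    """Flat view of one hit; field names follow the sample hit above."""
    slug: str
    name: str
    batch: str = ""
    industry: str = ""
    team_size: int = 0
    is_hiring: bool = False
    regions: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    description: str = ""


def normalize_hit(hit: dict) -> CompanyRecord:
    """Convert a raw hit to a CompanyRecord, treating missing or null fields as empty."""
    return CompanyRecord(
        slug=hit.get("slug") or "",
        name=hit.get("name") or "",
        batch=hit.get("batch") or "",
        industry=hit.get("industry") or "",
        team_size=hit.get("team_size") or 0,
        is_hiring=bool(hit.get("isHiring")),
        regions=list(hit.get("regions") or []),
        tags=list(hit.get("tags") or []),
        description=hit.get("description") or "",
    )
```

With this in place, downstream code can rely on `rec.team_size` being an integer and `rec.tags` being a list, even when the raw hit omits them.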

Polling for Real-Time Updates

During Demo Day, YC's batch index updates in waves as companies present. To get a live feed:

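import asyncio
from datetime import datetime

async def live_demo_day_tracker(batch: str, interval: int = 90):
    """Poll the batch index and print each company the first time it appears."""
    seen_slugs = set()
    while True:
        try:
            companies = await fetch_yc_batch(batch)
            # The first pass reports every company already in the index.
            new = [c for c in companies if c["slug"] not in seen_slugs]
            for c in new:
                # Mark as seen first so one malformed record is not retried on every poll.
                seen_slugs.add(c["slug"])
                # description may be missing or null; fall back to an empty string.
                desc = (c.get("description") or "")[:80]
                print(f"[{datetime.now().isoformat()}] NEW: {c.get('name')} - {desc}")
        except Exception as e:
            print(f"  poll error: {e}")
        await asyncio.sleep(interval)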

Polling every 90 seconds is gentle on YC's Algolia backend and keeps you within ~2 minutes of the actual update. Run it in a tmux session during Demo Day; pipe output to Slack via a webhook for team-wide visibility.
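
The Slack hand-off can be a single POST per new company. A minimal sketch, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field); the `format_alert` and `post_to_slack` names are hypothetical, and the webhook URL is something you supply:

```python
import json
import urllib.request


def format_alert(company: dict) -> dict:
    """Build the JSON payload for one newly seen company."""
    name = company.get("name") or "(unnamed)"
    desc = (company.get("description") or "")[:80]
    return {"text": f"NEW: {name} - {desc}"}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

`post_to_slack` is blocking, so inside the async polling loop it would need to run off the event loop, for example with `await asyncio.to_thread(post_to_slack, url, format_alert(c))`.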

Triage Logic: From 250 Companies to 30 Worth Investigating

The hard part of Demo Day isn't ingest — it's prioritization. The naive approach (read all 250 descriptions) burns 3-4 hours and produces lukewarm shortlists. The better approach: pre-define your fund's thesis filters and score each company automatically.

A simple scoring model:

def score_yc_company(c: dict, thesis: dict) -> int:
    score = 0

    # Industry alignment (0-30 points)
    if c.get("industry") in thesis["target_industries"]:
        score += 30
    elif c.get("industry") in thesis["adjacent_industries"]:
        score += 15

    # Team size sweet spot (0-15 points)
    team = c.get("team_size") or 0  # treat a missing or null team size as 0
    if thesis["min_team"] <= team <= thesis["max_team"]:
        score += 15
    elif team < thesis["min_team"]:
        score += 5  # too early but not disqualifying

    # Hiring signal (0-10 points)
    if c.get("isHiring"):
        score += 10

    # Geography (0-10 points)
    regions = c.get("regions", [])
    if any(r in thesis["target_regions"] for r in regions):
        score += 10

    # Tag overlap (0-20 points, 5/tag up to 4)
    tag_overlap = set(c.get("tags", [])) & set(thesis["target_tags"])
    score += min(20, len(tag_overlap) * 5)

    # Description-based filter — keyword presence (0-15 points)
    desc = ((c.get("description") or "") + " " + (c.get("long_description") or "")).lower()
    keyword_hits = sum(1 for kw in thesis["target_keywords"] if kw in desc)
    score += min(15, keyword_hits * 5)

    return score

Sample thesis config for a B2B-SaaS-focused pre-seed fund:

B2B_SAAS_PRESEED = {
    "target_industries": ["B2B"],
    "adjacent_industries": ["Fintech", "Healthcare"],
    "min_team": 2, "max_team": 8,
    "target_regions": ["United States of America", "Canada"],
    "target_tags": ["api", "developer-tools", "saas", "infrastructure",
                     "automation", "analytics", "data"],
    "target_keywords": ["api", "platform", "automation", "developer",
                        "dashboard", "analytics", "infrastructure"],
}

Run all 250 companies through the scoring function, sort by score descending, and the top 30 are your day-1 outbound list. The next 80 are your day-2/3 follow-up list. The bottom 140 you ignore unless something specific surfaces in a peer-investor conversation.
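
The sort-and-split step can be sketched as below. The `bucket_by_score` helper is a hypothetical name; it takes the scoring function as a parameter so the block stands on its own, and the 30/80 cut-offs are the ones described above:

```python
from typing import Callable


def bucket_by_score(
    companies: list[dict],
    score_fn: Callable[[dict], int],
    day1: int = 30,
    followup: int = 80,
) -> tuple[list[dict], list[dict], list[dict]]:
    """Sort companies by score (highest first) and split into three lists."""
    # sorted() is stable, so companies with equal scores keep their input order.
    ranked = sorted(companies, key=score_fn, reverse=True)
    return ranked[:day1], ranked[day1:day1 + followup], ranked[day1 + followup:]
```

With the scoring function and thesis config from this post, the call would look like `day1, followup, rest = bucket_by_score(companies, lambda c: score_yc_company(c, B2B_SAAS_PRESEED))`.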

This whole pipeline — fetch + score + sort — runs in under 8 seconds on a laptop. By contrast, manual triage of the same 250 companies takes 3-4 hours and is biased by reading order.
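
One caveat on the keyword check in `score_yc_company`: it is a plain substring test, so a keyword like "api" also fires on "rapid" or "capital". If that noise matters for your thesis, a word-start match is a small change. A sketch, with `count_keyword_hits` as a hypothetical drop-in for the `keyword_hits` line:

```python
import re


def count_keyword_hits(text: str, keywords: list[str]) -> int:
    """Count keywords that appear at the start of a word in text (case-insensitive)."""
    hits = 0
    for kw in keywords:
        # \b anchors the keyword to a word start, so "api" matches "API" and
        # "apis" but not "rapid" or "capital".
        if re.search(r"\b" + re.escape(kw), text, flags=re.IGNORECASE):
            hits += 1
    return hits
```

The trade-off is that prefix matching still lets "developer" match "developers", while dropping mid-word matches that the substring test would count.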

Cross-Referencing With External Signals

The real edge during Demo Day comes from cross-referencing YC's company data with external signals you've been tracking. Three sources that meaningfully sharpen the YC list:

LinkedIn founder signal. For each YC company, look up the founder LinkedIn profiles. Founders with prior senior IC roles at brand-name companies (FAANG, Stripe, Datadog, Snowflake, etc.) score 1.5-2x on conversion vs first-time founders without that pedigree. Auto-adding a "founder pedigree" multiplier pulls the right companies forward without manual triage.

Hacker News engagement. YC companies whose CEO has an HN account with >500 karma and recent post history are statistically more articulate, more likely to be making something engineers want to talk about, and more likely to convert on a thoughtful cold email. The NexGenData Hacker News Scraper actor pulls user metadata including karma and post counts.

Show HN history. A YC company whose founder previously launched a Show HN post (even for a different project) is, statistically, in the top quartile of Demo Day quality. Show HN selects for builders. Pull this with the NexGenData Show HN Tracker actor.
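
These three signals can be folded into the thesis score as multipliers. A minimal sketch: the `FounderSignals` and `adjusted_score` names are hypothetical, the weights are illustrative placeholders rather than measured values, and only the 500-karma threshold comes from the text above. You would populate the signal fields from your own lookups.

```python
from dataclasses import dataclass


@dataclass
class FounderSignals:
    """External signals for one company, filled in from your own lookups."""
    senior_ic_at_brand_name: bool = False  # LinkedIn founder signal
    hn_karma: int = 0                      # CEO's Hacker News karma
    has_show_hn: bool = False              # any prior Show HN launch


def adjusted_score(base_score: int, s: FounderSignals) -> float:
    """Scale a thesis score by external signals. All weights are illustrative."""
    multiplier = 1.0
    if s.senior_ic_at_brand_name:
        multiplier *= 1.5   # hypothetical "founder pedigree" weight
    if s.hn_karma > 500:    # threshold taken from the text above
        multiplier *= 1.2   # hypothetical weight
    if s.has_show_hn:
        multiplier *= 1.2   # hypothetical weight
    return round(base_score * multiplier, 1)
```

Multipliers keep the thesis score as the base, so a company that misses the thesis entirely stays low however strong the founder signals are.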

The Cost of Not Automating This

Most VC sourcing teams I've worked with at Series A firms don't have a real Demo Day automation pipeline. They send 1-2 associates to the live event, take notes, and triage by hand over the following week. By the time their shortlist is ready, the top 30 companies have already had calls with 5-10 funds and are in active term-sheet negotiations.

The cost isn't the data, since YC publishes everything for free. The cost is the speed delta. A team running this pipeline can triage on Demo Day itself and book first calls within 48 hours. A team triaging by hand books first calls 5-10 days later, by which time the deal is set.

Cost of building it yourself: ~1 day of engineering, then ~$5/month of compute. Cost of using the YC Companies Directory actor: $0.01/company × ~270 companies/batch = ~$2.70/batch, runnable on demand.


NexGenData publishes 195+ actors covering startup-stage signals: YC alumni, Show HN, Product Hunt, Delaware C-corp formations, SEC Form D, and more. All pay-per-result.