Track YC Demo Day Companies in Real Time (with code)
Y Combinator Demo Day is the single most concentrated VC sourcing event of the year. Twice annually, ~250 companies present back-to-back over 1-2 days. Within 48 hours, the top 50 have term sheets. Within 7 days, the next 50 have term sheets. By day 14, the remaining 150 are either oversubscribed or starting to struggle.
The associate's job at a multi-stage fund during Demo Day is roughly:
- Within 6 hours: scrape every company's details from the YC site
- Within 12 hours: triage to the 30-50 worth investigating
- Within 24 hours: book first calls with the top 15
- Within 48 hours: close on the top 5
The bottleneck is the first step, and it is entirely solvable. YC's company directory updates in real time as Demo Day progresses. The Algolia-indexed search behind the YC site is publicly queryable. With about 50 lines of Python you can pull the full active-batch roster in under 5 seconds, refresh every 90 seconds, and have a live feed during Demo Day itself.
This post is the working code, the join logic, and the prioritization framework. The NexGenData YC Companies Directory actor wraps this if you want a hosted version.
The YC Algolia Endpoint
YC's company list page (https://www.ycombinator.com/companies) is fully client-rendered. The page bundle includes a hardcoded Algolia application ID and a public read-only API key. Both are visible in the browser dev tools network tab.
import httpx

# Algolia endpoint, application ID, and search key as sent by the YC site.
YC_ALGOLIA_URL = "https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query"
YC_ALGOLIA_HEADERS = {
    "X-Algolia-Application-Id": "45BWZJ1SGC",
    # Note: this key base64-decodes to a signature plus embedded restrictions
    # that include filters=isHiring:true, so it may return only companies
    # marked as hiring. For the full roster, copy the key that the /companies
    # page currently sends (dev tools, network tab).
    "X-Algolia-API-Key": "Y2VkOWQyMTJlYjZkZjE3MDRkY2YyNjBmYmIzMjVhMzA1ZmRlYTQ4OTUyZjEyZjRiNzc0OWQ4MjRmMzVlYmUxN3RhZ0ZpbHRlcnM9JTViJTIyJTVEJmZpbHRlcnM9aXNIaXJpbmclM0F0cnVl",
    "Content-Type": "application/json",
}


async def fetch_yc_batch(batch: str = "S26") -> list[dict]:
    """Fetch all companies in a specific YC batch."""
    payload = {
        "query": "",
        "hitsPerPage": 1000,  # Algolia's maximum page size
        "facetFilters": [[f"batch:{batch}"]],
    }
    async with httpx.AsyncClient(headers=YC_ALGOLIA_HEADERS, timeout=20) as client:
        r = await client.post(YC_ALGOLIA_URL, json=payload)
        r.raise_for_status()
        return r.json().get("hits", [])
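Before relying on a key copied from dev tools, it can help to check what that key actually permits. Algolia secured API keys are base64 strings made of an HMAC-SHA256 hex digest followed by URL-encoded restriction parameters, so the restrictions can be read offline. The sketch below assumes the key follows that layout; `decode_algolia_restrictions` is a hypothetical helper name, not part of any library.

```python
import base64
from urllib.parse import parse_qs


def decode_algolia_restrictions(api_key: str) -> dict[str, list[str]]:
    """Return the restriction parameters embedded in an Algolia secured key.

    Assumes the secured-key layout: base64(64-char hex digest + URL params).
    """
    padded = api_key + "=" * (-len(api_key) % 4)  # restore stripped padding
    decoded = base64.b64decode(padded).decode("utf-8")
    params = decoded[64:]  # skip the SHA-256 hex digest
    return parse_qs(params)


# The key shown in this post.
KEY_FROM_POST = (
    "Y2VkOWQyMTJlYjZkZjE3MDRkY2YyNjBmYmIzMjVhMzA1ZmRlYTQ4OTUyZjEyZjRiNzc0"
    "OWQ4MjRmMzVlYmUxN3RhZ0ZpbHRlcnM9JTViJTIyJTVEJmZpbHRlcnM9aXNIaXJpbmcl"
    "M0F0cnVl"
)

restrictions = decode_algolia_restrictions(KEY_FROM_POST)
print(restrictions.get("filters"))
```

If the printed restrictions include a filter such as `isHiring:true`, queries made with that key are limited to matching companies no matter what the request payload asks for, which matters when the goal is a full batch roster.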
A response hit looks like:
{
    "name": "ExampleCo",
    "slug": "exampleco",
    "batch": "S26",
    "industry": "B2B",
    "subindustry": "DevTools",
    "team_size": 4,
    "regions": ["United States of America"],
    "isHiring": true,
    "stage": "Active",
    "tags": ["api", "developer-tools"],
    "description": "ExampleCo lets developers...",
    "website": "https://exampleco.com",
    "long_description": "ExampleCo is the missing layer between..."
}
For a full active batch, expect 200-280 hits. Demo Day batches are gradually populated over the 90-day program — by Demo Day itself, all companies are publicly searchable.
Polling for Real-Time Updates
During Demo Day, YC's batch index updates in waves as companies present. To get a live feed:
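import asyncio
from datetime import datetime


async def live_demo_day_tracker(batch: str, interval: int = 90):
    """Poll the batch every `interval` seconds and print newly seen companies."""
    seen_slugs = set()
    while True:
        try:
            companies = await fetch_yc_batch(batch)
            new = [c for c in companies if c["slug"] not in seen_slugs]
            for c in new:
                # Guard against a missing or null description so one bad
                # record does not abort the rest of the batch.
                desc = (c.get("description") or "")[:80]
                print(f"[{datetime.now().isoformat()}] NEW: {c['name']} - {desc}")
                seen_slugs.add(c["slug"])
        except Exception as e:
            print(f"  poll error: {e}")
        await asyncio.sleep(interval)


# Start with: asyncio.run(live_demo_day_tracker("S26"))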
Polling every 90 seconds is gentle on YC's Algolia backend and keeps you within ~2 minutes of the actual update. Run it in a tmux session during Demo Day; pipe output to Slack via a webhook for team-wide visibility.
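The Slack piping mentioned above could be sketched as follows. Slack incoming webhooks accept a JSON body with a `text` field; everything else here is an assumption for illustration: the helper names `build_slack_payload` and `post_to_slack`, the 120-character truncation, and the `https://www.ycombinator.com/companies/<slug>` link format. The standard library is used so the sketch has no extra dependencies.

```python
import json
import urllib.request


def build_slack_payload(company: dict) -> dict:
    """Format one YC company hit as a Slack incoming-webhook message."""
    name = company.get("name") or "(unnamed)"
    desc = (company.get("description") or "")[:120]
    # Assumed profile URL pattern; adjust if YC changes its routes.
    url = "https://www.ycombinator.com/companies/" + (company.get("slug") or "")
    return {"text": f"NEW: {name} - {desc}\n{url}"}


def post_to_slack(webhook_url: str, company: dict, timeout: float = 10.0) -> int:
    """POST the message to a Slack incoming webhook and return the HTTP status."""
    body = json.dumps(build_slack_payload(company)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Because `post_to_slack` blocks, calling it from the polling loop through `await asyncio.to_thread(post_to_slack, webhook_url, c)` keeps the event loop responsive.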
Triage Logic: From 250 Companies to 30 Worth Investigating
The hard part of Demo Day isn't ingest — it's prioritization. The naive approach (read all 250 descriptions) burns 3-4 hours and produces lukewarm shortlists. The better approach: pre-define your fund's thesis filters and score each company automatically.
A simple scoring model:
def score_yc_company(c: dict, thesis: dict) -> int:
    """Score a company hit against a fund thesis (0-100 points)."""
    score = 0
    # Industry alignment (0-30 points)
    if c.get("industry") in thesis["target_industries"]:
        score += 30
    elif c.get("industry") in thesis["adjacent_industries"]:
        score += 15
    # Team size sweet spot (0-15 points); a missing or null size counts as 0
    team = c.get("team_size") or 0
    if thesis["min_team"] <= team <= thesis["max_team"]:
        score += 15
    elif team < thesis["min_team"]:
        score += 5  # too early but not disqualifying
    # Hiring signal (0-10 points)
    if c.get("isHiring"):
        score += 10
    # Geography (0-10 points)
    regions = c.get("regions") or []
    if any(r in thesis["target_regions"] for r in regions):
        score += 10
    # Tag overlap (0-20 points, 5/tag up to 4)
    tag_overlap = set(c.get("tags") or []) & set(thesis["target_tags"])
    score += min(20, len(tag_overlap) * 5)
    # Description-based filter: keyword presence (0-15 points)
    desc = ((c.get("description") or "") + " " + (c.get("long_description") or "")).lower()
    keyword_hits = sum(1 for kw in thesis["target_keywords"] if kw in desc)
    score += min(15, keyword_hits * 5)
    return score
Sample thesis config for a B2B-SaaS-focused pre-seed fund:
B2B_SAAS_PRESEED = {
    "target_industries": ["B2B"],
    "adjacent_industries": ["Fintech", "Healthcare"],
    "min_team": 2, "max_team": 8,
    "target_regions": ["United States of America", "Canada"],
    "target_tags": ["api", "developer-tools", "saas", "infrastructure",
                    "automation", "analytics", "data"],
    "target_keywords": ["api", "platform", "automation", "developer",
                        "dashboard", "analytics", "infrastructure"],
}
Run all 250 companies through the scoring function and sort by score descending. The top 30 are your day-1 outbound list, the next 80 are your day-2/3 follow-up list, and the bottom 140 you ignore unless something specific surfaces in a peer-investor conversation.
This whole pipeline — fetch + score + sort — runs in under 8 seconds on a laptop. By contrast, manual triage of the same 250 companies takes 3-4 hours and is biased by reading order.
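The sort-and-split step could look roughly like this. `rank_and_tier` is a hypothetical helper; it takes any scoring callable so the sketch stays self-contained, and the 30/80 cut-offs are the ones described above.

```python
from typing import Callable


def rank_and_tier(
    companies: list[dict],
    score_fn: Callable[[dict], float],
    day1: int = 30,
    followup: int = 80,
) -> dict[str, list[dict]]:
    """Sort companies by score (highest first) and split them into three tiers."""
    ranked = sorted(companies, key=score_fn, reverse=True)
    return {
        "day1": ranked[:day1],                     # day-1 outbound list
        "followup": ranked[day1:day1 + followup],  # day-2/3 follow-up list
        "ignore": ranked[day1 + followup:],        # everything else
    }


# Stand-in data: 250 fake companies whose score is stored on the record.
fake_batch = [{"slug": f"co{i}", "score": i} for i in range(250)]
tiers = rank_and_tier(fake_batch, score_fn=lambda c: c["score"])
print({name: len(members) for name, members in tiers.items()})
```

In the real pipeline the callable would be something like `lambda c: score_yc_company(c, B2B_SAAS_PRESEED)`.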
Cross-Referencing With External Signals
The real edge during Demo Day comes from cross-referencing YC's company data with external signals you've been tracking. Three sources that meaningfully sharpen the YC list:
LinkedIn founder signal. For each YC company, look up the founder LinkedIn profiles. Founders with prior senior IC roles at brand-name companies (FAANG, Stripe, Datadog, Snowflake, etc.) score 1.5-2x on conversion vs first-time founders without that pedigree. Auto-adding a "founder pedigree" multiplier pulls the right companies forward without manual triage.
Hacker News engagement. YC companies whose CEO has an HN account with >500 karma and recent post history are statistically more articulate, more likely to be making something engineers want to talk about, and more likely to convert on a thoughtful cold email. The NexGenData Hacker News Scraper actor pulls user metadata including karma and post counts.
Show HN history. A YC company whose founder previously launched a Show HN post (even for a different project) is, statistically, in the top quartile of Demo Day quality. Show HN selects for builders. Pull this with the NexGenData Show HN Tracker actor.
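One way to fold these three signals into the base score is sketched below. It is an illustration under stated assumptions, not the method behind the hosted actors: `adjust_score`, the 10-point bonuses, and the choice of 1.5 as the multiplier (the low end of the range quoted above) are all hypothetical. For a do-it-yourself lookup, the sketch points at the public Hacker News API (user records with a `karma` field) and the HN Search API (the `show_hn` and `author_<username>` tags); pedigree is assumed to come from your own research. It does not check how recent a founder's posts are.

```python
import json
import urllib.request
from urllib.parse import urlencode

HN_USER_URL = "https://hacker-news.firebaseio.com/v0/user/{username}.json"
HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"


def show_hn_search_url(username: str) -> str:
    """Build an HN Search query for Show HN posts by one author."""
    return HN_SEARCH_URL + "?" + urlencode({"tags": "show_hn,author_" + username})


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and parse its JSON body (network call, not run here)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def adjust_score(
    base_score: float,
    brand_name_pedigree: bool = False,  # from your own founder research
    hn_karma: int = 0,                  # "karma" on the HN user record
    show_hn_posts: int = 0,             # "nbHits" from the Show HN search
    pedigree_multiplier: float = 1.5,   # assumed weight
    karma_bonus: int = 10,              # assumed weight
    show_hn_bonus: int = 10,            # assumed weight
) -> float:
    """Apply the additive HN bonuses first, then the pedigree multiplier."""
    score = float(base_score)
    if hn_karma > 500:
        score += karma_bonus
    if show_hn_posts > 0:
        score += show_hn_bonus
    if brand_name_pedigree:
        score *= pedigree_multiplier
    return score


print(adjust_score(60, brand_name_pedigree=True, hn_karma=800, show_hn_posts=1))
```

A caller would fill the inputs with something like `fetch_json(HN_USER_URL.format(username=name))` and `fetch_json(show_hn_search_url(name))`, then re-sort the batch on the adjusted score.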
The Cost of Not Automating This
Most VC sourcing teams I've worked with at Series A firms don't have a real Demo Day automation pipeline. They send 1-2 associates to the live event, take notes, and triage by hand over the following week. By the time their shortlist is ready, the top 30 companies have already had calls with 5-10 funds and are in active term-sheet negotiations.
The cost isn't the data, since YC publishes everything for free. The cost is the speed delta. A team running this pipeline can triage on Demo Day itself and book first calls within 48 hours. A team triaging by hand books first calls 5-10 days later, by which time the deal is set.
Cost of building it yourself: ~1 day of engineering, then ~$5/month of compute. Cost of using the YC Companies Directory actor: $0.01/company × ~270 companies/batch = ~$2.70/batch, runnable on demand.
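The arithmetic behind those figures, worked in integer cents to avoid floating-point rounding. The two-batches-per-year figure comes from the "twice annually" cadence mentioned at the top of the post; the self-hosted total covers compute only and leaves out the one-off day of engineering.

```python
PER_COMPANY_CENTS = 1             # $0.01 per company
COMPANIES_PER_BATCH = 270
BATCHES_PER_YEAR = 2              # Demo Day runs twice annually
SELF_HOSTED_MONTHLY_CENTS = 500   # ~$5/month of compute

hosted_per_batch_cents = PER_COMPANY_CENTS * COMPANIES_PER_BATCH
hosted_per_year_cents = hosted_per_batch_cents * BATCHES_PER_YEAR
self_hosted_per_year_cents = SELF_HOSTED_MONTHLY_CENTS * 12

print(f"hosted, per batch:     ${hosted_per_batch_cents / 100:.2f}")      # $2.70
print(f"hosted, per year:      ${hosted_per_year_cents / 100:.2f}")       # $5.40
print(f"self-hosted, per year: ${self_hosted_per_year_cents / 100:.2f}")  # $60.00
```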
NexGenData publishes 195+ actors covering startup-stage signals: YC alumni, Show HN, Product Hunt, Delaware C-corp formations, SEC Form D, and more. All pay-per-result.

























