I recorded Polymarket's 5-minute crypto markets for two months. Here's the dataset.

Kacho · 2026-06-18 · via Hacker News: Show HN

You're most likely familiar with Polymarket's 5-minute crypto markets. If not, here's the gist — Polymarket runs a market on whether Bitcoin will be higher or lower five minutes from now. Then another. Then another. 24/7, one every five minutes, for seven different coins. As far as I know, there's no freely available history of those markets anywhere. Polymarket will hand you the live order book, but the moment a window closes it's gone — and you can't backtest a bot on data that doesn't exist.

So I recorded my own. For about seven weeks I captured the order book of every one of these markets, once per second, for BTC, ETH, SOL, XRP, DOGE, HYPE and BNB. This post is me giving that away — the data, the schema, how it was collected, and exactly where it's thin. It's free, and you can do whatever you like with it.

Fair warning: most of this post from here on is AI-generated — written from a description and analysis of the data I fed it. It's still an accurate, thorough account of what's in the dataset. And one caveat on the data itself: it's only two months, and not as granular as you'd probably need to build a genuinely competitive bot for these markets. But it's something to start with, and it's yours — no strings attached.

My two cents on the data: it's good enough to backtest any bot you build, but don't expect live results to match. I backtested my own BTC bot on it and it showed a respectable 3–5% ROI after fees, and then running it live cost me roughly $600. I'd made some mistakes early on that I later fixed, but even then fees quietly ate whatever ROI I was approaching. That's a longer story, and a full write-up for another day.

Btw, if you can't be bothered with the whole write-up and just want the data, you can jump straight to "Get the data" below.

The headline numbers

7 coins — BTC, ETH, SOL, XRP, DOGE, HYPE, BNB.
~89,000 markets — each a single 5-minute up/down window that opened, traded and resolved.
~26.8 million per-second observations — every market sampled once a second for its full five-minute life (≈300 ticks each).
Span: BTC from 24 Mar 2026, the other six from 5 Apr 2026, all running to 18 May 2026 — all timestamps UTC.
Coverage 99.8%+ with no duplicates. It's a fixed historical window, not a live feed.

What these markets actually are

Skip this part if you already know the 5/15/60-minute BTC (or ETH, SOL, XRP, BNB, DOGE, HYPE) markets.

Each market asks one question: will this coin's price be up or down at the end of a fixed 5-minute window? Two outcomes, "Up" and "Down", each trading as its own token priced between 0 and 1 in USDC. That price is the market's implied probability — an Up token at 0.62 means the market thinks there's a 62% chance the coin closes the window higher.
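Reading the price as a probability is what makes these markets backtestable. A winning token redeems for 1 USDC and a losing one for 0, so buying a share only pays on average when your own probability estimate is above the price you pay. A minimal sketch of that arithmetic, before fees (fees are deliberately left out; `p_win` stands for a hypothetical estimate from your own model):

```python
def expected_profit_per_share(ask: float, p_win: float) -> float:
    """Expected profit in USDC, before fees, of buying one share at `ask`.

    A winning share redeems for 1 USDC and a losing one for 0, so the
    expected payout is p_win * 1 and the cost is the ask you paid.
    """
    if not (0.0 <= ask <= 1.0 and 0.0 <= p_win <= 1.0):
        raise ValueError("prices and probabilities live in [0, 1]")
    return p_win * 1.0 - ask


# An Up ask of 0.62 means the market implies a 62% chance of "Up".
# Buying there only has positive expectation if your estimate is higher.
print(round(expected_profit_per_share(0.62, 0.65), 4))  # 0.03
print(round(expected_profit_per_share(0.62, 0.62), 4))  # 0.0
```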

Because Up and Down are two separate order books with their own spreads, the two best bids won't add up to exactly 1. That gap is the spread, and any persistent drift away from 1 is itself worth looking at.
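To look at that gap directly, compute 1 - (bu + bd) for each tick and put it beside the Up token's own spread, au - bu. The rows below are made up but use the `ticks` column names. They are deliberately mirror-image books (ad = 1 - bu, bd = 1 - au), which is the case where the bid gap and the Up spread coincide; on the real data, compare the two columns rather than assume it.

```python
from statistics import mean

# Illustrative rows with the same column names as `ticks` (values are made up,
# and chosen so each Down book mirrors the Up book).
rows = [
    {"bu": 0.48, "au": 0.50, "bd": 0.50, "ad": 0.52},
    {"bu": 0.55, "au": 0.56, "bd": 0.44, "ad": 0.45},
    {"bu": 0.70, "au": 0.73, "bd": 0.27, "ad": 0.30},
]


def bid_gap(row: dict) -> float:
    """How far the two best bids fall short of summing to 1."""
    return 1.0 - (row["bu"] + row["bd"])


def up_spread(row: dict) -> float:
    """Best ask minus best bid on the Up token's own book."""
    return row["au"] - row["bu"]


gaps = [bid_gap(r) for r in rows]
spreads = [up_spread(r) for r in rows]
print([round(g, 2) for g in gaps])     # [0.02, 0.01, 0.03]
print([round(s, 2) for s in spreads])  # [0.02, 0.01, 0.03]
print(round(mean(gaps), 3))            # 0.02
```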

Data dictionary

The data comes as two tables per coin, linked by condition_id: a markets table (one row per resolved 5-minute market) and a ticks table (one row per second of order-book data). For example, btc_markets.parquet has ~15,700 rows and btc_ticks.parquet has ~4.7 million.

`markets` — one row per 5-minute market

Column	Type	Unit	Meaning
`condition_id`	text	—	Polymarket's on-chain condition ID. Unique — the join key to `ticks`.
`event_id`	text	—	Polymarket Gamma event ID.
`slug`	text	—	Human-readable market slug, e.g. `btc-updown-5m-1774745100`.
`market_start`	timestamptz	UTC	When the 5-minute window opens. Always aligned to a 5-min boundary (`:00`, `:05`, `:10`…).
`market_end`	timestamptz	UTC	When the market resolves — `market_start` + 5 min.
`recorded_at`	timestamptz	UTC	When the row was written, just after `market_end`.
`token_up`	text	—	ERC-1155 token ID for the "Up" outcome.
`token_down`	text	—	ERC-1155 token ID for the "Down" outcome.
`volume`	numeric	USDC	Market volume from the Gamma API at discovery time.
`liquidity`	numeric	USDC	Market liquidity from the Gamma API at discovery time.
`outcome`	text	—	`'Up'`, `'Down'`, or `NULL`. Inferred from the final tick (winning side's bid → ~0.99), not read from on-chain resolution.
`n_ticks`	int	—	Number of per-second rows for this market in `ticks` (≈300).
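Since `outcome` is inferred from the final tick rather than read from chain, it can help to see what such a rule looks like. A minimal sketch, assuming a winner threshold of 0.95 on the final best bid; both the threshold and the name `infer_outcome` are hypothetical, since the post only says the winning bid goes to ~0.99. Anything ambiguous maps to `None`, mirroring the `NULL` case.

```python
from typing import Optional

WIN_THRESHOLD = 0.95  # hypothetical cut-off; the dataset only says "~0.99"


def infer_outcome(final_bu: float, final_bd: float,
                  threshold: float = WIN_THRESHOLD) -> Optional[str]:
    """Guess the resolved side from the last tick's best bids.

    Returns 'Up' or 'Down' when exactly one side's bid has converged near 1,
    and None (like the dataset's NULL) otherwise. NaN compares False, so an
    empty side simply never counts as the winner.
    """
    up_won = final_bu >= threshold
    down_won = final_bd >= threshold
    if up_won and not down_won:
        return "Up"
    if down_won and not up_won:
        return "Down"
    return None


print(infer_outcome(0.99, 0.01))  # Up
print(infer_outcome(0.02, 0.99))  # Down
print(infer_outcome(0.60, 0.38))  # None
```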

`ticks` — one row per second

Each row is one 1-second sample of the live book, joined to markets by condition_id. Up and Down are separate books, so every tick captures both sides.

Column	Type	Unit	Meaning
`condition_id`	text	—	Join key back to `markets`.
`t`	bigint	seconds	Sample time, unix epoch seconds (UTC).
`ts_utc`	timestamptz	UTC	Same instant as `t`, as an ISO timestamp.
`bu` / `au`	numeric	USDC (0–1)	Best bid / ask, Up token.
`bd` / `ad`	numeric	USDC (0–1)	Best bid / ask, Down token.
`su` / `sd`	numeric	shares	Size resting at best bid, Up / Down. `NaN` if that side was empty.
`sau` / `sad`	numeric	shares	Size resting at best ask, Up / Down. `NaN` if that side was empty.
`du` / `dd`	numeric	USDC	Depth — Σ(size × price) for all bids within 5¢ of best bid, Up / Down.
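The `du`/`dd` aggregate is just Σ(size × price) over the bid levels within 5¢ of the best bid. A sketch for one side of a book, assuming the bids are available as (price, size) pairs (a hypothetical input shape; the recorder's internal book format isn't described). Treating the 5¢ boundary as inclusive, and returning 0.0 for an empty side, are choices of this sketch.

```python
from typing import Iterable, Tuple


def bid_depth(bids: Iterable[Tuple[float, float]], window: float = 0.05) -> float:
    """Sum of size * price over bid levels within `window` USDC of the best bid.

    The boundary is inclusive (an assumption), with a tiny epsilon so float
    rounding doesn't drop a level sitting exactly 5 cents below the best bid.
    """
    levels = [(p, s) for p, s in bids if s > 0]
    if not levels:
        return 0.0  # empty side: nothing resting
    best = max(p for p, _ in levels)
    floor = best - window - 1e-9
    return sum(p * s for p, s in levels if p >= floor)


# Made-up Up-side bids: 0.52, 0.50 and 0.47 are within 5 cents; 0.46 is not.
book_up_bids = [(0.52, 100.0), (0.50, 200.0), (0.47, 50.0), (0.46, 400.0)]
print(round(bid_depth(book_up_bids), 2))  # 175.5  (52.0 + 100.0 + 23.5)
```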

A natural point estimate for the implied probability of "Up" is the mid, (bu + au) / 2. Within a market the t values run from ≈market_start to ≈market_start + 299s — a clean 300-second sweep, one tick per second. The only NaNs in the whole dataset are in the resting-size columns, where a side of the book happened to be empty.

Coverage and gaps

I'd rather you know where this is thin before you build on it.

Coin	Markets	Span (UTC)	5-min coverage	Missing windows
BTC	15,682	24 Mar → 18 May	99.88%	19
ETH	12,258	05 Apr → 18 May	99.84%	20
SOL	12,259	05 Apr → 18 May	99.85%	19
XRP	12,258	05 Apr → 18 May	99.84%	20
DOGE	12,259	05 Apr → 18 May	99.85%	19
HYPE	12,258	05 Apr → 18 May	99.84%	20
BNB	12,259	05 Apr → 18 May	99.85%	19

The missing windows aren't random per-coin loss — they line up on the same wall-clock times across all seven coins, which means they're brief outages of my collector, not anything wrong with a specific market. The entire gap inventory over 7.5 weeks:

18 Apr, 10:35 → 11:55 UTC — ~15 windows (~1.2h), the big one.
15 Apr, 12:00 → 12:15 UTC — 2 windows.
16 Apr, ~22:50 and ~23:30 UTC — 1–2 windows each.
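Because every market starts on a 5-minute boundary, gaps like these can be found by laying each coin's `market_start` values (as unix seconds) onto the full 300-second grid and intersecting the misses across coins. A sketch on toy start times, not the real gap list; the function names are hypothetical.

```python
WINDOW = 300  # seconds per market; every market_start sits on this grid


def missing_windows(starts: list[int], first: int, last: int) -> set[int]:
    """Grid-aligned window starts in [first, last] with no market recorded."""
    expected = set(range(first, last + 1, WINDOW))
    return expected - set(starts)


def shared_gaps(per_coin: dict[str, list[int]]) -> set[int]:
    """Windows missing for every coin over their common span.

    A miss shared by all coins points at the collector, not at a market.
    Gaps at the very edge of the common span can't be detected this way.
    """
    first = max(min(s) for s in per_coin.values())
    last = min(max(s) for s in per_coin.values())
    return set.intersection(*(missing_windows(s, first, last)
                              for s in per_coin.values()))


# Toy example: 12 windows; BTC misses windows 4-5, ETH misses 4-5 and 9.
t0 = 1_776_000_000  # divisible by 300, i.e. on the 5-minute grid
btc = [t0 + k * WINDOW for k in range(12) if k not in (4, 5)]
eth = [t0 + k * WINDOW for k in range(12) if k not in (4, 5, 9)]

print(sorted((g - t0) // WINDOW for g in missing_windows(eth, t0, t0 + 11 * WINDOW)))  # [4, 5, 9]
print(sorted((g - t0) // WINDOW for g in shared_gaps({"BTC": btc, "ETH": eth})))      # [4, 5]
```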

Inside the markets that are present, the per-second data is excellent: ~99.97% of markets carry the full ~300 ticks (the exact count is in each market's n_ticks), and no market has internal gaps — its seconds are contiguous start to finish. A handful of short markets exist per coin (the odd ~230- or ~293-tick one, again from the same shared hiccups). No empty markets anywhere, and condition_id is unique, so every market appears exactly once — no duplicates.
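These per-market properties are cheap to re-check after loading, for example before trusting a backtest that assumes exactly 300 seconds per market. A sketch in plain Python over one market's `t` values plus its `market_start` (as unix seconds) and `n_ticks`; the function name and the 2-second `slack` for the "≈market_start" alignment are hypothetical choices, not something the dataset defines.

```python
def check_market_ticks(ts: list[int], market_start: int, n_ticks: int,
                       window: int = 300, slack: int = 2) -> dict[str, bool]:
    """Integrity checks for one market's `t` values (unix seconds)."""
    if not ts:
        raise ValueError("market has no ticks")
    ts = sorted(ts)
    # Any step other than exactly 1 s is a gap (or, for a step of 0, a duplicate).
    steps_ok = all(b - a == 1 for a, b in zip(ts, ts[1:]))
    return {
        "matches_n_ticks": len(ts) == n_ticks,
        "contiguous": steps_ok,
        "in_window": (ts[0] >= market_start - slack
                      and ts[-1] <= market_start + window - 1 + slack),
        "full_length": len(ts) == window,
    }


start = 1_776_000_000
full = list(range(start, start + 300))    # a normal market: 300 ticks
short = list(range(start, start + 230))   # one of the rare short ones
print(all(check_market_ticks(full, start, 300).values()))  # True
r = check_market_ticks(short, start, 230)
print(r["contiguous"], r["full_length"])                   # True False
```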

Known limitations

It's a fixed window, not live. Collection ran to 18 May 2026 and stops there. Treat it as history.
BTC has ~12 more days of history than the other six; it took me a while to realise the other coins were worth recording too.
outcome is inferred, not read from chain. It comes from the final tick's bid and can be NULL in edge cases, so treat it as best-effort.
volume / liquidity are point-in-time values from discovery, not end-of-window figures.
Ask-side depth isn't recorded — only best-ask price and size. The 5¢ depth aggregate (du/dd) is bid-side only.
bu + bd ≠ 1 in general — two independent books, two spreads.
Sampling is best-effort 1 Hz from a cache; a tick is written only when at least one side had book data, which is why a few markets fall short of 300 ticks and why the resting-size columns are NaN where a side was empty.

How it was collected

A custom recorder subscribes to Polymarket's public CLOB WebSocket order-book feed and keeps an in-memory book for every active market's Up and Down tokens. It discovers new markets via the Gamma API every 30 seconds. Once per second it reads the cached top-of-book for each active market and appends a sample — zero network calls per sample, just a read of the live cache — and when a market's window closes, all of its per-second ticks are written out. Everything is UTC.
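The collection loop above, reduced to its skeleton: book updates overwrite an in-memory top-of-book cache, a once-per-second sampler reads that cache without touching the network and skips markets where neither book has data, and a market's buffered ticks are handed over once its window closes. This is a hypothetical, network-free sketch, not the author's recorder: the class and method names are invented, it does not parse Polymarket's real WebSocket messages, it leaves out the `du`/`dd` depth columns, and it uses `None` where the published files have `NaN`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Top:
    """Cached top of book for one token (None = that side is empty)."""
    bid: Optional[float] = None
    ask: Optional[float] = None
    bid_size: Optional[float] = None
    ask_size: Optional[float] = None

    def empty(self) -> bool:
        return self.bid is None and self.ask is None


@dataclass
class Market:
    condition_id: str
    market_end: int                      # unix seconds
    up: Top = field(default_factory=Top)
    down: Top = field(default_factory=Top)
    ticks: list = field(default_factory=list)


class Recorder:
    """Hypothetical model of the cache-then-sample loop (no I/O at all)."""

    def __init__(self) -> None:
        self.active: dict[str, Market] = {}
        self.written: dict[str, list] = {}   # stand-in for the Parquet output

    def discover(self, condition_id: str, market_end: int) -> None:
        # The real recorder finds new markets by polling the Gamma API every 30 s.
        self.active.setdefault(condition_id, Market(condition_id, market_end))

    def on_book(self, condition_id: str, side: str, top: Top) -> None:
        # Stand-in for a parsed WebSocket book update: just overwrite the cache.
        if side not in ("up", "down"):
            raise ValueError("side must be 'up' or 'down'")
        setattr(self.active[condition_id], side, top)

    def sample(self, now: int) -> None:
        """Run once per second: reads the cache only, no network calls."""
        for cid, m in list(self.active.items()):
            if now >= m.market_end:             # window closed: flush its ticks
                self.written[cid] = m.ticks
                del self.active[cid]
            elif not (m.up.empty() and m.down.empty()):
                m.ticks.append({
                    "condition_id": cid, "t": now,
                    "bu": m.up.bid, "au": m.up.ask, "bd": m.down.bid, "ad": m.down.ask,
                    "su": m.up.bid_size, "sd": m.down.bid_size,
                    "sau": m.up.ask_size, "sad": m.down.ask_size,
                })


rec = Recorder()
start = 1_776_000_000
rec.discover("0xabc", market_end=start + 300)   # "0xabc" is a made-up condition_id
rec.on_book("0xabc", "up", Top(bid=0.51, ask=0.53, bid_size=120.0, ask_size=80.0))
for now in range(start, start + 301):
    rec.sample(now)
print(len(rec.written["0xabc"]))  # 300
```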

Get the data

Hugging Face: https://huggingface.co/datasets/kachoio/polymarket-5-minute-crypto-up-down-markets
Kaggle: https://www.kaggle.com/datasets/kachoio/polymarket-5-minute-crypto-updown-markets

Each coin ships as two files — <coin>_markets.parquet and <coin>_ticks.parquet — joined on condition_id. Parquet is the primary download (the whole set is ~725MB). Quick start in pandas:

```python
import pandas as pd  # read_parquet also needs pyarrow (or fastparquet) installed

markets = pd.read_parquet("btc_markets.parquet")   # one row per market
ticks   = pd.read_parquet("btc_ticks.parquet")     # one row per second

ticks["mid_up"] = (ticks["bu"] + ticks["au"]) / 2  # implied P(Up)

# the per-second history of a single market
first = markets.iloc[0]["condition_id"]
one = ticks[ticks["condition_id"] == first].sort_values("t")
print(one[["ts_utc", "bu", "au", "mid_up", "du"]].head())
```

Licence and terms

Released under CC0 1.0 — public domain. No attribution required, no strings. Use it for anything. (A link back to my website or a share on social media is appreciated, never required.)

This dataset is derived from public market data on Polymarket — specifically the order books of its 5-minute crypto up/down markets — captured via Polymarket's public CLOB WebSocket and Gamma API. It's an independent, transformed recording (per-second top-of-book aggregates), not affiliated with or endorsed by Polymarket, and provided "as is" for research and educational use, with no warranty.

If you do something interesting with it, I'd genuinely like to see it. And if you want the next drop — I'm restarting the recorder, and the bot that this was all for gets its own write-up — subscribe and I'll send it over.
