phpser: a fast, secure binary serializer for PHP cache workloads
Ilia Alshanetsky · 2026-06-03 · via DEV Community

I've reached for igbinary on nearly every PHP project I've shipped in the last decade. It's smaller and faster than PHP's native serialize(), it's stable, and it has been the obvious default for so long that reaching for it stopped being a decision.

So phpser started as curiosity, not a complaint. igbinary is good. Could a serializer built specifically for cache workloads do better?

I wanted two things from it. It should be fast on the shapes a cache actually holds, where a value is decoded far more often than it's encoded. And it should be safe to decode bytes from a store an attacker might reach, because unserialize() on untrusted input is one of PHP's oldest exploit primitives. igbinary gives you the speed; the safety you bolt on yourself. phpser builds in both.

On the shapes that matter for caches it encodes 10 to 70% faster than igbinary and decodes 12 to 75% faster, with packed numeric data also 65% smaller on the wire. Its signed mode refuses to decode any payload that wasn't produced with your key, so a poisoned cache entry never reaches the code that builds objects. The rest of this post is how it gets both.

Why a serializer built for caches?

Because igbinary optimizes for the general case, and a cache is not the general case, on two axes.

The first is the read/write asymmetry. A PHP cache pays decode cost on every single read. Encode happens once, when you write the value; decode happens every time anything reads it back. For a read-heavy cache that ratio is easily 100 to 1. igbinary, like most general serializers, balances the two sides. A cache serializer shouldn't.

[Figure: encode runs once per write; decode runs on every read.]
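To put a number on that asymmetry, here is a small arithmetic sketch. The helper `lifetimeCostMicros()` is hypothetical (it is not part of phpser), the 100:1 ratio is the illustrative figure from the paragraph above, and the timings are the dto_1000 row from the benchmark table later in this post.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: serializer time spent on one cached value that is
// written once and then read $readsPerWrite times.
function lifetimeCostMicros(float $encodeUs, float $decodeUs, int $readsPerWrite): float
{
    return $encodeUs + $readsPerWrite * $decodeUs;
}

// dto_1000 timings in microseconds, taken from the benchmark table.
$igbinary = lifetimeCostMicros(194.0, 275.0, 100);
$phpser   = lifetimeCostMicros(165.0, 227.0, 100);

// Fraction of the igbinary total that is decode work at 100 reads per write.
$decodeShare = (100 * 275.0) / $igbinary;

printf(
    "igbinary: %.0f us, phpser: %.0f us, decode share: %.1f%%\n",
    $igbinary,
    $phpser,
    100 * $decodeShare
);
```

At that ratio, decode accounts for more than 99% of the serializer time spent on the value, which is why the encode column barely moves the total.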

The second is trust. The thing reading those bytes back is often reading from redis, memcached, a file, or a cookie, any of which an attacker may be able to write to. A general serializer treats decode as a pure data operation. A cache serializer has to treat it as a trust boundary.

igbinary is still the right default for general use. I went looking for the specific shapes where a cache-focused design could pull ahead, and there are three that show up everywhere in real PHP backends:

  • Packed numeric arrays. range(0, 999), ID lists, analytics buckets, sensor readings.
  • Deep-nested structures. Trees, recursive config, nested document structures.
  • Same-class object batches. Laravel queue payloads, cached Eloquent models, any array of a few hundred identical-shape DTOs.

Designing the format for the reader

The performance half of phpser borrows an instinct from Rust's rkyv. rkyv's pitch is that deserialization should be nearly free, because the writer already laid the bytes out the way the reader needs them. You don't parse an rkyv archive so much as point at it.

phpser isn't zero-copy, and I want to be precise about that before the comparison runs away. PHP values are refcounted zvals with owned hashtables; you can't hand PHP a pointer into a cache buffer and call it an array. phpser does a real decode pass and builds real zvals. It's not rkyv.

What transferred is the instinct, not the mechanism. rkyv made me stop thinking about the wire format as a neutral container and start thinking about it as a set of instructions to the reader. If the writer knows something that saves the reader work, the writer should record it, even when that makes encoding a little more complex. Once you adopt that lens, a set of concrete decisions falls out of it.

A string dictionary, and an intern that survives decode

The honest starting point: a front-loaded string dictionary isn't novel. igbinary already does this; it calls the feature compact_strings. Both serializers emit each distinct string once and reference it afterward, so the property name "created_at" repeated across a thousand cached rows costs one copy, not a thousand.

The dictionary isn't where the win is. The win is on the decode side, and it's the most direct application of the design-for-the-reader rule.

When phpser decodes a dictionary string the first time, it allocates a zend_string. Every later reference to that same dictionary index doesn't allocate; it bumps the refcount on the one already built. A thousand rows that all carry the key "user_id" produce exactly one string allocation and 999 refcount increments. PHP's own machinery is built for exactly this: zend_strings are shared by refcount throughout the engine, so phpser isn't fighting the runtime, it's leaning on it.

[Figure: the dictionary is emitted once at the head; values reference it by varint index, and repeated strings reuse one zend_string by refcount.]
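The idea can be sketched in userland PHP. This is only an illustration of a front-loaded dictionary, not phpser's wire format: `dictEncode()` and `dictDecode()` are hypothetical names, and the real work happens in C on zend_strings rather than on PHP arrays.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: replace every row key with an index into a
 * dictionary of distinct strings that is emitted once, up front.
 */
function dictEncode(array $rows): array
{
    $slots = [];    // key string => dictionary index
    $dict = [];     // dictionary index => key string
    $encoded = [];

    foreach ($rows as $row) {
        $pairs = [];
        foreach ($row as $key => $value) {
            $key = (string) $key;
            if (!isset($slots[$key])) {
                $slots[$key] = count($dict);
                $dict[] = $key;
            }
            $pairs[] = [$slots[$key], $value];
        }
        $encoded[] = $pairs;
    }

    return ['dict' => $dict, 'rows' => $encoded];
}

/** Rebuild the rows, resolving each index against the dictionary. */
function dictDecode(array $packed): array
{
    $dict = $packed['dict'];
    $rows = [];

    foreach ($packed['rows'] as $pairs) {
        $row = [];
        foreach ($pairs as [$index, $value]) {
            // Every row reuses the single key string held in $dict.
            $row[$dict[$index]] = $value;
        }
        $rows[] = $row;
    }

    return $rows;
}

// A thousand rows with the same three keys yield a three-entry dictionary.
$rows = [];
for ($i = 0; $i < 1000; $i++) {
    $rows[] = ['id' => $i, 'user_id' => $i * 2, 'created_at' => '2026-01-01'];
}
$packed = dictEncode($rows);
```

Here `$packed['dict']` holds three strings no matter how many rows are encoded, and the decoder resolves every key by index instead of re-reading its bytes.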

Fast to encode, too

Designing for the reader could have meant a slow writer. It doesn't, because of two encoder choices, and the result is that phpser encodes faster than igbinary on every shape I test.

The first is the intern cache. phpser keeps an open-addressed zend_string*-to-slot hash, grown without eviction. Before hashing a string's bytes, it checks pointer identity: PHP interns string literals, so the "id" in row 1 and the "id" in row 900 are usually the same pointer and resolve with no byte work at all. Just as important, a unique value string (a name, an email, a SKU) takes a single-probe miss instead of a linear scan. The per-value dedup lookup stays off the critical path even on payloads full of strings that never repeat.

The second is objects. Encoding a PHP object the obvious way calls get_properties, which materializes a properties hashtable even for a plain object whose layout is fixed and known. For a batch of a few hundred DTOs that's hundreds of throwaway hashtables. phpser serializes a plain object straight from its declared property slots and skips the hashtable, the way native serialize() does. PHP 8.4 lazy objects fall back to get_properties, because their initializer has to run first.

Tagged scalar runs, and building the array in place

Two more decisions, on the decode side, are what make the packed-numeric numbers as large as they are.

The first is tagged scalar runs. igbinary encodes [1, 2, 3, ...] as a sequence of tagged values: a type tag and a varint, a thousand times over. phpser detects a uniform run and emits one PACKED_LONGS header plus the thousand integers as raw zigzag varints, no per-element tag. Decode becomes one tight loop with zero tag dispatch.
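A minimal userland sketch of an untagged run, assuming 64-bit PHP. The function names are hypothetical, and the header here is just a count varint; phpser's actual PACKED_LONGS tag byte and header layout are not reproduced.

```php
<?php
declare(strict_types=1);

// Zigzag maps small negative and positive integers to small unsigned values.
function zigzag(int $n): int
{
    return ($n << 1) ^ ($n >> 63);
}

function unzigzag(int $z): int
{
    // Logical shift right by one, then restore the sign from the low bit.
    return (($z >> 1) & PHP_INT_MAX) ^ -($z & 1);
}

// LEB128-style varint: 7 payload bits per byte, high bit means "more follows".
function varint(int $v): string
{
    $out = '';
    while (($v & ~0x7F) !== 0) {
        $out .= chr(($v & 0x7F) | 0x80);
        $v = ($v >> 7) & 0x01FFFFFFFFFFFFFF; // emulate an unsigned shift
    }
    return $out . chr($v);
}

// One count, then the values back to back with no per-element type tag.
function packLongs(array $ints): string
{
    $out = varint(count($ints));
    foreach ($ints as $n) {
        $out .= varint(zigzag($n));
    }
    return $out;
}

function unpackLongs(string $bytes): array
{
    $pos = 0;
    $read = function () use ($bytes, &$pos): int {
        $value = 0;
        $shift = 0;
        do {
            $byte = ord($bytes[$pos++]);
            $value |= ($byte & 0x7F) << $shift;
            $shift += 7;
        } while ($byte & 0x80);
        return $value;
    };

    $count = $read();
    $ints = [];
    for ($i = 0; $i < $count; $i++) {
        $ints[] = unzigzag($read()); // single tight loop, no tag dispatch
    }
    return $ints;
}

$packed = packLongs(range(0, 999));
```

In this sketch `range(0, 999)` packs to 1,938 bytes: a 2-byte count plus 1,936 bytes of varints (64 one-byte values and 936 two-byte values). That is in line with the 1,941 B reported for packed_1k in the benchmark table.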

The second is building the hashtable in place. When the wire format says PACKED_LONGS of length N, the decoder knows the final size before it reads a single element. So it allocates the array once with zend_new_array(N) and writes the values directly into PHP 8's packed arPacked storage with ZVAL_* macros. That skips N calls to zend_hash_next_index_insert, and with them N hash computations, N capacity checks, and the incremental table growth that a naive decoder pays as it discovers the array's size one element at a time. The writer recorded the size so the reader could allocate once and fill, which is the rkyv instinct applied as far as a non-zero-copy format can take it.

[Figure: a naive decoder hashes and grows the table per element; phpser allocates once from the header count and writes slots directly.]

The benchmarks

Here is the full shape-by-shape comparison, run on my machine, against igbinary. The bench harness (bench.php in the repo) round-trips every shape for correctness first, then times encode and decode separately, because decode is the number that matters for a cache.

Methodology: phpser 0.1.2, PHP 8.4.22-dev NTS, release build (not a debug or ASan build, which would inflate everything 2 to 5x), igbinary 3.2.17RC1, Intel Core i9-13950HX. 1,000 iterations per shape, median of 9 runs after a discarded warm-up.
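For readers who want to reproduce the shape of that measurement without the repo, here is a rough sketch of the timing loop. It is not the author's bench.php: `median()` and `benchMicros()` are hypothetical helpers, and native serialize()/unserialize() stand in for the serializer under test.

```php
<?php
declare(strict_types=1);

function median(array $samples): float
{
    sort($samples);
    $n = count($samples);
    $mid = intdiv($n, 2);
    return $n % 2 === 1
        ? (float) $samples[$mid]
        : ((float) $samples[$mid - 1] + (float) $samples[$mid]) / 2;
}

// Median microseconds per call over $runs timed runs, after one discarded warm-up.
function benchMicros(callable $fn, int $iterations = 1000, int $runs = 9): float
{
    $timeOneRun = static function () use ($fn, $iterations): float {
        $start = hrtime(true); // monotonic clock, nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        return (hrtime(true) - $start) / 1000 / $iterations;
    };

    $timeOneRun(); // warm-up run, result discarded

    $samples = [];
    for ($r = 0; $r < $runs; $r++) {
        $samples[] = $timeOneRun();
    }
    return median($samples);
}

$value = range(0, 999);
$blob = serialize($value);

// Check the round trip first, then time encode and decode separately.
if (unserialize($blob) !== $value) {
    throw new RuntimeException('round-trip mismatch');
}

$encodeUs = benchMicros(static fn () => serialize($value));
$decodeUs = benchMicros(static fn () => unserialize($blob));
printf("encode %.2f us, decode %.2f us\n", $encodeUs, $decodeUs);
```

Taking the median rather than the mean keeps a single noisy run from skewing the reported figure.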

| Shape | Size (igbinary → phpser) | Encode (igbinary → phpser) | Decode (igbinary → phpser) |
|---|---|---|---|
| packed_1k | 5,495 → 1,941 B (-65%) | 4.6 → 1.4 µs (-70%) | 7.3 → 1.8 µs (-75%) |
| packed_10k | 59,495 → 21,749 B (-63%) | 46.4 → 13.7 µs (-70%) | 74.0 → 18.9 µs (-74%) |
| deep_50 | 419 → 424 B (+1%) | 1.3 → 0.62 µs (-54%) | 1.8 → 1.6 µs (-15%) |
| dto_100 | 7,083 → 6,362 B (-10%) | 15.5 → 13.9 µs (-10%) | 26.9 → 23.5 µs (-13%) |
| dto_1000 | 73,372 → 64,863 B (-12%) | 194 → 165 µs (-15%) | 275 → 227 µs (-18%) |
| rowset_100 | 4,570 → 4,771 B (+4%) | 10.0 → 7.3 µs (-27%) | 10.7 → 10.8 µs (+1%) |
| rowset_1000 | 47,459 → 47,972 B (+1%) | 157 → 71 µs (-55%) | 104 → 107 µs (+4%) |
| dto_mixed | 21,644 → 17,927 B (-17%) | 58.8 → 39.8 µs (-32%) | 112 → 81 µs (-28%) |

The packed rows are the ones that jump out: roughly two-thirds smaller and three-quarters faster to decode, on a real shape, not a synthetic micro-case. packed_1k is range(0, 999), which is what an ID list or an analytics bucket looks like.

The DTO rows are the relatable ones. dto_1000 is a thousand small typed objects of one class, the shape a Laravel queue batch or a page of cached models actually has. 12% smaller, 18% faster to decode, 15% faster to encode, from the dictionary dedup on property names and a class-entry lookup cache that amortizes zend_lookup_class_ex across the batch. Encode is faster than igbinary on every row; the largest margins are on the object-heavy dto_mixed (32% faster, 17% smaller) and the mixed rowset_1000 (55% faster).

Where it gives a little back

The one shape where phpser loses is the mixed associative rowset. rowset_1000 decodes about 4% slower than igbinary (rowset_100 is within 1%, effectively a tie), and the rowset payloads run 1 to 4% larger. That's the front-loaded dictionary showing its one downside: the decoder walks the dictionary header before it touches values, and on a heterogeneous rowset with few repeated strings that header walk doesn't buy back its cost. It's a small tax, and it's on the exact axis I chose to de-prioritize, but it's real and measured, so there it is.

The structural limit is the same decision seen from another angle: phpser isn't streamable. The dictionary lives at the head of the payload and values reference it by index, so you can't decode the stream incrementally as it arrives. The front-loaded dictionary is what makes the other decodes fast and what makes streaming impossible. You don't get to keep both. If you need a streaming parser, this is the wrong format.

I also cross-checked the whole suite on arm64 to make sure none of this was an x86 quirk. Same direction on every shape, with narrower encode margins on the object cases. The decode wins and the single rowset_1000 decode tax both reproduce.

Signed payloads: safe to decode from an untrusted cache

The performance half is only one reason a cache serializer is its own problem. The other is that decoding attacker-controlled bytes is dangerous. Native unserialize() on untrusted input lets a crafted payload instantiate any allowed class and drive its __wakeup, __destruct, or other magic methods into a state the code never anticipated. That's the mechanism behind object-injection and gadget-chain attacks, and a cache is the soft spot: a redis instance, a memcached pool, a file cache, or a cookie is exactly the kind of store an attacker reaches in a real incident, and whatever sits there gets decoded on the next read.

phpser's answer is a signed mode built on HMAC-SHA256. You serialize with a secret key, and you refuse to decode anything that wasn't signed with that same key. Verification is constant-time and runs before any decoding work, so a tampered or foreign-keyed payload never reaches the part of the decoder that builds values or constructs objects.

[Figure: native unserialize decodes attacker bytes before checking anything; the signed path verifies the HMAC first and returns null on mismatch, so nothing is decoded.]

$key = random_bytes(32);            // generate once, keep it in app config or a secrets manager

// on write
$blob = phpser_serialize_signed($cacheValue, $key);
$redis->set('user:42', $blob);

// on read
$blob  = $redis->get('user:42');
$value = phpser_unserialize_signed($blob, $key);

if ($value === null) {
    // tampered, truncated, or signed with a different key.
    // nothing was decoded; treat it as a cache miss and rebuild.
    $value = rebuild_user(42);
}


The contract is deliberately blunt. phpser_unserialize_signed() returns null on any signature failure rather than throwing, so a poisoned cache entry degrades to a miss instead of an exception in a hot read path. The decode only proceeds once the MAC matches. This is authentication, not encryption: the bytes are still readable, but they can't be forged without the key, and that's the property that keeps a crafted object graph out of your decoder.

Signed mode also refuses an empty key. An empty key would reduce HMAC-SHA256 to a fixed, keyless tag anyone can recompute, so a caller writing phpser_serialize_signed($v, getenv('SECRET') ?: '') with the variable unset would be shipping forgeable payloads without knowing it. Both signed entry points throw on an empty key before doing any work, so that misconfiguration fails loudly instead of silently defeating the signature.
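The verify-before-decode contract can be sketched with PHP's built-in hash functions. This is an illustration of the pattern, not phpser's implementation: `sealPayload()` and `openPayload()` are hypothetical names, and putting the 32-byte tag in front of the payload is an assumption made for the example, not the documented wire layout.

```php
<?php
declare(strict_types=1);

const TAG_BYTES = 32; // length of a raw HMAC-SHA256 tag

function sealPayload(string $payload, string $key): string
{
    if ($key === '') {
        throw new InvalidArgumentException('signing key must not be empty');
    }
    return hash_hmac('sha256', $payload, $key, true) . $payload;
}

// Returns the payload bytes only if the tag verifies, otherwise null.
function openPayload(string $blob, string $key): ?string
{
    if ($key === '') {
        throw new InvalidArgumentException('signing key must not be empty');
    }
    if (strlen($blob) < TAG_BYTES) {
        return null; // truncated: too short to carry a tag
    }

    $tag = substr($blob, 0, TAG_BYTES);
    $payload = substr($blob, TAG_BYTES);
    $expected = hash_hmac('sha256', $payload, $key, true);

    // hash_equals() compares in constant time; no decoding has happened yet.
    return hash_equals($expected, $tag) ? $payload : null;
}

$key = random_bytes(32);
$blob = sealPayload(json_encode(['id' => 42]), $key);

// Flip one bit in the last byte to simulate a poisoned cache entry.
$tampered = $blob;
$last = strlen($tampered) - 1;
$tampered[$last] = chr(ord($tampered[$last]) ^ 1);

$ok = openPayload($blob, $key);      // the original JSON string
$bad = openPayload($tampered, $key); // null: treat as a cache miss
```

Only a non-null result should be handed to a decoder. A null result is handled exactly like a miss, which keeps the hot read path free of exceptions.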

If you genuinely can't sign, because you're decoding bytes from a source you don't control and can't key, the second line of defense is allowed_classes, with the same shape as PHP's native unserialize():

// reject every class: unknown objects decode as __PHP_Incomplete_Class, never instantiated
$value = phpser_unserialize($blob, ['allowed_classes' => false]);

// or allowlist only the classes you actually expect to read back
$value = phpser_unserialize($blob, ['allowed_classes' => [UserDto::class, OrderDto::class]]);


The same option works on phpser_unserialize_signed() too, so you can combine a valid signature with a class allowlist for defense in depth. Underneath both paths, the decoder is hardened on its own: a recursion depth cap of 512 bounds stack use against deliberately deep payloads (decode returns null, encode throws), and crafted payloads naming a missing enum case or a non-serializable class like Closure are rejected rather than crashing, matching what PHP's own unserialize() refuses.
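Since the option has the same shape as native unserialize(), its effect can be demonstrated with the native function alone. The classes below are hypothetical stand-ins: `UserDto` is an expected type, and `Gadget` plays the role of a class whose magic method an attacker would like to trigger.

```php
<?php
declare(strict_types=1);

final class UserDto
{
    public function __construct(public int $id = 0)
    {
    }
}

final class Gadget
{
    public static bool $woke = false;

    public function __wakeup(): void
    {
        self::$woke = true; // stands in for a dangerous side effect
    }
}

$trusted = serialize(new UserDto(42));
$hostile = serialize(new Gadget());

// Allowlist only the class we expect to read back.
$options = ['allowed_classes' => [UserDto::class]];

$user = unserialize($trusted, $options);
$blocked = unserialize($hostile, $options);

var_dump($user instanceof UserDto);                   // allowed class is rebuilt
var_dump($blocked instanceof __PHP_Incomplete_Class); // disallowed class is not
var_dump(Gadget::$woke);                              // __wakeup never ran
```

The disallowed object comes back as `__PHP_Incomplete_Class`, so none of its magic methods run during decode.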

What I took away

igbinary is still the serializer I'd reach for on a general-purpose workload, and I'll keep using it. It's mature, it's everywhere, and on a mixed rowset it's still a hair ahead on decode.

For a read-heavy cache, phpser gives me two things at once. It's faster on the shapes caches actually hold, encode and decode both, because the wire format is designed around the reader rather than balanced between reader and writer. And signed mode means I can decode from redis without treating every read as a potential injection. Speed and trust are the two things a cache serializer has to get right, and they're the two things a general serializer leaves half-finished. Building both in was the point.

pie install iliaal/phpser


Source, wire-format spec, and the bench harness: github.com/iliaal/phpser