Microcode inside the Intel 8087 floating-point chip: register exchange
pwg · 2026-05-31 · via Hacker News: Front Page

In 1980, Intel introduced the 8087 floating-point chip, a co-processor that made floating-point operations up to 100 times faster. This chip was highly influential, and today most processors use the floating-point standard introduced by the 8087.

The 8087 uses complicated algorithms to accurately compute functions such as square roots, tangents, and exponentials. These algorithms are implemented inside the chip in low-level code called microcode. I'm part of a group, the Opcode Collective, that is reverse-engineering this microcode. In this post, I take a close look at the microcode for one of the 8087's instructions—FXCH—and explain how the microcode works. The FXCH (Floating-point Exchange) instruction exchanges two floating-point registers. You might expect this instruction to be trivial, but there's more going on than you might expect; the microcode uses 14 micro-instructions to implement the exchange instruction.

The Intel 8087 chip is packaged in a 40-pin DIP (dual in-line package).

To explore the microcode, I opened up an 8087 chip and created a high-resolution image with a microscope. The large microcode ROM occupies a central position, holding the micro-instructions that control the chip. The microcode engine on the left steps through the microcode, handling jumps and subroutine calls. The bottom half of the chip is the "datapath", the circuitry that performs floating-point calculations; it is split into a 16-bit datapath for the number's exponent and a 64-bit datapath for the number's fractional part (also known as the significand).

Die of the Intel 8087 floating-point unit chip, with main functional blocks labeled. The die is 5mm×6mm.

This post focuses on the temporary registers and stack registers that are highlighted in red. The chip has two temporary registers and eight stack registers, each holding a number's exponent and fraction. Each register also has two tag bits that label the type of value in the register. The stack control circuitry at the right manages the stack, keeping track of the top-of-stack position as values are pushed onto the stack or popped off the stack.

Executing an 8087 instruction such as arctan requires hundreds of internal steps to compute the result. These steps are implemented in microcode with micro-instructions specifying each step of the algorithm. (Keep in mind the two levels of instructions: the assembly language instructions used by a programmer and the undocumented low-level micro-instructions inside the chip.) The microcode ROM holds 1648 micro-instructions, implementing the 8087's instruction set. Each micro-instruction is 16 bits long and performs a simple operation such as moving data inside the chip, adding two values, or shifting data. I'm working with the Opcode Collective to reverse-engineer the micro-instructions and fully understand the microcode.

The 8087's micro-instructions are complicated, with many corner cases and ad hoc functions, but I'll provide a simplified overview. Each micro-instruction consists of 16 bits, as shown below. The first three bits specify the type of the micro-instruction, which controls the meaning of the remaining bits. The first type indicates a transfer operation, transferring data from one internal register to another. The two fields specify the source and destination for the data. The three unspecified bits are used for various special cases. Next is a shift operation, which uses the barrel shifter to shift a value left or right. The third type of micro-instruction uses the adder/subtractor. It can also be used in a loop for multiplication or division. Fourth are various arithmetic control micro-instructions that configure the adder, set rounding modes, and so forth. The far jump and far call micro-instructions perform a jump or subroutine call to a target micro-address in a fixed list. The condition field allows conditional jumps/calls based on numerous conditions, while the last bit inverts the condition. A local jump allows a conditional jump to a nearby micro-instruction. Finally, the miscellaneous micro-instructions range from returning from a subroutine or raising an exception to ending the microcode execution.

Structure of an 8087 micro-instruction.

How values are stored inside the 8087 chip

The 8087 supports a variety of data types: floating-point numbers of various sizes, integers, and binary-coded decimal. But internally, everything is stored as an 80-bit floating-point number. A number has three parts: a 64-bit significand (the fractional part), a 15-bit exponent, and a sign bit. The chip has two separate data paths: one for the significand, and one for the exponent and sign.
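The 80-bit layout can be made concrete with a short Python sketch that splits an ordinary float into the three fields and packs them into one 80-bit integer. It uses the standard x87 extended-format conventions (exponent bias 16383, an explicit leading 1 bit in the 64-bit significand); the function names are hypothetical, and zeros, denormals, infinities, and NaNs are deliberately left out.

```python
import math

BIAS = 16383  # exponent bias of the 80-bit extended format

def to_extended(x: float) -> tuple[int, int, int]:
    """Split a finite, nonzero float into (sign bit, 15-bit biased exponent,
    64-bit significand with an explicit leading 1 bit)."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles finite, nonzero values only")
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))       # abs(x) == m * 2**e, with 0.5 <= m < 1
    significand = int(m * 2**64)    # exact: scales m so that bit 63 is set
    exponent = (e - 1) + BIAS       # abs(x) == (significand / 2**63) * 2**(e - 1)
    return sign, exponent, significand

def pack_extended(sign: int, exponent: int, significand: int) -> int:
    """Pack the fields into one 80-bit integer: sign | exponent | significand."""
    return (sign << 79) | (exponent << 64) | significand

def from_extended(sign: int, exponent: int, significand: int) -> float:
    """Convert the fields back to a Python float (may round beyond 53 bits)."""
    value = math.ldexp(significand, exponent - BIAS - 63)
    return -value if sign else value

fields = to_extended(-2.5)
print(fields)                        # (1, 16384, 11529215046068469760)
print(hex(pack_extended(*fields)))   # 0xc000a000000000000000
print(from_extended(*fields))        # -2.5
```

Note that the sign and the 15-bit exponent together form the 16 bits that travel on the exponent datapath, while the 64-bit significand travels on the fraction datapath.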

The chip has eight registers to store numbers during calculations, the top registers in the diagram below. However, the registers are organized in an unusual way: as a stack, with numbers pushed to the stack and popped from the stack. Instead of accessing, say, register #3, you might access the third register from the top of the stack, denoted ST(3); as values are pushed or popped, ST(3) changes. The stack-based architecture was intended to improve the instruction set, simplify compiler design, and make function calls more efficient, although it didn't work as well as hoped.

The register set of the 8087, as seen by the programmer. From 8086 Family Numerics Supplement.

Many 8087 instructions act on the top of the stack. For instance, the square root instruction replaces the value on the top of the stack with its square root. But what if you want to take the square root of a value in the middle of the stack? The solution is the FXCH instruction, the focus of this article. This instruction exchanges the value on the top of the stack with a specified stack position, providing access to values inside the stack.
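To illustrate how ST(i) is relative to the top of the stack, and how FXCH gives access to a buried value, here is a toy Python model. It follows the usual x87 convention that a push decrements a 3-bit top-of-stack pointer, so ST(i) is physical register (top + i) mod 8; the class and method names are hypothetical, and the chip's actual stack-control circuitry is not modeled. The usage at the end takes the square root of ST(2) with the sequence FXCH, square root (FSQRT), FXCH.

```python
import math
from typing import List, Optional

class RegisterStack:
    """Toy model (hypothetical class): eight physical registers plus a
    3-bit top-of-stack pointer. ST(i) is register (top + i) mod 8."""

    def __init__(self) -> None:
        self.regs: List[Optional[float]] = [None] * 8   # None = empty register
        self.top = 0

    def phys(self, i: int) -> int:
        """Physical register number that ST(i) currently refers to."""
        return (self.top + i) % 8

    def push(self, value: float) -> None:
        self.top = (self.top - 1) % 8    # a push decrements the pointer first
        self.regs[self.top] = value

    def st(self, i: int) -> Optional[float]:
        return self.regs[self.phys(i)]

    def set_st(self, i: int, value: Optional[float]) -> None:
        self.regs[self.phys(i)] = value

    def fxch(self, i: int) -> None:
        """Exchange ST(0) and ST(i); the empty-register error path is ignored."""
        a, b = self.st(0), self.st(i)
        self.set_st(0, b)
        self.set_st(i, a)

    def fsqrt(self) -> None:
        """Replace ST(0) with its square root."""
        self.set_st(0, math.sqrt(self.st(0)))

s = RegisterStack()
for v in (9.0, 4.0, 1.0):   # afterwards ST(0)=1.0, ST(1)=4.0, ST(2)=9.0
    s.push(v)
s.fxch(2)                    # bring 9.0 to the top
s.fsqrt()                    # ST(0) becomes 3.0
s.fxch(2)                    # put the result back where 9.0 was
print([s.st(i) for i in range(3)])   # [1.0, 4.0, 3.0]
```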

One more feature of the 8087 is important to this discussion: each value in the register stack has an associated "tag" value, labeling it as valid, special, zero, or empty. A "normal" floating-point value is tagged as valid. If the floating-point value is infinity, Not a Number, or a denormalized value, then it is tagged as special. A zero value is tagged as zero. Finally, if a register is empty (e.g., its value has been popped off the stack), the register is tagged as empty. The 8087 uses tags to optimize performance and detect errors.[1] For instance, if a programmer pops too many values from the stack and tries to read a stack register that is tagged empty, the 8087 raises an "invalid operation" exception.
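Here is a simplified Python sketch of how a value could be classified into the four tags, plus packing eight tags into the 16-bit tag word mentioned in note 1. The 2-bit codes (00 valid, 01 zero, 10 special, 11 empty) are the ones Intel documents for the programmer-visible x87 tag word; the chip's internal tag encoding may differ, and 8087-specific formats such as unnormals are ignored. The function names are hypothetical.

```python
# Tag codes of the programmer-visible x87 tag word.
TAG_VALID, TAG_ZERO, TAG_SPECIAL, TAG_EMPTY = 0b00, 0b01, 0b10, 0b11

def tag_for(exponent: int, significand: int, empty: bool = False) -> int:
    """Classify an 80-bit value from its 15-bit exponent and 64-bit significand.
    Simplified: unnormals and other 8087-specific encodings are not handled."""
    if empty:
        return TAG_EMPTY        # emptiness is tracked separately, not by value
    if exponent == 0x7FFF:
        return TAG_SPECIAL      # infinity or NaN
    if exponent == 0:
        return TAG_ZERO if significand == 0 else TAG_SPECIAL   # zero or denormal
    return TAG_VALID            # ordinary nonzero value

def tag_word(tags: list[int]) -> int:
    """Pack eight 2-bit tags (physical registers 0..7) into a 16-bit tag word."""
    word = 0
    for reg, tag in enumerate(tags):
        word |= (tag & 0b11) << (2 * reg)   # register n occupies bits 2n+1..2n
    return word

print(hex(tag_word([TAG_VALID, TAG_ZERO] + [TAG_EMPTY] * 6)))   # 0xfff4
```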

The eight stack registers are visible to the programmer, but the 8087 also has temporary registers that it uses internally. Two of these temporary registers are important for this article: tmpA and tmpB. Like the stack registers, each temporary register is an 80-bit register, along with two tag bits.

The FXCH microcode

In this section, I'll explain how the microcode for the FXCH exchange instruction works. This instruction exchanges the top-of-stack register with the register at a specified position in the stack. If either register is empty, the instruction will raise an "invalid operation" exception and replace the missing value(s) with the special value "Not a Number" (NaN).

The microcode for the instruction is below, consisting of 14 micro-instructions.[2] The first micro-instruction is a transfer, where the source is the top-of-stack value ST(0) and the destination is the temporary A register (tmpA). The source specification causes the 64-bit significand to be placed on the fraction bus, the 16-bit exponent and sign to be placed on the exponent bus, and the two tag bits to be sent to the tag circuitry. The destination tmpA causes the bus values to be stored into the temporary register. Thus, the bits in the micro-instruction cause the desired transfer to take place. The third micro-instruction is similar, but reads a register inside the stack, ST(i), with the index specified in the machine instruction.

FXCH entry point:
#0200 ST(0) -> tmpA           Read top of stack
#0201 nop                     Wait a cycle
#0202 ST(i) -> tmpB           Read specified stack register
#0203 if !(tmpA or tmpB empty) jmp #0210 Jump if both registers exist
#0204 set invalid exception   Raise an invalid exception
#0205 if (unmasked) jmp #0213 If interrupt, end
#0206 if !(tmpA empty) jmp #0208 Check if tmpA is empty
#0207 NaN -> tmpA             If so, move NaN to tmpA
#0208 if !(tmpB empty) jmp #0210 Check if tmpB is empty
#0209 NaN -> tmpB             If so, move NaN to tmpB
The happy path and error path continue here:
#0210 tmpB -> ST(0)           Save tmpB to the top of stack
#0211 nop                     Wait a cycle
#0212 tmpA -> ST(i)           Save tmpA to the specified stack register
#0213 RNI                     End of routine: Run Next Instruction
#0214 nop                     Unused
#0215 nop                     Unused
#0216 nop                     Unused

Next, the relative jump at micro-address #0203 illustrates a different type of micro-instruction: the conditional jump. The micro-instruction specifies a condition, in this case, testing if either temporary register is empty. (That is, the hardware tests the tag bits associated with the temporary registers to see if either is the "empty" tag.) The micro-instruction has a bit set to invert the condition. Finally, the micro-instruction has an offset of +6, yielding the jump target #0210. The advantage of a relative offset over specifying a full micro-address is that the offset only requires six bits. (For more information on how conditions are evaluated, see my article Conditions in the Intel 8087 floating-point chip's microcode.)
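The jump at #0203 can be modeled in a few lines of Python. The sketch assumes the 6-bit offset is added to the address of the following micro-instruction: that is one reading that makes +6 at #0203 land on #0210, but the text does not say exactly how the offset is applied. Addresses are treated as plain integers matching the listing's labels, and `local_jump` is a hypothetical name.

```python
def local_jump(addr: int, offset: int, condition: bool, invert: bool) -> int:
    """Next micro-address after a local (relative) jump at `addr`.

    ASSUMPTION: the offset is relative to the following micro-instruction,
    so that +6 at #0203 gives #0210 as in the listing."""
    taken = condition != invert      # the invert bit flips the tested condition
    return addr + 1 + offset if taken else addr + 1

# #0203: if !(tmpA or tmpB empty) jmp #0210
print(local_jump(203, 6, condition=False, invert=True))  # 210: both registers hold values
print(local_jump(203, 6, condition=True, invert=True))   # 204: fall into the error path
```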

If either register is empty, the next micro-instruction raises an "invalid" exception. As I'll explain in the next section, you can program the 8087 to either generate an interrupt on an exception or continue processing. The following micro-instruction is a conditional jump that tests whether the exception is "unmasked", meaning that an interrupt was generated. If so, the microcode ends while the main 8086 processor handles the interrupt.

Assuming the exception was masked, the microcode now replaces empty values with the special Not a Number value, first checking tmpA and then tmpB. The source NaN causes circuitry to pull the exponent bus to all 1's and the fraction bus to all 0's, except for the top two bits. This particular bit pattern represents Not a Number.[3]
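The real indefinite pattern described above (and in note 3) can be built and checked in Python. This sketch assumes the 80-bit layout of sign, 15-bit exponent, and 64-bit significand with an explicit leading bit; `is_nan` is a simplified test that looks only at the 63 fraction bits below that leading bit, and both function names are hypothetical.

```python
EXP_ALL_ONES = 0x7FFF   # 15-bit exponent field with every bit set

def real_indefinite() -> int:
    """80-bit real indefinite NaN: sign and exponent all 1's (the whole
    16-bit exponent bus pulled high), only the top two significand bits set."""
    sign, exponent = 1, EXP_ALL_ONES
    significand = 0b11 << 62                 # 0xC000000000000000
    return (sign << 79) | (exponent << 64) | significand

def is_nan(bits: int) -> bool:
    """Simplified test: exponent all 1's and a nonzero fraction
    (the 63 bits below the explicit leading bit)."""
    exponent = (bits >> 64) & 0x7FFF
    fraction = bits & ((1 << 63) - 1)
    return exponent == EXP_ALL_ONES and fraction != 0

nan = real_indefinite()
print(hex(nan))      # 0xffffc000000000000000
print(is_nan(nan))   # True
```

By contrast, an all-1's exponent with only the leading significand bit set is an infinity, which `is_nan` rejects.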

At micro-address #0210, the empty-register path and the normal path join up to store the temporary registers in the stack registers. This is where the actual exchange operation happens, since tmpA and tmpB are written to the opposite stack positions from where they were read. Finally, RNI (Run Next Instruction) indicates the end of the microcode routine. This stops the microcode engine and gets the 8087 ready for the next instruction.
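To tie the listing together, here is a Python sketch that follows the control flow of the FXCH micro-instructions. It is a behavioral model under simplifying assumptions, not the chip's implementation: registers are indexed directly as ST(0) through ST(7), `None` stands for a register tagged empty, a Python NaN stands in for real indefinite, the nop cycles are omitted, and the interrupt request that the hardware would raise is set inside the function for convenience. `FPU` and `fxch_microcode` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

NAN = math.nan   # stands in for the 'real indefinite' NaN

@dataclass
class FPU:
    regs: List[Optional[float]]     # ST(0)..ST(7); None = tagged empty
    invalid_masked: bool = True     # control-register mask bit for 'invalid'
    invalid_latch: bool = False     # set by 'set invalid exception'
    interrupt: bool = False         # interrupt request to the 8086

def fxch_microcode(fpu: FPU, i: int) -> None:
    """Follow the FXCH listing step by step (nop cycles omitted)."""
    tmp_a = fpu.regs[0]                  # #0200 ST(0) -> tmpA
    tmp_b = fpu.regs[i]                  # #0202 ST(i) -> tmpB
    if tmp_a is None or tmp_b is None:   # #0203 falls through if either is empty
        fpu.invalid_latch = True         # #0204 set invalid exception
        if not fpu.invalid_masked:       # #0205 unmasked: jump to #0213
            fpu.interrupt = True         # (raised by hardware; modeled here)
            return                       # #0213 RNI, stack left unchanged
        if tmp_a is None:                # #0206 test tmpA
            tmp_a = NAN                  # #0207 NaN -> tmpA
        if tmp_b is None:                # #0208 test tmpB
            tmp_b = NAN                  # #0209 NaN -> tmpB
    fpu.regs[0] = tmp_b                  # #0210 tmpB -> ST(0)
    fpu.regs[i] = tmp_a                  # #0212 tmpA -> ST(i)
    # #0213 RNI: end of routine

fpu = FPU([1.0, 2.0, 3.0] + [None] * 5)
fxch_microcode(fpu, 2)
print(fpu.regs[:3])   # [3.0, 2.0, 1.0]

empty = FPU([1.0] + [None] * 7)   # ST(3) is empty, invalid exception masked
fxch_microcode(empty, 3)
print(math.isnan(empty.regs[0]), empty.regs[3], empty.invalid_latch)   # True 1.0 True
```

Because both temporary registers are loaded before either stack register is written, the write-back is a plain crossover with no extra scratch storage, matching the read-then-write order of the listing.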

The nop (no-operation) micro-instructions are interesting. Each pair of stack reads or writes has a nop in the middle, probably due to timing constraints on the registers. The end of the microcode routine has three nop micro-instructions before the next microcode routine starts. These appear to be wasted space in the microcode; perhaps the FXCH microcode was shortened by three words during development, leaving this gap.

Exceptions

The 8087 has a complicated exception system to handle a variety of problems. Exceptions fall into six categories: invalid operation, denormalized operation, zero divide, overflow, underflow, and precision. An invalid operation occurs, for instance, if you take the square root of a negative number or try to perform an operation on an empty register. An overflow exception occurs if a value is too large to be represented, while an underflow exception occurs if a value is too small. A zero divide exception happens if you divide by zero.[4] A precision exception occurs if a number cannot be exactly represented as a floating-point number (which is extremely common). Finally, a denormalized exception occurs if a value is too close to zero to be represented with full accuracy.
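Note 4's division rules are easy to misremember, so here is a small Python sketch of the a ÷ 0 cases together with the result the 8087 would deliver if the exception is masked. It assumes a divisor of +0 (so an infinite result takes the sign of a), ignores NaN operands, and covers only these three cases; `divide_by_zero_case` is a hypothetical name.

```python
import math
from typing import Optional, Tuple

def divide_by_zero_case(a: float) -> Tuple[Optional[str], float]:
    """For a / (+0), return (exception raised, result if that exception is masked)."""
    if math.isnan(a):
        raise ValueError("NaN operands are outside this sketch")
    if a == 0:
        return "invalid", math.nan                     # 0 / 0: invalid operation
    if math.isinf(a):
        return None, a                                 # inf / 0: valid, stays infinite
    return "zero divide", math.copysign(math.inf, a)   # finite / 0: signed infinity

for a in (0.0, math.inf, -5.0):
    print(a, divide_by_zero_case(a))
```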

What happens if an exception occurs? The 8087 allows the programmer to select exception behavior for each exception type. The first option is for an exception to trigger a CPU interrupt, so the software can handle the problem. For instance, the software could attempt to work around the problem, log an error, or simply terminate the program. Alternatively, the programmer can "mask" an exception. In this case, the 8087 continues the operation in a "reasonable" way. For instance, an overflowed value would be set to infinity, while an invalid value would be set to the special value "Not a Number" (NaN). For a precision exception (e.g., 1/3), the value is rounded. The designers of the 8087 put a lot of effort into continuing after a masked exception in the best way; the manual has pages of details on all the special cases.[5]
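The per-type masking can be sketched with Python's `enum.IntFlag`. The bit positions below (invalid in bit 0 up through precision in bit 5) follow the low six bits of the x87 control word, where a 1 masks the exception; the status word uses the same positions for its exception flags. Other control-word fields are ignored, and `Exc` and `masked` are hypothetical names.

```python
from enum import IntFlag

class Exc(IntFlag):
    """Exception bits: mask bits in the control word, flag bits in the status word."""
    INVALID = 1 << 0       # invalid operation
    DENORMAL = 1 << 1      # denormalized operand
    ZERO_DIVIDE = 1 << 2   # zero divide
    OVERFLOW = 1 << 3
    UNDERFLOW = 1 << 4
    PRECISION = 1 << 5

def masked(control_word: int, exc: Exc) -> bool:
    """True if `exc` is masked, so the 8087 continues with a default result
    instead of interrupting the CPU."""
    return bool(control_word & exc)

cw = Exc.PRECISION | Exc.UNDERFLOW   # mask only precision and underflow
print(masked(cw, Exc.PRECISION))     # True: the result is simply rounded
print(masked(cw, Exc.INVALID))       # False: an invalid operation interrupts the CPU
```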

Handling of exception conditions is split between microcode and hardware. For example, if the FXCH microcode detects an empty register, it executes a set invalid exception micro-instruction. This micro-instruction sets a latch indicating the invalid exception. The 8087's control register includes six mask bits, one for each type of exception, blocking interrupts for that exception type. The hardware combines the exception flip-flop signals with the mask bits in the control register and the exception flags in the status register to see if a new, unmasked interrupt has been triggered. If so, the 8087 circuitry sends an interrupt to the 8086 processor.

On the other hand, if the exception is masked, execution of the microcode continues. In the case of FXCH, the microcode replaces empty registers with the Not a Number value. Finally, the microcode routine ends with RNI (Run Next Instruction). This triggers many hardware activities, but the relevant one is copying the state of the exception flip-flops into the status register. This sets the corresponding exception flag, so the programmer can check it later. The exception flip-flops are cleared when the next 8087 instruction starts. Since the hardware manages the flip-flops, status register, control register, and interrupt line, the microcode can be simpler and smaller.
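The division of labor between microcode and hardware can be sketched as a small Python model. One reading of the description above is that an interrupt is requested only when an exception latch is set, its mask bit is clear, and its status flag was not already set; that interpretation, the default of all six exceptions masked, and the names `ExceptionUnit`, `set_exception`, `rni`, and `start_instruction` are assumptions for illustration.

```python
from dataclasses import dataclass

INVALID = 1 << 0   # bit position of the invalid-operation exception

@dataclass
class ExceptionUnit:
    control_masks: int = 0x3F   # six mask bits; 1 = masked (assumed default: all masked)
    status_flags: int = 0       # six sticky flags visible to software
    latches: int = 0            # exception flip-flops for the current instruction

    def set_exception(self, bit: int) -> bool:
        """Microcode step such as 'set invalid exception'. Returns True if this
        triggers a new, unmasked interrupt (assumed rule)."""
        self.latches |= bit
        new = bit & ~self.status_flags          # not already flagged
        return bool(new & ~self.control_masks)  # and not masked

    def rni(self) -> None:
        """Run Next Instruction: copy the latches into the status flags."""
        self.status_flags |= self.latches

    def start_instruction(self) -> None:
        """The latches are cleared when the next 8087 instruction starts."""
        self.latches = 0

unit = ExceptionUnit()                # every exception masked
print(unit.set_exception(INVALID))    # False: the microcode keeps going
unit.rni()
print(bin(unit.status_flags))         # 0b1: the invalid flag is now set
```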

Extracting the microcode

The 8087's microcode ROM contains 26,368 bits, specifying 1648 16-bit micro-instructions. At the time, this was a very large ROM; in order to fit it on the die, Intel used a special type of ROM that held two bits per transistor, twice the capacity of a standard ROM. This ROM is semi-analog, using four sizes of transistors to produce four voltage levels. Comparators convert the voltage level to a pair of bits.
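A two-bit-per-cell ROM has to resolve four voltage levels, which three comparators can do: each one trips above its threshold, and the count of tripped comparators is the two-bit value. The Python sketch below illustrates that idea with made-up threshold voltages; the 8087's real sense circuit, its voltages, and which transistor size maps to which bit pair are not given in the text. It also checks the capacity arithmetic: 1648 × 16 bits is 26,368 bits, or 13,184 two-bit cells.

```python
THRESHOLDS = (1.0, 2.0, 3.0)   # hypothetical comparator reference voltages

def sense(voltage: float) -> int:
    """Two bits from one cell: the three comparators form a thermometer code,
    and the number that trip (0-3) is the cell's two-bit value."""
    return sum(voltage > t for t in THRESHOLDS)

def read_row(voltages: list[float]) -> str:
    """Decode a row of cells into a bit string, two bits per transistor."""
    return "".join(format(sense(v), "02b") for v in voltages)

print(read_row([0.5, 1.5, 2.5, 3.5]))   # 00011011

bits = 1648 * 16                         # micro-instructions x bits each
print(bits, bits // 2)                   # 26368 13184
```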

A close-up of the 8087's microcode ROM, showing 77 transistors. A transistor is formed where a vertical polysilicon line crosses a horizontal stripe of doped silicon.

To extract the microcode, I took high-resolution images of the ROM after dissolving the metal layer. Gloriouscow used a neural network to categorize the size of each transistor. The next step was determining how to map the transistors to bits. You might expect that the grid of transistors corresponds to the grid of microcode bits, but due to various hardware optimizations, rows and columns are shuffled and mirrored, which I sorted out by studying the circuitry. The result was the microcode, expressed as a table of 0's and 1's.
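The unscrambling step amounts to a permutation of rows and columns, with mirroring as a special case (a reversed run of columns). The Python sketch below shows that idea on a toy 4×4 grid; the permutations are invented for illustration and are not the 8087's actual layout, and `unscramble` is a hypothetical name.

```python
from typing import List, Sequence

def unscramble(grid: Sequence[Sequence[int]],
               row_of: Sequence[int],
               col_of: Sequence[int]) -> List[List[int]]:
    """Build the logical bit table: logical[r][c] = grid[row_of[r]][col_of[c]].
    row_of / col_of give where each logical row / bit sits on the die;
    shuffling and mirroring are both just particular permutations."""
    return [[grid[pr][pc] for pc in col_of] for pr in row_of]

# HYPOTHETICAL toy die image: rows interleaved (0, 2, 1, 3) and the
# right half of each row mirrored (physical column order 0, 1, 3, 2).
die = [
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
table = unscramble(die, row_of=[0, 2, 1, 3], col_of=[0, 1, 3, 2])
for row in table:
    print("".join(map(str, row)))
```

In practice the hard part is discovering `row_of` and `col_of`, which is what studying the circuitry provides; applying them is mechanical.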

The next step was assigning meaning to the microcode. For the 8086 processor, the patent provided a lot of detail on the structure of the microcode and the hardware, but the 8087 patent didn't explain the microcode. Instead, we figured out the micro-instructions through a combination of examining the circuitry, looking for patterns in the microcode, and thinking about how instructions might be implemented.

Microcode is usually complicated, and the 8087 is worse than most. The 8087 was on the edge of what was possible at the time, so the designers resorted to special cases and hacks where necessary. For instance, some conditional jumps have side effects such as updating registers. Other instructions set flip-flops that change the behavior of later operations. We're still working to completely understand the micro-instructions at the hardware level.

I plan to continue reverse-engineering the 8087 microcode; for updates, follow me on Bluesky (@righto.com), Mastodon, or RSS. I've been working on this with the members of the "Opcode Collective", especially Smartest Blob and Gloriouscow, who converted the ROM images to microcode data and extensively analyzed the contents. See the 8087 repository on GitHub for more.

AI statement: Despite the presence of the em dash, no AI was used in the writing of this article.

Notes and references

  1. Tags are normally invisible to the programmer, but they can be accessed by dumping the 8087's state to memory; the tags are stored in a 16-bit "tag word". 

  2. The raw 8087 microcode, decoded by Smartest Blob, is available online. I've modified the microcode format for clarity. 

  3. The 8087 indicates a bad value with a special "Not a Number" (NaN) value. The system allows many different representations of NaN: any value with an exponent of all 1's, a nonzero significand (a zero significand indicates infinity), and either sign. For an invalid operation, the 8087 uses one particular NaN value, called real indefinite. For an internal 80-bit real number, this value has the top two bits of the significand set and the rest zero, while the exponent bits and sign are all set. (See pages 87 (S-73) and 90 (S-76).) A 32-bit or 64-bit real uses a slightly different bit pattern for NaN; these number formats have an implied "1" bit for the significand, so only one bit of the significand is explicitly set for the real indefinite NaN. 

  4. Dividing by zero usually causes a zero divide exception, but 0 ÷ 0 causes an invalid operation exception, while infinity ÷ 0 is valid, yielding infinity. Just one reason why the microcode is so complicated. 

  5. For more information on the 8087's exceptions, see 8086 Family Numerics Supplement. The exception system is described on page 32 (S-18). The exception flags and exception masks are described on page 24 (S-10). The details of exception handling are described on page 89 (S-75).