





















A three-line deploy script pushed a new image to ECR. The CI passed. The tag was latest. Twelve minutes later, every EC2 instance in the auto-scaling group had the new container running. And every single one was crashing with exec format error on startup. The developer who built the image was on an M2 Mac. The CI runner was AMD64. The Dockerfile had no --platform flag, so the build created an arm64 image that would not run on any x86 CPU. The sprint was not off to a good start.
This is the most common multi-architecture failure mode, and it is completely preventable. If you work on a team where some developers use Apple Silicon Macs, others use x86 Linux workstations, and production runs on AMD64 (or a mix of Graviton and x86 instances), you need a build pipeline that produces images for both architectures from a single Dockerfile. This post covers the full setup: buildx drivers, QEMU emulation, platform-specific native addons, CI integration with GitHub Actions, and the caching strategy that keeps multi-platform builds fast enough to run on every push.
Docker images are not architecture-agnostic. The base image you specify in FROM node:22-slim is a manifest list that points to different images for different CPU architectures. When you build without specifying a platform, Docker uses the architecture of the build host. On an M-series Mac, that is linux/arm64. On a standard GitHub Actions runner, that is linux/amd64. The resulting image layers contain binaries compiled for the host architecture. Attempting to run an arm64 binary on an amd64 CPU without emulation produces the exec format error you saw above.
The problem is worse for Node.js applications that depend on native addons. Packages like sharp, node-canvas, pg-native, and anything using node-gyp compile C++ code during npm install (or npm rebuild). Those compiled artifacts are architecture-specific. An arm64 sharp binary will segfault on x86 even if QEMU emulation is present at the container level, because the emulation overhead on hot GPU-accelerated code paths produces silent corruption or crashes that are much harder to debug than a clean exec format error.
The fix is to build for the target architecture explicitly, in a controlled build environment, and push a manifest list that lets each host pull exactly the image it needs.
Docker Desktop on Mac includes buildx by default. On Linux, you may need to install it separately. The docker buildx command wraps the classic docker build with multi-platform support, and it ships with every Docker version 19.03 and newer.
For multi-platform builds, Docker uses QEMU binaries to emulate foreign architectures during the build phase. This is needed because even the build steps (RUN commands that compile native addons) must run on the target architecture. To register the QEMU handlers with the Linux kernel, run this on every build host (including CI runners):
# Install QEMU static binaries and register them with binfmt_misc
docker run --privileged --rm tonistiigi/binfmt --install all
This container mounts the host’s /proc/sys/fs/binfmt_misc and installs handlers for arm64, armv7, s390x, ppc64le, and riscv64. Once registered, the kernel can transparently execute foreign-architecture binaries through QEMU.
You can verify the registration:
$ docker run --rm --platform linux/arm64 node:22-slim uname -m
aarch64
If that command works on an AMD64 host, QEMU is configured correctly. If it hangs or errors, the binfmt handlers are not installed.
Important: GitHub Actions hosted runners do not have binfmt handlers pre-installed. You must register QEMU on every CI run. The docker/setup-qemu-action does this in one step, which we will cover in the CI section below.
Buildx uses builder instances to manage the build context and cache. The default builder does not support multi-platform builds. You need to create one with a docker-container driver:
docker buildx create --name multiplatform --driver docker-container --bootstrap
docker buildx use multiplatform
The docker-container driver starts a BuildKit container that handles the multi-platform build orchestration. It persists its cache in a named volume, so subsequent builds are incremental. The --bootstrap flag starts the builder immediately so your first build does not wait for a cold start.
On CI, you will not run these commands directly. The docker/setup-buildx-action handles them for you.
A well-written Dockerfile should produce identical images for both architectures with the same set of layers. Most Dockerfiles need zero changes to become platform-agnostic if they follow standard practices. The tricky parts are native addon installation and platform-specific package manager repos.
Here is a Dockerfile for a Node.js service that handles both architectures correctly:
# docker/Dockerfile
FROM node:22-slim AS base
WORKDIR /app
# Stage 1: install production dependencies
FROM base AS deps
COPY package.json package-lock.json ./
# --platform is inherited from the build invocation
RUN npm ci --omit=dev --ignore-scripts
# Build native addons for the target platform
RUN npm rebuild
# Stage 2: build step (only if you have a build step)
FROM base AS build
COPY package.json package-lock.json ./
RUN npm ci
COPY tsconfig.json ./
COPY src/ src/
RUN npm run build
# Stage 3: production image
FROM base AS production
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
tini \
&& rm -rf /var/lib/apt/lists/*
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["node", "dist/index.js"]
The key line is RUN npm rebuild in the deps stage. This compiles native addons for whatever architecture the build is targeting (arm64 or amd64). If you skip this step and rely on precompiled binaries, sharp and similar packages will download the correct prebuilt binary based on the target platform, but only if they detect it correctly inside QEMU. Explicit rebuild is safer.
Some packages ship prebuilt binaries for popular platforms. sharp, for example, distributes prebuilt libvips binaries for linux-x64 and linux-arm64. These download automatically during npm install and do not require a C++ toolchain. This works well for multi-platform builds because npm downloads the correct binary for the emulated architecture.
Other packages always compile from source. The node-rdkafka package, for instance, requires librdkafka and a C compiler. When this happens inside QEMU, the build is slower (sometimes 5-10x slower) but it produces correct output. The solution is not to skip the package. It is to accept the slower build and use Docker caching to avoid rebuilding on every push.
If a native addon genuinely fails to build under QEMU (some packages check /proc/cpuinfo or use architecture-specific inline assembly), you have three options:
Option 1 is the most practical for most teams. We cover it in the CI section.
With buildx configured and a platform-agnostic Dockerfile, the build command is straightforward:
docker buildx build \
--platform linux/amd64,linux/arm64 \
--tag myregistry/myapp:latest \
--tag myregistry/myapp:$(git rev-parse --short HEAD) \
--cache-to type=registry,ref=myregistry/myapp:cache,mode=max \
--cache-from type=registry,ref=myregistry/myapp:cache \
--push \
-f docker/Dockerfile \
.
Breaking this down:
--platform linux/amd64,linux/arm64 tells buildx to build for both architectures. BuildKit creates separate layer sets for each platform, then assembles a manifest list (also called a fat manifest) that points to both images.--cache-to type=registry,... exports the build cache to your container registry. The mode=max flag caches all layers, including intermediate stages. Without this, subsequent CI builds rebuild everything from scratch.--cache-from type=registry,... pulls the cache from the registry before building, so unchanged layers are reused.--push pushes the manifest list and both platform-specific images to the registry in one command.The first build will take 3-5 minutes because QEMU emulation slows down native addon compilation. Subsequent builds take 30-60 seconds because the cache hits for everything except changed layers.
Here is the complete GitHub Actions workflow that builds and pushes multi-platform images on every push to main:
# .github/workflows/docker-build.yml
name: Docker Multi-Platform Build
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=sha,prefix=
type=ref,event=branch
type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
- name: Build and push
uses: docker/buildx-action@v3
with:
context: .
file: docker/Dockerfile
platforms: linux/amd64,linux/arm64
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
The critical steps are docker/setup-qemu-action (registers binfmt handlers on the runner) and docker/setup-buildx-action (creates a docker-container builder). Without these two steps, the build either fails or produces single-architecture images.
Note the use of type=gha for cache in this example. GitHub Actions cache (type=gha) is scoped to the repository and does not require additional registry storage. For larger images, registry-based cache (type=registry,ref=...) is faster because it is not subject to the 10GB GitHub Actions cache limit. Choose whichever fits your infra.
GitHub offers ubuntu-24.04-arm hosted runners (public beta). If your team uses ARM-based infrastructure (Graviton EC2 instances, for example), you can build the arm64 image natively and skip QEMU entirely:
jobs:
build-amd64:
runs-on: ubuntu-latest
steps:
# ... checkout, buildx setup, login ...
- name: Build amd64
uses: docker/buildx-action@v3
with:
platforms: linux/amd64
outputs: type=image,name=myapp,push-by-digest=true,name-canonical=true
build-arm64:
runs-on: ubuntu-24.04-arm
steps:
# ... same setup steps ...
- name: Build arm64
uses: docker/buildx-action@v3
with:
platforms: linux/arm64
outputs: type=image,name=myapp,push-by-digest=true,name-canonical=true
merge:
needs: [build-amd64, build-arm64]
runs-on: ubuntu-latest
steps:
- name: Create manifest list
run: |
docker manifest create myapp:latest \
myapp@$(cat /tmp/amd64-digest) \
myapp@$(cat /tmp/arm64-digest)
docker manifest push myapp:latest
This approach is faster (no QEMU slowdown) but requires more CI configuration and two parallel job runs. It is worth it if your native addon compilation takes more than 30 seconds inside QEMU. For most Node.js projects with typical packages like sharp, ioredis, and kafkajs, the single-job buildx approach is fast enough and much simpler.
Some Node.js packages ship different versions for different platforms, or need platform-specific configuration at build time. Three patterns handle this cleanly:
Use scripts in package.json that detect the platform:
{
"scripts": {
"postinstall": "node scripts/platform-check.js"
}
}
// scripts/platform-check.js
const os = require('os');
const arch = os.arch(); // 'arm64' or 'x64'
const platform = os.platform(); // 'linux' or 'darwin'
if (platform === 'linux' && arch === 'arm64') {
// Apply arm64-specific patches or configurations
console.log('Applying arm64-specific configuration');
}
Modern package managers handle platform-specific packages through optional dependencies. The lockfile records which packages were resolved for which platform. When you run npm ci inside an emulated arm64 build, npm downloads the arm64 variants automatically. This is why multi-platform builds work without changes for most projects. The packages that cause trouble are the ones that do not publish platform-specific variants and always compile from source.
If one architecture needs different base packages, use build arguments:
ARG TARGETARCH
FROM node:22-slim AS base
WORKDIR /app
FROM base AS deps-arm64
RUN apt-get update && apt-get install -y --no-install-recommends \
libvips-dev-arm64-cross
FROM base AS deps-amd64
RUN apt-get update && apt-get install -y --no-install-recommends \
libvips-dev
FROM deps-${TARGETARCH} AS deps
# Continue with npm ci and npm rebuild
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
The TARGETARCH variable is automatically set by buildx to arm64 or amd64 based on the platform being built. Use it to select architecture-specific dependencies without manual conditionals.
Building for two architectures is step one. Verifying both images actually work is step two. Add a smoke test stage to your workflow:
- name: Smoke test arm64
run: |
docker run --rm --platform linux/arm64 \
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.meta.outputs.version }} \
node -e "require('./dist/index.js'); console.log('arm64 OK')"
- name: Smoke test amd64
run: |
docker run --rm --platform linux/amd64 \
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.meta.outputs.version }} \
node -e "require('./dist/index.js'); console.log('amd64 OK')"
For production confidence, run a full integration test suite against both images:
- name: Integration test arm64
run: |
docker compose -f docker-compose.test.yml -p test-arm64 up \
--abort-on-container-exit --exit-code-from app
# Override the app service image platform
docker compose -f docker-compose.test.yml -p test-arm64 run \
-e DOCKER_DEFAULT_PLATFORM=linux/arm64 app npm test
The most common complaint about multi-platform builds is that they are too slow. The complaint is almost always caused by missing or misconfigured cache. Here are the three cache rules:
Rule 1: Use mode=max for --cache-to. The default mode (mode=min) only caches the final stage layers. Intermediate stages (deps, build) are rebuilt every time. With mode=max, all stages are cached across builds.
Rule 2: Do not mix type=registry cache and type=gha cache in the same workflow unless you understand the tradeoffs. Registry cache is persistent and shared across branches. GHA cache is scoped to the branch and has a 10GB limit. For a monorepo with multiple services, registry cache is the safer choice.
Rule 3: Cache keys include the platform. BuildKit automatically scopes cache entries by platform, so an amd64 build does not pollute the arm64 cache. You do not need to manage this yourself, but you should know it works so you do not add manual platform suffixes that break the built-in scoping.
Multi-platform Docker builds for Node.js require exactly four things:
tonistiigi/binfmt or docker/setup-qemu-action.docker-container driver.RUN npm rebuild for native addons and ARG TARGETARCH for platform-specific dependencies.mode=max to keep build times reasonable.The workflow above took about 15 minutes to write and copy into your repo. The first multi-platform build will take 3-5 minutes. Every subsequent build will take under a minute for a typical Node.js service. The cost of not having this setup is the deploy that silently pushes arm64 images to an amd64 fleet, which costs far more than 15 minutes to debug.
Before your next deploy, run through this checklist:
docker/setup-qemu-action is in your workflow (or binfmt is registered on self-hosted runners).docker/setup-buildx-action creates a builder with docker-container driver.build action specifies platforms: linux/amd64,linux/arm64.npm rebuild in a Docker stage (not just npm ci).mode=max using either type=gha or type=registry.node -e "require('./app')") before the manifest is pushed.Do not let the next architecture mismatch incident be the one that wakes you up at 2 AM. A dozen lines of YAML, a handful of Dockerfile conventions, and you ship to every architecture your infra runs.
Infrastructure patterns like multi-platform container builds are exactly the kind of production plumbing that separates a service that deploys cleanly from one that crashes on the first Graviton instance. Getting the QEMU setup, cache strategy, and native addon handling right requires the same attention to detail that Yojji applies to every client engagement, from CI/CD pipelines to full-stack product delivery. Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their senior engineering teams specialize in the JavaScript ecosystem, cloud infrastructure on AWS, Azure, and Google Cloud, and the full cycle of product delivery from discovery through DevOps.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。