Fixing slow AWS uploads | Pierce Freeman
2026-02-18 · via Pierce Freeman

I usually store private datasets on my NAS. This lets me do a significant amount of prototyping locally; you can do a lot with a 10 Gbps network and a modern laptop.[1] But sometimes you really do need 48 or 96 CPUs chugging on a problem at the same time.

For non-GPU work I usually spin up a beefy box on AWS or GCP. But while rsyncing a particularly big set of files, I walked away for a coffee and came back to upload speeds around 2 MB/s. Sometimes it would drop to 500 kbps and sometimes jump to 10 MB/s, but it never pushed much higher. Let's see if you can spot the issue right off the bat:

rsync -av --progress \
  --exclude='.*' \
  -e "ssh -i ~/.ssh/primary-laptop.pem" \
  /Volumes/Common_Drive/dataset \
  [email protected]:~/dataset/

If you can, then congratulations! No need for this blog post. But if you can't, these results just don't make any sense:

  • Client has a 10 Gbps symmetric fiber connection (Sonic in SF)
  • Powerful EC2 instance: c5d.12xlarge, so neither the network nor the CPU used by rsync should slow it down
  • This box has an SSD

I provisioned this box with large NVMe storage so the data sits on fast disks local to the compute. The d in c5d.12xlarge actually means "instance store volumes included." If I'm going to be paying for 48 CPUs I want to be fully saturating them.[2]

But I made the mistake of copying my rsync command from a previous run against my local homelab, where the home directory is a perfectly reasonable place to dump a folder until you figure out its permanent location. There, it's all backed by the same SSD. But on AWS that home directory has dragons. By default the instance boots from a slow, network-attached EBS volume, and the home directory lives on it.

Specifically:

/dev/root → Amazon Elastic Block Store (EBS)

So my data path looked like:

Laptop → Network (Internet) → EC2 → Network (internal) → EBS

With TCP, the receiver controls the pace. So even though it looked like an overall network issue, it was really backpressure: the EBS "disk" slowed down writes on the EC2 instance, rsync on that side read from its socket more slowly, and that in turn showed up as slower network speeds on my end.
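The same effect is easy to reproduce locally with a pipe, which applies backpressure in much the same spirit as a TCP receive window: once the buffer is full, the writer blocks until the reader drains it. The sketch below is only an analogy, not a TCP test. The function name `producer_seconds` is made up for illustration, and `sleep` stands in for a receiver stuck on slow disk writes.

```shell
#!/bin/sh
# Local analogy for receiver-driven backpressure (a pipe, not TCP).
# Assumption: `sleep` stands in for a receiver stuck on slow disk writes.

# Reports (on stderr) how many whole seconds the producer needed to push
# 1 MiB into a pipe whose reader only starts draining after $1 seconds.
producer_seconds() {
  {
    start=$(date +%s)
    head -c 1048576 /dev/zero        # blocks once the pipe buffer fills up
    end=$(date +%s)
    echo "$((end - start))" >&2      # elapsed time as seen by the sender
  } | {
    sleep "$1"                       # the "slow disk" on the receiving side
    cat >/dev/null
  }
}

# Usage:
# producer_seconds 2
```

The sender has nothing slow on its own side, yet it takes roughly as long as the reader's stall, which is how a slow disk on the far end can masquerade as a slow network.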

EBS volumes, for what it's worth, are pretty basic disks with low default throughput caps and IOPS limits. You can provision them with higher limits, but you're almost always better off using the local NVMe disks if you are doing data processing.
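One way to check whether the disk is the bottleneck, independent of the network, is to time a sequential write directly on the instance. This is a rough sketch that assumes GNU `dd` (as found on typical Linux AMIs). The function name `write_test` and the scratch file name are made up here.

```shell
#!/bin/sh
# Rough sequential-write check for a directory. Assumes GNU dd.
# write_test DIR [MEGABYTES]
write_test() {
  dir="$1"
  count="${2:-256}"
  scratch="$dir/.write_test.$$"      # hypothetical scratch file name
  # conv=fsync makes dd flush to the device before it reports a rate,
  # so the page cache does not flatter the result.
  dd if=/dev/zero of="$scratch" bs=1M count="$count" conv=fsync
  status=$?
  rm -f "$scratch"                   # always clean up the scratch file
  return "$status"
}

# Usage: compare the home directory against any other mounted volume.
# write_test "$HOME"
```

`dd` prints its throughput summary on stderr. A single sequential write will not capture IOPS limits on many small files, so treat the number as a coarse signal only.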

The Fix

If you suspect this might be happening to you, check lsblk:

nvme1n1  Amazon EC2 NVMe Instance Storage
nvme2n1  Amazon EC2 NVMe Instance Storage
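To confirm where a given path actually lives before blaming the network, `df` will report the backing device. A small sketch follows; `backing_device` is a helper name invented here, and the exact device names will differ per instance.

```shell
#!/bin/sh
# Print the device and mount point that back a given path.
# -P forces the POSIX one-line-per-filesystem output so awk can parse it.
backing_device() {
  df -P "$1" | awk 'NR == 2 { print $1, $6 }'
}

# Usage on the EC2 box:
# backing_device "$HOME"        # the rsync target from the original command
# lsblk -d -o NAME,MODEL,SIZE   # list whole disks with their model strings
```

If the device reported for the home directory is the root volume rather than one of the instance store disks, the upload is landing on EBS.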

Format and mount one:

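sudo mkfs.ext4 -F /dev/nvme1n1       # wipes anything already on the device
sudo mkdir -p /mnt/nvme              # the mount point has to exist first
sudo mount /dev/nvme1n1 /mnt/nvme
sudo chown "$USER" /mnt/nvme         # let the login user write to it via rsync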

Then change your rsync target to:

/mnt/nvme/
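Put together, the transfer might look like the sketch below. `user@ec2-host` is a placeholder for your own login and host, and both the `sync_dataset` wrapper and the `RSYNC` override are conveniences added here (not part of the original command) so the invocation can be printed instead of run.

```shell
#!/bin/sh
# Same rsync flags as the original command, with the destination as a parameter.
# sync_dataset LOCAL_SRC REMOTE_DEST
sync_dataset() {
  remote="${REMOTE_HOST:-user@ec2-host}"   # placeholder login and host
  # RSYNC=echo prints the command instead of running it.
  ${RSYNC:-rsync} -av --progress \
    --exclude='.*' \
    -e "ssh -i $HOME/.ssh/primary-laptop.pem" \
    "$1" \
    "$remote:$2"
}

# Usage:
# sync_dataset /Volumes/Common_Drive/dataset /mnt/nvme/dataset/
```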

And in my case I immediately saw speeds jump from ~2 MB/s to ~26 MB/s.

Conclusion

Beware the home directory! And make use of your local disks. Your pipelines will thank you.

  1. Especially for data processing when using an OLAP database.

  2. Disk speeds be damned!