The Surprising Slowness of `textConnection()` in R
Yihui Xie · 2026-03-29

Earlier this month, @idavydov filed an issue against Quarto reporting that it is about 100x slower than rmarkdown for documents with long output. The minimal reprex was striking:

cat(strrep("x\n", 100000))

Running quarto render on a document with that single chunk took 35 seconds. The equivalent rmarkdown::render() finished in under half a second. In a side note on the issue, the reporter pinged me to point out that the same problem existed in litedown. litedown is independent of both Quarto and knitr; it executes R code through xfun::record(). That is where I started looking.

Profiling

I reproduced the issue in litedown and ran utils::Rprof() on xfun::record(). The first result looked clear: cat() was consuming 88% of runtime, with a call stack that went through one_string → paste → cat. My initial diagnosis was that xfun::record() was collapsing the output lines into a single string with paste() before writing, and the string concatenation was slow.

That turned out to be the wrong diagnosis.

I re-profiled, this time passing gc.profiling = TRUE to also capture garbage collection data. The new profile completely changed the picture: 56% of the total runtime was GC. Not cat(), not paste()—the garbage collector. GC consuming more than half your runtime is not a sign of slow code per se; it is a sign that something is creating an enormous number of short-lived objects that R has to keep reclaiming. The question shifted from “why is cat slow?” to “what is producing all this garbage?” What a relief to kittens! How sad garbage trucks are!
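A minimal sketch of that second profiling run, using only base R. The helper name profile_with_gc() and the workload grow() are illustrative assumptions, not code from xfun. The point is only that gc.profiling = TRUE makes garbage collection appear as its own `<GC>` entry in the summary.

```r
# Hypothetical helper: profile an expression with GC time recorded
profile_with_gc = function(expr, interval = 0.01) {
  prof_file = tempfile(fileext = ".out")
  on.exit(unlink(prof_file), add = TRUE)
  utils::Rprof(prof_file, interval = interval, gc.profiling = TRUE)
  # `expr` is a promise, so it is first evaluated here, while profiling is on
  tryCatch(force(expr), finally = utils::Rprof(NULL))
  utils::summaryRprof(prof_file)
}

# Illustrative workload: grow a character vector one element at a time
grow = function(n) {
  out = character()
  for (i in seq_len(n)) out = c(out, "x")
  out
}

res = profile_with_gc(grow(30000))
head(res$by.self)  # a "<GC>" row, if present, is time spent in the collector
```

Very short workloads may record no samples at all, in which case summaryRprof() has nothing to summarize, so the workload needs to run for noticeably longer than the sampling interval.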

Finding the Cause

With GC as the real culprit, I examined handle_output() inside xfun::record()—the function that captures chunk output via sink()—with Claude’s help. The answer was sitting right at the top:

con = textConnection('out', 'w', local = TRUE)

A writable textConnection appends one element per line to a character vector. So for cat(strrep("x\n", 100000)), R is effectively doing this 100,000 times in a tight loop:

out = c(out, new_line)

Because R vectors are copy-on-modify, each c() call allocates a brand-new vector and copies all previous content into it before adding the new element. The growth pattern looks like:

NULL → [1 elem] → [2 elems] → [3 elems] → ... → [100,000 elems]

That is $O(n^2)$ in both total memory allocated and data copied: the $k$-th append copies the $k - 1$ elements already stored, so $n$ lines cost $n(n-1)/2$ element copies. Every discarded intermediate vector becomes garbage for R’s collector to clean up. The initial profile was not lying about cat (time really was being spent writing output), but the GC profile told the deeper truth. Without gc.profiling = TRUE, I would have chased the wrong thing.
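The effect can be reproduced outside xfun with a few lines of base R. This is a sketch, not the code in xfun::record(): capture_text() is a hypothetical name, and the timings will vary by machine and R version. If the cost were linear, doubling the number of lines would roughly double the elapsed time; quadratic growth predicts closer to four times.

```r
# Hypothetical helper: capture cat() output through a writable textConnection
capture_text = function(n) {
  out = NULL
  con = textConnection("out", "w", local = TRUE)
  sink(con)
  tryCatch(cat(strrep("x\n", n)), finally = {
    sink()      # stop diverting output before closing the connection
    close(con)  # closing flushes any incomplete last line into `out`
  })
  out
}

# Compare elapsed time when the number of lines doubles
for (n in c(10000, 20000)) {
  elapsed = system.time(res <- capture_text(n))[["elapsed"]]
  cat(sprintf("n = %d: %d lines captured in %.2f s\n", n, length(res), elapsed))
}
```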

Why rmarkdown Was Fast

rmarkdown itself uses knitr, which in turn uses the evaluate package to execute code. evaluate captures output by sinking into a file() connection, which delegates buffering to the operating system’s file I/O layer and sidesteps the R-level vector-growing trap entirely. It never accumulates a character vector one line at a time; it just writes bytes.

The Fix

The fix is to replace textConnection() with rawConnection(). A raw connection uses a dynamically-growing byte buffer internally (similar to realloc() with doubling in C), so each append is amortized $O(1)$ and writing $n$ lines costs $O(n)$ in total rather than $O(n^2)$. The change in xfun::record() was a few lines:

# before
out = NULL
con = textConnection('out', 'w', local = TRUE)
# ...
sink()
close(con)

# after
con = rawConnection(raw(0), 'w')
# ...
sink()
out = rawToChar(rawConnectionValue(con))
close(con)
out = strsplit(out, '\n', fixed = TRUE)[[1]]

Instead of reading out directly from the connection variable (which textConnection writes into by name), we retrieve the buffer at the end with rawConnectionValue(), convert it to a character string with rawToChar(), and split on newlines ourselves.
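For readers who want to try the pattern in isolation, here is a self-contained sketch of the same idea. The wrapper capture_raw() is a hypothetical name and is not how xfun::record() is organized; only the rawConnection(), rawConnectionValue(), rawToChar(), and strsplit() calls come from the change described above.

```r
# Hypothetical wrapper: capture printed output through an in-memory raw buffer
capture_raw = function(expr) {
  con = rawConnection(raw(0), "w")
  on.exit(close(con), add = TRUE)
  sink(con)
  # `expr` is a promise; forcing it here runs the code while output is diverted
  tryCatch(force(expr), finally = sink())
  # Read the whole buffer once, then split it into lines
  out = rawToChar(rawConnectionValue(con))
  strsplit(out, "\n", fixed = TRUE)[[1]]
}

lines = capture_raw(cat(strrep("x\n", 100000)))
length(lines)
```

The buffer has to be read with rawConnectionValue() before the connection is closed, which is why close() is deferred to on.exit() here.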

Another alternative I considered was sinking into a file('') connection opened in read/write mode, which would also avoid the quadratic growth. I went with rawConnection() instead because I wanted a pure in-memory solution with no involvement of the file system at all.
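A sketch of that file-based alternative, for comparison. This is my own illustration under the assumption that an anonymous file("") connection is acceptable; capture_file() is a hypothetical name, and this is not presented as the actual code in evaluate or xfun.

```r
# Hypothetical wrapper: capture printed output through an anonymous temp file
capture_file = function(expr) {
  con = file("", "w+")  # "" requests an anonymous file opened for read/write
  on.exit(close(con), add = TRUE)
  sink(con)
  tryCatch(force(expr), finally = sink())
  # File connections keep separate read and write positions,
  # so reading starts from the beginning of what was written
  readLines(con)
}

lines = capture_file(cat(strrep("x\n", 100000)))
length(lines)
```

The trade-off is the one described above: this touches the file system, whereas a raw connection stays entirely in memory.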

After the fix, the runtime for a smaller test case, cat(strrep("x\n", 50000)) rather than 100000 lines, dropped from 5.58 seconds to 1.30 seconds, a 4.3× speedup, with cat and GC disappearing from the profile entirely.
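The amortized-cost argument behind the fix can be made concrete with a small counting model. It counts how many existing elements get copied under each growth strategy and measures nothing in R's actual connection code. The doubling policy starting from capacity 1 is an assumption for illustration, and both function names are hypothetical.

```r
# Elements copied when storage grows by one slot per append:
# the k-th append copies the k - 1 elements already stored
copies_append_one = function(n) sum(seq_len(n) - 1)

# Elements copied by a buffer that doubles its capacity when full
# (assumed policy, starting from capacity 1)
copies_doubling = function(n) {
  capacity = 1
  copied = 0
  for (k in seq_len(n)) {
    if (k > capacity) {
      copied = copied + capacity  # move existing contents to the new buffer
      capacity = capacity * 2
    }
  }
  copied
}

n = 100000
copies_append_one(n)  # equals n * (n - 1) / 2
copies_doubling(n)    # stays below 2 * n
```

Under this model the gap between the two strategies widens as the output grows, which matches the observation that the problem only shows up for large outputs.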

What I Took Away

I have used textConnection() in R for a very long time and never thought to question it. It is documented, idiomatic, and used inside base R itself (see ?capture.output). For typical usage—capturing a few lines of output—it is perfectly fine. The quadratic behavior only bites you when the output is large, which is rare enough that it stayed hidden for years.

The lesson I keep relearning is that “idiomatic” and “efficient” are not the same thing. When something feels slow in a way that is hard to explain intuitively, profiling almost always surfaces something surprising. In this case, it was a base R function that I had mentally filed under “fast and boring” that turned out to have a hidden $O(n^2)$ trap.
