

cs.AI updates on arXiv.org


Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
Jeanmely Rojas Nunez, Viraj Sawant, Nathan Allen, Nomgondalai Am · 2026-05-22 · via cs.AI updates on arXiv.org

Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy (Shenfeld et al., 2025). We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. Our code is available at https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.
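
The abstract does not define differential circuit vulnerability, so the following is only a rough sketch of one way a head-level measure of this kind could be computed. It assumes each attention head has a positive scalar importance score (for instance, from an ablation study) before and after fine-tuning. The function names, the threshold, and the toy scores are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)
Scores = Dict[Head, float]  # hypothetical per-head importance scores


def circuit_heads(scores: Scores, threshold: float) -> Set[Head]:
    """Heads whose importance score meets the threshold (the 'circuit')."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return {head for head, score in scores.items() if score >= threshold}


def head_vulnerability(base: Scores, tuned: Scores, threshold: float) -> Dict[Head, float]:
    """Relative drop in importance for each base-circuit head, clipped to [0, 1].

    This is an assumed stand-in for a head-level degradation measure,
    not the definition used in the paper.
    """
    result = {}
    for head in circuit_heads(base, threshold):
        drop = (base[head] - tuned.get(head, 0.0)) / base[head]
        result[head] = min(1.0, max(0.0, drop))
    return result


def preserved_fraction(base: Scores, tuned: Scores, threshold: float) -> float:
    """Fraction of base-circuit heads that are still in the circuit after tuning."""
    base_circuit = circuit_heads(base, threshold)
    if not base_circuit:
        return 1.0
    return len(base_circuit & circuit_heads(tuned, threshold)) / len(base_circuit)


def mean_vulnerability(base: Scores, tuned: Scores, threshold: float) -> float:
    """Average head-level vulnerability over the base circuit."""
    values = list(head_vulnerability(base, tuned, threshold).values())
    return sum(values) / len(values) if values else 0.0


# Toy scores chosen purely for illustration; they are not experimental results.
base_scores = {(0, 1): 0.8, (1, 3): 0.5, (2, 0): 0.1}
after_sft = {(0, 1): 0.2, (1, 3): 0.1, (2, 0): 0.6}
after_rl = {(0, 1): 0.7, (1, 3): 0.45, (2, 0): 0.1}

for name, tuned_scores in [("SFT", after_sft), ("RL", after_rl)]:
    print(
        name,
        round(mean_vulnerability(base_scores, tuned_scores, 0.4), 4),
        preserved_fraction(base_scores, tuned_scores, 0.4),
    )
```

With these toy inputs, the SFT-style scores yield a higher mean vulnerability and a lower preserved fraction than the RL-style scores, which mirrors the direction of the trade-off the abstract describes. The numbers themselves carry no empirical meaning.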