Replicating Circuit Tracing for a Simple Known Mechanism
Jack Merullo * · 2025-11-29 · via Goodfire Research
Recent work by Ameisen et al. and Lindsey et al. introduced Cross-Layer Transcoders (CLTs) as a method for ch…