Understanding and Steering Llama 3 with Sparse Autoencoders
Thomas McGrath* · 2026-02-05 · via Goodfire Research