Predicting Rare LLM Failures with 30× Fewer Rollouts
Santiago Aranguri, Francisco Pernice,
2026-06-11 · via Goodfire Research