When Roleplaying, Do Models Believe What They Say?
Benjamin Sturgeon, David Africa, Sid Black · 2026-06-10 · via cs.CL updates on arXiv.org

Language models can state that "the Earth orbits the Sun" and, when role-playing Aristotle, assert the opposite. Recent work argues that persona adoption is fundamental to how language models operate, with models constantly selecting the most appropriate persona for a given context. Does such role-playing merely change the model's outputs, or does it also affect what the model internally represents as true? We study this question with linear truth probes, applying them to LLMs role-playing historical personas whose likely beliefs differ from modern consensus. For each persona, we compare false claims the persona would likely have endorsed (*era-believed*) with topic-matched false claims the persona would not have endorsed (*era-false*). Across prompting, in-context learning, and supervised fine-tuning, persona induction suppresses era-believed statements less than equally false alternatives, yet the era-believed statements remain classified as false overall. Role-play therefore shifts what these models say more than what they internally represent as true. We contrast this with models trained on harmful advice that exhibit Emergent Misalignment (EM). Across three model families (Qwen 2.5 14B, Qwen 3 8B, and Llama 3.3 70B), the EM models' false claims move substantially toward the true region of probe space, are defended under challenge roughly half the time (versus about a sixth for role-play), and are used in downstream reasoning. Role-play and Emergent Misalignment are thus points on a spectrum of belief internalization: role-play changes what a model says with little representational change, while Emergent Misalignment shifts the internal representation of false claims without fully marking them as true.
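To make the probe comparison concrete, here is a minimal sketch of a linear truth probe and the era-believed versus era-false comparison. It is not the authors' code. The abstract does not say how the probes are fit, so the difference-of-means probe here is an assumption, and the activations are synthetic Gaussian vectors rather than hidden states from a real model. All names (`fit_mean_difference_probe`, `probe_scores`, `synthetic_acts`) and all offsets are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_mean_difference_probe(acts_true, acts_false):
    """Fit a linear probe so that score = x @ w + b is positive on the 'true' side.

    Assumption: a difference-of-means probe stands in for whatever linear
    probe the paper actually uses.
    """
    mu_true = acts_true.mean(axis=0)
    mu_false = acts_false.mean(axis=0)
    w = mu_true - mu_false                 # probe direction
    midpoint = (mu_true + mu_false) / 2.0  # decision boundary passes through here
    b = -midpoint @ w
    return w, b


def probe_scores(acts, w, b):
    """Signed probe score for each activation vector (row of acts)."""
    return acts @ w + b


rng = np.random.default_rng(0)
dim = 16
truth_axis = np.zeros(dim)
truth_axis[0] = 1.0  # hypothetical direction that separates true from false


def synthetic_acts(n, offset):
    """Synthetic stand-in for hidden states: Gaussian noise shifted along truth_axis."""
    return rng.normal(scale=0.5, size=(n, dim)) + offset * truth_axis


# Fit the probe on clearly true versus clearly false statements.
train_true = synthetic_acts(200, +2.0)
train_false = synthetic_acts(200, -2.0)
w, b = fit_mean_difference_probe(train_true, train_false)

# Hypothetical activations under a persona: era-believed claims are placed
# slightly closer to the true side than topic-matched era-false claims.
era_false = synthetic_acts(100, -2.0)
era_believed = synthetic_acts(100, -1.5)

mean_era_false = probe_scores(era_false, w, b).mean()
mean_era_believed = probe_scores(era_believed, w, b).mean()
gap = mean_era_believed - mean_era_false

print(f"mean score, era-false:    {mean_era_false:.2f}")
print(f"mean score, era-believed: {mean_era_believed:.2f}")
print(f"gap (believed - false):   {gap:.2f}")
```

In this toy setup, a positive `gap` together with a negative mean score for the era-believed set corresponds to the pattern the abstract describes for role-play: era-believed claims sit closer to the true region than matched false claims, but still fall on the false side of the probe. The synthetic offsets build that pattern in by construction, so the sketch illustrates the measurement and is not evidence for the result.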