"I didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration
Eunsu Kim, J · 2026-05-21 · via cs.CL updates on arXiv.org


Abstract: As large language models (LLMs) increasingly shape how users form, refine, and extend their goals, attributing contributions in human-AI collaboration becomes critical both for users calibrating their own reliance and for evaluators assessing AI-assisted work. Yet existing methods focus on final artifacts and miss the process through which the goals themselves are jointly shaped. We introduce CoTrace, a goal-level attribution framework that decomposes explicit goals into verifiable requirements and traces both direct contributions and indirect influences across dialogue turns. Applying CoTrace to 638 real-world collaboration logs, we find that although models account for only 11-26% of goal-shaping contribution, they contribute substantially more when introducing lower-level, concrete requirements, and they make various kinds of indirect contributions. Through controlled simulations, we show that interaction design choices significantly affect model goal-shaping behavior. In a user study, exposing participants to goal-level analyses shifts their perceived contributions by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand their own AI-assisted work.
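To make the idea of goal-level attribution concrete, the following is a minimal sketch and not the CoTrace implementation. It assumes a goal has already been decomposed into requirements, each labeled with the speaker who introduced it and with a granularity level. The `Requirement` class, its field names, the `"user"`/`"model"` and `"high"`/`"low"` labels, and the toy data are all hypothetical. Indirect influences, which the abstract also mentions, are not modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One verifiable requirement extracted from a goal (hypothetical schema)."""
    text: str
    introduced_by: str  # assumed labels: "user" or "model"
    turn: int           # dialogue turn where the requirement first appears
    level: str          # assumed labels: "high" (abstract) or "low" (concrete)


def contribution_share(requirements, speaker="model"):
    """Fraction of requirements first introduced by `speaker`."""
    if not requirements:
        return 0.0
    hits = sum(r.introduced_by == speaker for r in requirements)
    return hits / len(requirements)


def share_by_level(requirements, speaker="model"):
    """Same fraction, computed separately for each granularity level."""
    totals = Counter(r.level for r in requirements)
    hits = Counter(r.level for r in requirements if r.introduced_by == speaker)
    return {level: hits[level] / totals[level] for level in totals}


# Invented toy log, for illustration only; not data from the paper.
toy_log = [
    Requirement("Write a cover letter", "user", 1, "high"),
    Requirement("Target a research role", "user", 1, "high"),
    Requirement("Keep it under one page", "user", 2, "low"),
    Requirement("Open with a concrete project result", "model", 3, "low"),
    Requirement("End with availability dates", "model", 5, "low"),
]

overall = contribution_share(toy_log)
per_level = share_by_level(toy_log)
print(overall, per_level)
```

In this toy log the model introduces none of the high-level requirements but two of the three low-level ones, which mirrors the kind of split the abstract describes without reproducing any of its reported numbers.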
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2605.21363 [cs.CL]
  (or arXiv:2605.21363v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2605.21363

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Eunsu Kim
[v1] Wed, 20 May 2026 16:28:34 UTC (9,826 KB)