

Regression Language Models for Code
Yash Akhauri, Xingyou Song, Arissa Wongpanich, Bryan Lewandowski · 2025-10-01 · via cs.CL updates on arXiv.org

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) using a frozen LLM encoder can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM based on T5Gemma obtains a Spearman rank correlation $> 0.9$ on competitive programming submissions from APPS, and a single unified model achieves an average Spearman rank correlation $> 0.5$ across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.
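The results above are reported as rank correlations (Spearman rank and Kendall-Tau), which score how well predictions order the programs rather than how close the predicted numbers are. The following is a minimal sketch of how both can be computed, not the authors' evaluation code. The function names and the `measured`/`predicted` values are hypothetical, and the Kendall variant shown is tau-a, which may differ from the variant used in the paper.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(pred, true):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(true))


def kendall_tau_a(pred, true):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    pairs = 0
    for i, j in combinations(range(len(pred)), 2):
        product = (pred[i] - pred[j]) * (true[i] - true[j])
        score += (product > 0) - (product < 0)  # tied pairs contribute 0
        pairs += 1
    return score / pairs


# Hypothetical data: measured peak memory (MB) of five programs and
# the values a regression model predicted for them.
measured = [12.0, 48.5, 7.2, 130.0, 64.0]
predicted = [15.0, 40.0, 9.0, 90.0, 95.0]

print("Spearman rho:", spearman_rho(predicted, measured))
print("Kendall tau-a:", kendall_tau_a(predicted, measured))
```

In this toy data the predictions order every pair of programs correctly except the two largest, so both scores are high even though the predicted magnitudes are well off. This is why rank metrics suit tasks such as choosing the faster kernel or the better architecture.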