A Study of In-Context-Learning-Based Text-to-SQL Errors
Jiawei Shen, Chengcheng Wan, Ruoyi Qiao, Jiazhen Zou, Hang Xu, Y · 2025-01-16 · via cs.CL updates on arXiv.org

Large language models (LLMs) have been adopted for text-to-SQL tasks, using their in-context learning (ICL) capability to translate natural language questions into structured query language (SQL). However, this approach suffers from correctness problems and calls for efficient repair solutions. In this paper, we conduct the first comprehensive study of text-to-SQL errors. Our study covers four representative ICL-based techniques, five basic repair methods, two benchmarks, and two LLM settings. We find that text-to-SQL errors are widespread and summarize 29 error types across 7 categories. We also find that existing repair attempts yield limited correctness improvement at the cost of high computational overhead and many mis-repairs. Based on these findings, we propose MapleRepair, a novel text-to-SQL error detection and repair framework. The evaluation demonstrates that MapleRepair outperforms existing solutions, repairing 13.8% more queries with negligible mis-repairs and 67.4% less overhead.
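
To make the idea of detecting an erroneous generated query concrete, here is a minimal sketch of one simple kind of check: compiling the query against the target schema before trusting it. This is not MapleRepair and is not taken from the paper; the function name `detect_sql_error`, the `singer` schema, and the choice of SQLite are all assumptions made for illustration. It only catches errors the database engine itself reports, such as syntax errors or references to columns that do not exist.

```python
import sqlite3
from typing import Optional


def detect_sql_error(schema_ddl: str, query: str) -> Optional[str]:
    """Return None if `query` compiles against the schema, else the engine's error message.

    Hypothetical helper for illustration only; not the paper's method.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Build an empty in-memory database that has the target schema.
        conn.executescript(schema_ddl)
        # EXPLAIN makes SQLite compile the statement, so syntax errors and
        # unknown tables or columns surface without needing any data.
        conn.execute("EXPLAIN " + query)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


# Assumed toy schema and queries, standing in for an LLM's generated SQL.
schema = "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"
good_query = "SELECT name FROM singer WHERE age > 30"
bad_query = "SELECT fullname FROM singer WHERE age > 30"  # column does not exist

print(detect_sql_error(schema, good_query))
print(detect_sql_error(schema, bad_query))
```

A check like this is cheap because it needs no LLM call, but it cannot see queries that compile and still return the wrong answer, so it covers only part of the error space the abstract describes.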