Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries
Jonathan Fürst, Catherine Kosten, Farhad Nooralahzadeh, Yi Zhang · 2024-02-13 · via cs.DB updates on arXiv.org

Text-to-SQL systems (also known as NL-to-SQL systems) have become an increasingly popular solution for bridging the gap between user capabilities and SQL-based data access. These systems translate user requests in natural language to valid SQL statements for a specific database. Recent Text-to-SQL systems have benefited from the rapid improvement of transformer-based language models. However, while Text-to-SQL systems that incorporate such models continuously reach new high scores on (often synthetic) benchmark datasets, a systematic exploration of their robustness to different data models in a real-world scenario is notably missing. This paper provides the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice, based on a multi-year international project focused on Text-to-SQL interfaces. Our evaluation is based on a real-world deployment of FootballDB, a system that was deployed over a nine-month period in the context of the FIFA World Cup 2022, during which about 6K natural language questions were asked and executed. All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models. For each data model, we explore the performance of representative Text-to-SQL systems and language models. We further quantify the impact of training data size, pre- and post-processing steps, and language model inference time. Our comprehensive evaluation sheds light on the design choices of real-world Text-to-SQL systems and their impact on moving from research prototypes to real deployments. Finally, we provide a new benchmark dataset to the community, which is the first to enable the evaluation of different data models for the same dataset and is substantially more challenging than most previous datasets in terms of query complexity.