Bag Semantics Query Containment: The CQ vs. UCQ Case and Other Stories
Jerzy Marcinkowski, Piotr Ostropolski-Nalewaja · 2025-03-10 · via cs.DB updates on arXiv.org

The Query Containment Problem (QCP) is a fundamental decision problem in query processing and optimization. While QCP has long been completely understood under set semantics, the decidability of QCP for conjunctive queries under multiset (bag) semantics ($QCP_{\text{CQ}}^{\text{bag}}$) remains one of the most intriguing open problems in database theory. Some effort has been put into solving this problem over the last 30 years: some decidable special cases of $QCP_{\text{CQ}}^{\text{bag}}$ have been identified, as well as some undecidable extensions, including $QCP_{\text{UCQ}}^{\text{bag}}$. In this paper we introduce a new technique which, for a given UCQ $\Phi$, produces a CQ $\phi$ such that the application of $\phi$ to a database $D$ is, in some sense, an approximation of the application of $\Phi$ to $D$. Using this technique we were able to analyze the status of $QCP^{\text{bag}}$ when one of the queries in question is a CQ and the other is a UCQ, and we reached conclusions that surprised us a little. We also tried to use this technique to translate the known undecidability proof for $QCP_{\text{UCQ}}^{\text{bag}}$ into a proof of undecidability of $QCP_{\text{CQ}}^{\text{bag}}$. And, as you are going to see, we were stopped just one infinitely small $\varepsilon$ short of this ultimate goal.
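
To make the gap between set and bag semantics concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes the common simplified setting in which the database is a set of facts, the bag answer of a Boolean CQ is the number of variable assignments satisfying all of its atoms, and the bag answer of a UCQ is the sum over its disjuncts. The names `count_homomorphisms`, `count_ucq`, `q1`, `q2` and `D` are hypothetical and chosen only for illustration.

```python
from itertools import product


def count_homomorphisms(query, database):
    """Bag answer of a Boolean CQ: the number of satisfying assignments.

    query    -- list of atoms (relation_name, tuple_of_variable_names)
    database -- dict mapping relation_name to a set of constant tuples
    (Assumption: set database, brute-force enumeration over the active domain.)
    """
    variables = sorted({v for _, args in query for v in args})
    domain = sorted({c for facts in database.values() for fact in facts for c in fact})
    count = 0
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            tuple(assignment[v] for v in args) in database.get(rel, set())
            for rel, args in query
        ):
            count += 1
    return count


def count_ucq(disjuncts, database):
    """Bag answer of a Boolean UCQ, assumed here to be the sum over disjuncts."""
    return sum(count_homomorphisms(q, database) for q in disjuncts)


# Illustrative database with two edges.
D = {"E": {(1, 2), (2, 3)}}

q1 = [("E", ("x", "y"))]                      # E(x, y)
q2 = [("E", ("x", "y")), ("E", ("u", "v"))]   # E(x, y) AND E(u, v)

print(count_homomorphisms(q1, D))  # 2
print(count_homomorphisms(q2, D))  # 4
print(count_ucq([q1, q1], D))      # 4
```

Under set semantics `q1` and `q2` are equivalent, since each is true exactly when `E` is non-empty. Under the counting semantics assumed here, `q1` yields $|E|$ and `q2` yields $|E|^2$, so `q1` is contained in `q2` on every database, while `q2` is not contained in `q1` as soon as `E` has two facts. Deciding such inequalities for arbitrary pairs of CQs is the open problem the abstract refers to.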