
LEARNT: A Practical Estimator for Cardinality of LIKE Queries with Formal Accuracy Guarantees
Hai Lan, Zhi · 2026-05-26 · via cs updates on arXiv.org

Abstract: We study the problem of cardinality estimation for LIKE queries on string data, focusing on the most common patterns in real workloads: prefix, suffix, and substring queries. We propose LEARNT, a LIKE query Estimator with Accuracy, Robustness, Negligible overhead, Tunability, and Theoretical guarantees. LEARNT formulates estimation as a bucket-classification problem; when a query is classified correctly, it yields formal bounds on the Q-error for queries with non-empty answers. It employs a memory-efficient bucketed layered-filter architecture built from Bloom filters and compact auxiliary tables, together with optimizations that exploit query skew to reduce storage. For queries with empty answers, LEARNT incorporates dedicated filter-based and prefix-walk strategies that provide probabilistic guarantees of correct identification. Furthermore, to support arbitrarily long query strings, we extend LEARNT with a Markov modeling scheme that composes short-query statistics into estimates for longer queries. A theoretical framework guides parameter selection to minimize storage under accuracy and robustness constraints. Extensive experiments on four real-world datasets show that LEARNT consistently outperforms state-of-the-art methods such as CLIQUE and LPLM, achieving 1.3-1.7x lower mean Q-error, significantly lower tail errors, and up to 70x faster construction with comparable memory usage.
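To make two ideas from the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes geometric buckets of the form [base^i, base^(i+1)) with the geometric midpoint as the estimate, which bounds the Q-error by sqrt(base) whenever the bucket is classified correctly. It also assumes a generic k-gram Markov chaining rule for long queries. The names `bucket_of`, `bucket_estimate`, `substring_counts`, and `markov_estimate`, the bucket layout, and the toy data are all hypothetical; the abstract does not specify LEARNT's actual bucket boundaries or Markov scheme.

```python
import math
from collections import Counter


def q_error(estimate: float, truth: float) -> float:
    """Q-error = max(est/true, true/est); defined for positive cardinalities."""
    if estimate <= 0 or truth <= 0:
        raise ValueError("Q-error requires positive estimate and truth")
    return max(estimate / truth, truth / estimate)


def bucket_of(cardinality: int, base: int) -> int:
    """Index i of the assumed bucket [base**i, base**(i+1)) holding cardinality."""
    if cardinality < 1:
        raise ValueError("only non-empty answers are bucketed")
    i, upper = 0, base
    while upper <= cardinality:  # integer loop avoids float log rounding
        upper *= base
        i += 1
    return i


def bucket_estimate(i: int, base: int) -> float:
    """Geometric midpoint of bucket i; Q-error is at most sqrt(base)."""
    return base ** (i + 0.5)


def substring_counts(strings, max_len: int) -> Counter:
    """Number of strings containing each substring of length <= max_len."""
    counts = Counter()
    for s in strings:
        seen = {s[i:i + n]
                for n in range(1, max_len + 1)
                for i in range(len(s) - n + 1)}
        counts.update(seen)  # each string counted once per distinct substring
    return counts


def markov_estimate(query: str, counts: Counter, k: int) -> float:
    """Chain k-gram counts: c(q[0:k]) * prod_j c(q[j:j+k]) / c(q[j:j+k-1])."""
    if k < 2:
        raise ValueError("k must be at least 2 so k-grams overlap")
    if len(query) <= k:
        return float(counts.get(query, 0))
    est = float(counts.get(query[:k], 0))
    for j in range(1, len(query) - k + 1):
        overlap = counts.get(query[j:j + k - 1], 0)
        if overlap == 0:
            return 0.0  # shared (k-1)-gram never occurs, so the query cannot
        est *= counts.get(query[j:j + k], 0) / overlap
    return est


if __name__ == "__main__":
    base = 2
    worst = max(q_error(bucket_estimate(bucket_of(c, base), base), c)
                for c in range(1, 1001))
    print(f"worst Q-error with correct buckets: {worst:.4f}")

    toy = ["abcd", "abce", "xbcd"]  # hypothetical data
    counts = substring_counts(toy, max_len=3)
    print("estimate for %abcd%:", markov_estimate("abcd", counts, k=3))
```

In this sketch a smaller `base` tightens the Q-error bound at the cost of more buckets to distinguish, which is one plausible reading of the accuracy-versus-storage trade-off the abstract says the theoretical framework tunes.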
Comments: 13 pages, 4 figures, 15 tables
Subjects: Databases (cs.DB)
Cite as: arXiv:2605.24308 [cs.DB]
  (or arXiv:2605.24308v1 [cs.DB] for this version)
  https://doi.org/10.48550/arXiv.2605.24308

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hai Lan
[v1] Sat, 23 May 2026 00:36:47 UTC (581 KB)