
Quantifying the Generalization Gap in Seizure Detection: A Large-Scale Empirical Benchmark via the SzCORE Challenge
Jonathan Dan · 2026-05-20 · via cs.AI updates on arXiv.org

Abstract: Reliable automatic seizure detection from long-term electroencephalography (EEG) remains an unsolved challenge, as current models often fail to generalize across patients or clinical settings. Manual EEG review is still the standard of care, highlighting the need for robust models and standardized evaluation. The current literature often reports high efficacy, yet these models frequently fail when deployed to unseen patient populations. To rigorously assess this generalization gap, we conducted a large-scale empirical study evaluating 28 state-of-the-art algorithmic architectures, ranging from classical feature engineering to modern deep learning. The algorithms were collected through a competition. A strictly held-out private dataset of continuous EEG recordings from 65 subjects, totaling 4,360 hours of data, was used to evaluate algorithm performance. Expert neurophysiologists annotated these recordings, establishing the ground truth for seizure events. Algorithms were evaluated using event-based metrics from the SzCORE framework, including sensitivity, precision, F1-score, and false positive rate per day. Results revealed significant performance variability among state-of-the-art approaches, with a top F1-score of 32% (sensitivity 37%, precision 29%), highlighting the persistent difficulty of this task. Analysis uncovered a discordance between peak performance and population-level stability: the algorithms achieving the highest aggregate F1-scores did not achieve the most consistent ranking across subjects. This independent evaluation exposed a notable gap between self-reported efficacies and hold-out performance, underscoring the critical need for standardized, rigorous benchmarking. The evaluation infrastructure is transitioning into a continuously open benchmarking platform to foster reproducible research and accelerate the development of robust seizure detection algorithms.
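
As a rough illustration of the four event-based metrics named in the abstract, the sketch below computes them from event counts. It is not the SzCORE implementation: the step that matches detected events to annotated seizures (overlap rules and tolerances) is assumed to have already happened, and the names `EventCounts` and `event_metrics` as well as the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Event-level counts, assumed to come from a prior matching step."""
    true_positives: int    # annotated seizures matched by a detection
    false_positives: int   # detections matching no annotated seizure
    false_negatives: int   # annotated seizures with no matching detection
    recording_hours: float  # total duration of the scored recordings


def event_metrics(c: EventCounts) -> dict:
    """Return sensitivity, precision, F1-score and false positives per day."""
    tp, fp, fn = c.true_positives, c.false_positives, c.false_negatives
    nan = float("nan")
    # Sensitivity (recall): share of annotated seizures that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else nan
    # Precision: share of detections that correspond to a real seizure.
    precision = tp / (tp + fp) if (tp + fp) > 0 else nan
    # F1 is the harmonic mean of the two, written here in count form.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else nan
    # False alarms normalised to a 24-hour day of recording.
    days = c.recording_hours / 24.0
    fp_per_day = fp / days if days > 0 else nan
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "fp_per_day": fp_per_day,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(event_metrics(EventCounts(8, 2, 2, 48.0)))
```

Scores computed per subject and then averaged can differ from scores computed on pooled counts, so a reported aggregate F1-score need not equal the harmonic mean of the reported aggregate sensitivity and precision.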
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
Cite as: arXiv:2505.18191 [eess.SP]
  (or arXiv:2505.18191v2 [eess.SP] for this version)
  https://doi.org/10.48550/arXiv.2505.18191

arXiv-issued DOI via DataCite

Submission history

From: Jonathan Dan
[v1] Mon, 19 May 2025 17:36:20 UTC (1,333 KB)
[v2] Mon, 18 May 2026 18:45:07 UTC (1,355 KB)