EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026
Zhiheng Fu, · 2026-05-26 · via cs updates on arXiv.org


Abstract: The EPIC-KITCHENS-100 Action Detection challenge evaluates whether a model can localize the start and end of each action in long, untrimmed egocentric videos and assign each action its verb-noun label. In this report, we present our submission, EgoAction (Egocentric Action Composition with Reliability-Aware Temporal Fusion), a unified pipeline that runs decoupled noun and verb detection and then fuses their outputs. The pipeline uses EPIC-finetuned VideoMAE-L features, trains separate noun and verb temporal detectors with causal temporal modeling, composes action hypotheses from the top noun-verb pairs, and applies a confidence-adaptive boundary fusion rule during post-processing.

The key observation is that the verb and noun streams tend to fail in different ways: verb scores are sensitive to motion transitions, whereas noun scores are sensitive to hand-object visibility and object clutter. A fixed arithmetic mean of the two streams' predicted boundaries can therefore amplify localization errors when one stream degenerates. We replace this hard-coded mean with Dynamic Weighted Fusion (DWF), which normalizes each proposal's maximum noun and verb classification confidences into boundary weights and linearly combines the two predicted intervals. This lightweight, tensor-only operator shifts authority over the boundaries toward the more reliable stream while leaving the decoupled action scoring mechanism unchanged. Combined with sliding-window inference, top-K noun-verb action composition, and class-wise Soft-NMS, EgoAction forms a compact, reproducible system for egocentric temporal action detection.
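The abstract describes Dynamic Weighted Fusion only in words, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the noun and verb detectors produce paired proposals (row i of each stream refers to the same candidate action), and that "normalizes" means dividing each stream's maximum class confidence by their sum, so the two weights add up to 1. The function names `dynamic_weighted_fusion` and `fixed_mean_fusion` and the `eps` fallback are hypothetical.

```python
import numpy as np


def dynamic_weighted_fusion(noun_segments, verb_segments, noun_scores, verb_scores, eps=1e-8):
    """Hypothetical sketch of confidence-weighted boundary fusion.

    noun_segments, verb_segments: (N, 2) arrays of [start, end] for N paired proposals.
    noun_scores: (N, C_noun) noun classification confidences per proposal.
    verb_scores: (N, C_verb) verb classification confidences per proposal.
    Returns an (N, 2) array of fused [start, end] boundaries.
    """
    noun_segments = np.asarray(noun_segments, dtype=float)
    verb_segments = np.asarray(verb_segments, dtype=float)

    # Per-proposal reliability proxy: the maximum class confidence of each stream.
    c_noun = np.asarray(noun_scores, dtype=float).max(axis=1)
    c_verb = np.asarray(verb_scores, dtype=float).max(axis=1)

    # ASSUMPTION: sum-normalization into weights that add up to 1.
    # If both confidences are ~0, fall back to the fixed arithmetic mean (w = 0.5).
    total = c_noun + c_verb
    w_noun = np.where(total > eps, c_noun / np.maximum(total, eps), 0.5)
    w_verb = 1.0 - w_noun

    # Linear combination of the two intervals, broadcast over the [start, end] columns.
    return w_noun[:, None] * noun_segments + w_verb[:, None] * verb_segments


def fixed_mean_fusion(noun_segments, verb_segments):
    """Baseline: hard-coded arithmetic mean of the two intervals."""
    return 0.5 * (np.asarray(noun_segments, dtype=float) + np.asarray(verb_segments, dtype=float))


if __name__ == "__main__":
    noun_seg = [[0.0, 10.0]]
    verb_seg = [[2.0, 12.0]]
    noun_conf = [[0.75, 0.10]]  # confident noun stream
    verb_conf = [[0.25, 0.20]]  # weak verb stream
    print(fixed_mean_fusion(noun_seg, verb_seg))                              # [[ 1. 11.]]
    print(dynamic_weighted_fusion(noun_seg, verb_seg, noun_conf, verb_conf))  # [[ 0.5 10.5]]
```

Because the weights sum to 1, the fused interval always lies between the two stream predictions, and when both streams are equally confident the operator reduces to the fixed arithmetic mean it replaces. In the example, the noun stream carries 75% of the weight, so the fused boundaries move toward the noun interval.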
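The abstract also names top-K noun-verb action composition and class-wise Soft-NMS without giving details. The sketch below shows one common way to realize both, under explicit assumptions: the action score is taken to be the product of the verb and noun probabilities, Soft-NMS uses the Gaussian decay exp(-IoU^2 / sigma) with a hypothetical sigma = 0.5, and "class-wise" is read as running suppression separately for each (verb, noun) pair. All function names and default values are hypothetical.

```python
import math
from collections import defaultdict

import numpy as np


def compose_actions(segment, noun_probs, verb_probs, k=5):
    """Enumerate top-k verbs x top-k nouns for one proposal.

    ASSUMPTION: action score = verb_prob * noun_prob.
    Returns a list of (verb_id, noun_id, score, start, end).
    """
    noun_probs = np.asarray(noun_probs, dtype=float)
    verb_probs = np.asarray(verb_probs, dtype=float)
    top_nouns = np.argsort(noun_probs)[::-1][:k]
    top_verbs = np.argsort(verb_probs)[::-1][:k]
    start, end = float(segment[0]), float(segment[1])
    return [
        (int(v), int(n), float(verb_probs[v] * noun_probs[n]), start, end)
        for v in top_verbs
        for n in top_nouns
    ]


def temporal_iou(a, b):
    """IoU of two 1-D intervals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_1d(dets, sigma=0.5, score_thresh=1e-3):
    """Gaussian Soft-NMS over (score, start, end) tuples."""
    remaining = [list(d) for d in dets]
    kept = []
    while remaining:
        # Select the current highest-scoring detection.
        best_idx = max(range(len(remaining)), key=lambda j: remaining[j][0])
        best = remaining.pop(best_idx)
        kept.append(tuple(best))
        # Decay (rather than discard) detections that overlap the selected one.
        for d in remaining:
            iou = temporal_iou(best[1:], d[1:])
            d[0] *= math.exp(-(iou ** 2) / sigma)
        remaining = [d for d in remaining if d[0] >= score_thresh]
    return kept


def classwise_soft_nms(actions, sigma=0.5):
    """Run Soft-NMS independently for each (verb, noun) action class."""
    by_class = defaultdict(list)
    for v, n, score, start, end in actions:
        by_class[(v, n)].append((score, start, end))
    results = []
    for (v, n), dets in by_class.items():
        for score, start, end in soft_nms_1d(dets, sigma):
            results.append((v, n, score, start, end))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Running suppression per class means an overlapping detection with a different verb-noun label is never penalized, which matters in egocentric video where actions such as picking up and cutting an object can overlap in time.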
Comments: Technical Report for CVPR 2026 EPIC-KITCHENS-100 Action Detection Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.24496 [cs.CV]
  (or arXiv:2605.24496v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.24496

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zhiheng Fu
[v1] Sat, 23 May 2026 10:05:56 UTC (260 KB)