TempRet: Temporal Enhancement and Two-Stage Reranking for CVPR 2026 EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge
Zixu Li, Yup · 2026-05-26 · via cs updates on arXiv.org


Abstract: Video-text retrieval has progressed rapidly thanks to large-scale vision-language pretraining, yet most existing approaches inherit an implicit assumption from image-text retrieval: that visual semantics can be captured frame by frame. This assumption overlooks the temporal dynamics of egocentric videos. The EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge raises the bar further by providing soft-label relevance matrices rather than binary labels, which demands models that can resolve graded semantic correspondences across modalities. In this report, we present TempRet, our solution to the CVPR 2026 EPIC-KITCHENS-100 MIR challenge. Our approach builds on a CLIP-based dual-encoder backbone and introduces two key components to address the temporal and cross-modal challenges. First, a temporal transformer operates exclusively on the video side, modeling inter-frame dependencies through learnable positional encodings and multi-head self-attention over frame-level CLIP features. Second, a two-stage reranking pipeline retrieves the Top-K candidates with the dual encoder, then refines their scores using a cross-encoder equipped with an Image-Text Matching (ITM) head. The entire system is trained with the Symmetric Multi-Similarity Loss to exploit the soft-label relevance matrices provided by the challenge. Our method achieves 67.97% average mAP and 82.92% average nDCG on the EK-100 MIR benchmark, demonstrating the effectiveness of temporal modeling and cross-modal refinement for egocentric video retrieval.
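The two-stage reranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`two_stage_rerank`, `cross_score_fn`), the use of cosine similarity for the first stage, and the choice to replace the first-stage score with the second-stage score are all assumptions made for illustration. The real cross-encoder with its ITM head is stood in for by a hypothetical callable that returns a score for a candidate index.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def two_stage_rerank(query_emb, video_embs, cross_score_fn, top_k=3):
    """Retrieve top_k candidates by cosine similarity, then rescore only those.

    query_emb:      (dim,) text embedding from a dual encoder (assumed given).
    video_embs:     (num_videos, dim) video embeddings (assumed given).
    cross_score_fn: hypothetical stand-in for a cross-encoder; maps a video
                    index to a relevance score for the current query.
    """
    q = l2_normalize(query_emb)
    v = l2_normalize(video_embs)

    # Stage 1: cheap similarity against every video, keep the best top_k.
    sims = v @ q
    k = min(top_k, len(sims))
    candidates = np.argsort(-sims, kind="stable")[:k]

    # Stage 2: the expensive scorer runs on the short list only.
    # Assumption: the refined score replaces the stage-1 score outright.
    refined = np.array([cross_score_fn(int(i)) for i in candidates])
    order = np.argsort(-refined, kind="stable")
    return candidates[order].tolist(), refined[order].tolist()


if __name__ == "__main__":
    videos = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
    query = np.array([1.0, 0.0])
    toy_cross_scores = {0: 0.2, 1: 0.9, 2: 0.99, 3: 0.5}  # made-up values
    ranked, scores = two_stage_rerank(query, videos, toy_cross_scores.get, top_k=2)
    print(ranked, scores)
```

In this toy setup, video 2 has the highest stand-in cross-encoder score but never reaches the second stage, because the first stage filters it out. That is the usual trade-off of such pipelines: the reranker can only reorder what the dual encoder retrieves, so the choice of K bounds the achievable recall.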
Comments: Technical Report for CVPR 2026 EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.24470 [cs.CV]
  (or arXiv:2605.24470v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2605.24470

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zixu Li
[v1] Sat, 23 May 2026 08:37:39 UTC (470 KB)