CounterFlow: A Two-Phase Inference-Time Sampling for Counterfactual Video Foley Generation
Gyubin Lee · 2026-05-20 · via cs.AI updates on arXiv.org

Abstract: We investigate Counterfactual Video Foley Generation, the task of generating audio whose sound-source identity contradicts the visual evidence while remaining temporally synchronized to a silent video. Existing Video&Text-to-Audio (VT2A) models struggle with this task and often remain anchored to the visually implied sound source when the video and text contents disagree. We present CounterFlow, an inference-time two-phase sampling scheme for pretrained flow-matching VT2A models. Phase 1 builds a video-derived temporal structure while suppressing the visually implied source; Phase 2 drops video conditioning to focus entirely on shaping audio timbre toward the target prompt. CounterFlow substantially improves counterfactual Video Foley generation compared to naive negative prompting and state-of-the-art baselines. To evaluate replacement quality, we propose a metric that uses a text-audio co-embedding space to measure both target-prompt evidence and residual leakage of the visually implied source. Video demonstrations and code are available at this https URL
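
The two-phase schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the visually implied source is suppressed or when the phases switch, so the negative-guidance form, the `switch` time, the `neg_scale` weight, and the `toy_velocity` field below are all assumptions made for illustration. The sampler is a plain Euler integrator over a hypothetical flow-matching velocity function.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical signature: velocity(x, t, video, text) -> dx/dt.
# Passing video=None stands for "video conditioning dropped".
VelocityFn = Callable[[Vector, float, Optional[str], Optional[str]], Vector]


def two_phase_sample(
    velocity: VelocityFn,
    x0: Vector,
    video: str,
    target_text: str,
    source_text: str,
    steps: int = 20,
    switch: float = 0.5,     # assumed phase boundary in flow time [0, 1]
    neg_scale: float = 1.0,  # assumed strength of source suppression
) -> Vector:
    """Euler-integrate a flow from t=0 to t=1 with two conditioning phases."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        if t < switch:
            # Phase 1: keep the video so timing is video-derived, and push
            # away from the visually implied source (assumed guidance form).
            v_pos = velocity(x, t, video, target_text)
            v_neg = velocity(x, t, video, source_text)
            v = [p + neg_scale * (p - n) for p, n in zip(v_pos, v_neg)]
        else:
            # Phase 2: drop video conditioning; only the target prompt
            # shapes the sample from here on.
            v = velocity(x, t, None, target_text)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, video: Optional[str],
                 text: Optional[str]) -> Vector:
    """Toy 2-D field: x[0] plays the role of timbre, x[1] of timing."""
    timbre_goal = {"dog bark": 1.0, "cat meow": -1.0}.get(text, 0.0)
    # Without video, the timing coordinate is left untouched.
    timing_goal = 1.0 if video is not None else x[1]
    return [timbre_goal - x[0], timing_goal - x[1]]


if __name__ == "__main__":
    sample = two_phase_sample(
        toy_velocity, [0.0, 0.0],
        video="clip", target_text="cat meow", source_text="dog bark",
    )
    print(sample)
```

In the toy field, Phase 1 moves the timing coordinate toward the video-derived goal while steering timbre away from the source prompt, and Phase 2 leaves timing fixed while timbre settles toward the target prompt. A real VT2A model would replace `toy_velocity` with its learned velocity network.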
Comments: accepted to CVPR 2026 Workshop on Sight and Sound
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2605.18916 [cs.MM]
  (or arXiv:2605.18916v2 [cs.MM] for this version)
  https://doi.org/10.48550/arXiv.2605.18916 (arXiv-issued DOI via DataCite)

Submission history

From: Gyubin Lee
[v1] Mon, 18 May 2026 05:42:06 UTC (1,242 KB)
[v2] Mon, 25 May 2026 12:15:23 UTC (980 KB)