惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
D
Docker
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
WordPress大学
WordPress大学
Hugging Face - Blog
Hugging Face - Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
The Cloudflare Blog
N
News and Events Feed by Topic
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
W
WeLiveSecurity
TaoSecurity Blog
TaoSecurity Blog
Y
Y Combinator Blog
小众软件
小众软件
有赞技术团队
有赞技术团队
S
SegmentFault 最新的问题
E
Exploit-DB.com RSS Feed
F
Full Disclosure
Hacker News - Newest:
Hacker News - Newest: "LLM"
博客园 - Franky
宝玉的分享
宝玉的分享
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
罗磊的独立博客
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
Forbes - Security
Forbes - Security
L
LangChain Blog
V
Visual Studio Blog
腾讯CDC
V
V2EX
N
News | PayPal Newsroom
博客园_首页
L
LINUX DO - 热门话题
博客园 - 【当耐特】
P
Privacy & Cybersecurity Law Blog
P
Proofpoint News Feed
Google Online Security Blog
Google Online Security Blog
S
Security Archives - TechRepublic
博客园 - 司徒正美
T
Threatpost
SecWiki News
SecWiki News
C
Cyber Attacks, Cyber Crime and Cyber Security
人人都是产品经理
人人都是产品经理
AWS News Blog
AWS News Blog
Scott Helme
Scott Helme
博客园 - 三生石上(FineUI控件)
D
Darknet – Hacking Tools, Hacker News & Cyber Security
Recorded Future
Recorded Future
V
Vulnerabilities – Threatpost
H
Hackread – Cybersecurity News, Data Breaches, AI and More
量子位

cs.AI updates on arXiv.org

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Discourse Diversity in Multi-Turn Empathic Dialogue Evaluating Cooperation in LLM Social Groups through Elected Leadership Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents A Triadic Suffix Tokenization Scheme for Numerical Reasoning Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method CocoaBench: Evaluating Unified Digital Agents in the Wild MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Efficient Training for Cross-lingual Speech Language Models Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Uncertainty-Aware Web-Conditioned Scientific Fact-Checking When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series TInR: Exploring Tool-Internalized Reasoning in Large Language Models Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models Learning and Enforcing Context-Sensitive Control for LLMs Efficient Process Reward Modeling via Contrastive Mutual Information Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models LLMs Should Incorporate Explicit Mechanisms for Human Empathy ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning The Amazing Agent Race: Strong Tool Users, Weak Navigators Learning from Emptiness: De-biasing Listwise Rerankers with Content-Agnostic Probability Calibration Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities CircuitSynth: Reliable Synthetic Data Generation ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models Computational Implementation of a Model of Category-Theoretic Metaphor Comprehension CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks Demographic and Linguistic Bias Evaluation in Omnimodal Language Models Cross-Cultural Value Awareness in Large Vision-Language Models From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping Should We be Pedantic About Reasoning Errors in Machine Translation? Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards COMPOSITE-Stem GIANTS: Generative Insight Anticipation from Scientific Literature Pioneer Agent: Continual Improvement of Small Language Models in Production SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning Many-Tier Instruction Hierarchy in LLM Agents Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories PhysInOne: Visual Physics Learning and Reasoning in One Suite Neural Distribution Prior for LiDAR Out-of-Distribution Detection Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition PDE-regularized Dynamics-informed Diffusion with Uncertainty-aware Filtering for Long-Horizon Dynamics Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA Identification and Anonymization of Named Entities in Unstructured Information Sources for Use in Social Engineering Detection Hypergraph Neural Networks Accelerate MUS Enumeration ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems WOMBET: World Model-Based Experience Transfer for Robust and Sample-efficient Reinforcement Learning Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS) Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective A Mathematical Framework for Temporal Modeling and Counterfactual Policy Simulation of Student Dropout Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs SenBen: Sensitive Scene Graphs for Explainable Content Moderation eBandit: Kernel-Driven Reinforcement Learning for Adaptive Video Streaming Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring Deep Learning-Based Tracking and Lineage Reconstruction of Ligament Breakup Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding On Semiotic-Grounded Interpretive Evaluation of Generative Art Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation QARIMA: A Quantum Approach To Classical Time Series Analysis StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines Joint Interference Detection and Identification via Adversarial Multi-task Learning Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception How LLMs Might Think Digital hybridity and relics in cultural heritage: using corpus linguistics to inform design in emerging technologies from AI to VR From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangr · 2025-02-11 · via cs.AI updates on arXiv.org

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. In content creation workflows, precise and simultaneous control over camera motion, object motion, and lighting direction enhances both accuracy and flexibility. However, existing approaches typically treat these control signals separately, largely due to the scarcity of datasets with high-quality joint annotations and mismatched control spaces across modalities. We present VidCRAFT3, a unified and flexible I2V framework that supports both independent and joint control over camera motion, object motion, and lighting direction by integrating three core components. Image2Cloud reconstructs a 3D point cloud from the reference image to enable precise camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale optical flow features to guide object motion. The Spatial Triple-Attention Transformer integrates lighting direction embeddings via parallel cross-attention. To address the scarcity of jointly annotated data, we curate the VideoLightingDirection (VLD) dataset of synthetic static-scene video clips with per-frame lighting-direction labels, and adopt a three-stage training strategy that enables robust learning without fully joint annotations. Extensive experiments show that VidCRAFT3 outperforms existing methods in control precision and visual coherence. Code and data will be released. Project page: https://sixiaozheng.github.io/VidCRAFT3/.