SceneConductor: 3D Scene Generation from Single Image with Multi-Agent Orchestration
[Submitted on 7 Jun 2026] · 2026-06-09 · via cs.CV updates on arXiv.org
arXiv:2606.08402v1 Announce Type: new Abstract: Generating complete 3D scenes from a single image requires in…