Generative Data Augmentation for Skeleton Action Recognition
Xu Dong, Wan · 2026-04-17 · via cs.CV updates on arXiv.org

Abstract: Skeleton-based human action recognition is a powerful approach for understanding human behaviour from pose data, but collecting large-scale, diverse, and well-annotated 3D skeleton datasets is both expensive and labor-intensive. To address this challenge, we propose a conditional generative pipeline for data augmentation in skeleton action recognition. Our method learns the distribution of real skeleton sequences conditioned on action labels, enabling the synthesis of diverse and high-fidelity data. Even with limited training samples, it can generate skeleton sequences effectively and achieve competitive recognition performance in low-data scenarios, demonstrating strong generalisation in downstream tasks. Specifically, we introduce a Transformer-based encoder-decoder architecture, combined with a generative refinement module and a dropout mechanism, to balance fidelity and diversity during sampling. Experiments on HumanAct12 and the refined NTU-RGBD (NTU-VIBE) dataset show that our approach consistently improves the accuracy of multiple skeleton-based action recognition models, validating its effectiveness in both few-shot and full-data settings. The source code is available online.
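The label-conditioned augmentation idea in the abstract can be illustrated with a deliberately simple stand-in. The sketch below is not the paper's Transformer-based pipeline: it swaps in a per-class diagonal Gaussian fitted to flattened skeleton sequences, samples new sequences for each action label, and appends them to the real data. The function names (`fit_class_gaussians`, `sample_synthetic`, `augment`) and the `temperature` knob for trading fidelity against diversity are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def fit_class_gaussians(sequences, labels):
    """Estimate a per-label mean and std for each coordinate.

    Each sequence is a flat list of floats (frames x joints x 3, flattened).
    This is a toy substitute for a learned conditional generator.
    """
    by_label = defaultdict(list)
    for seq, label in zip(sequences, labels):
        by_label[label].append(seq)
    stats = {}
    for label, seqs in by_label.items():
        columns = list(zip(*seqs))  # one tuple of values per coordinate
        mu = [mean(col) for col in columns]
        sigma = [pstdev(col) for col in columns]
        stats[label] = (mu, sigma)
    return stats


def sample_synthetic(stats, label, n, rng, temperature=1.0):
    """Draw n synthetic sequences for one label.

    temperature (hypothetical) scales the std: lower values stay closer to
    the class mean, higher values produce more varied samples.
    """
    mu, sigma = stats[label]
    return [
        [rng.gauss(m, temperature * s) for m, s in zip(mu, sigma)]
        for _ in range(n)
    ]


def augment(sequences, labels, per_class, seed=0, temperature=1.0):
    """Return the real data followed by per_class synthetic samples per label."""
    rng = random.Random(seed)
    stats = fit_class_gaussians(sequences, labels)
    out_x, out_y = list(sequences), list(labels)
    for label in sorted(stats):
        out_x.extend(sample_synthetic(stats, label, per_class, rng, temperature))
        out_y.extend([label] * per_class)
    return out_x, out_y


if __name__ == "__main__":
    # Tiny made-up dataset: two labels, three coordinates per "sequence".
    real_x = [[0.0, 1.0, 2.0], [0.2, 1.2, 2.2], [5.0, 5.1, 4.9], [5.2, 4.9, 5.0]]
    real_y = ["wave", "wave", "sit", "sit"]
    aug_x, aug_y = augment(real_x, real_y, per_class=3, seed=1)
    print(len(aug_x), "training sequences after augmentation")
```

A downstream recognition model would then be trained on the augmented set returned by `augment` rather than on the real sequences alone, which is the general shape of the few-shot setting the abstract describes.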
Comments: Accepted at IEEE FG 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.14933 [cs.CV]
  (or arXiv:2604.14933v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2604.14933 (arXiv-issued DOI via DataCite)

Submission history

From: Xu Dong
[v1] Thu, 16 Apr 2026 12:20:29 UTC (3,197 KB)