惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

酷 壳 – CoolShell
酷 壳 – CoolShell
H
Hacker News: Front Page
P
Palo Alto Networks Blog
T
ThreatConnect
Apple Machine Learning Research
Apple Machine Learning Research
博客园_首页
T
True Tiger Recordings
P
Privacy & Cybersecurity Law Blog
B
Blog
IT之家
IT之家
Last Week in AI
Last Week in AI
F
Full Disclosure
Hacker News: Ask HN
Hacker News: Ask HN
C
Comments on: Blog
Microsoft Azure Blog
Microsoft Azure Blog
C
Cybersecurity and Infrastructure Security Agency CISA
Microsoft Security Blog
Microsoft Security Blog
博客园 - 【当耐特】
N
News and Events Feed by Topic
NISL@THU
NISL@THU
腾讯CDC
雷峰网
雷峰网
Security Latest
Security Latest
李成银的技术随笔
M
Microsoft Research Blog - Microsoft Research
L
LangChain Blog
L
Lohrmann on Cybersecurity
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
C
Check Point Blog
Y
Y Combinator Blog
Recent Announcements
Recent Announcements
博客园 - Franky
N
News | PayPal Newsroom
V
V2EX
A
About on SuperTechFans
The Register - Security
The Register - Security
月光博客
月光博客
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Google Online Security Blog
Google Online Security Blog
MyScale Blog
MyScale Blog
Cisco Talos Blog
Cisco Talos Blog
Vercel News
Vercel News
WordPress大学
WordPress大学
C
Cyber Attacks, Cyber Crime and Cyber Security
The Hacker News
The Hacker News
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
IntelliJ IDEA : IntelliJ IDEA – the Leading IDE for Professional Development in Java and Kotlin | The JetBrains Blog
爱范儿
爱范儿
A
Arctic Wolf
L
LINUX DO - 最新话题
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More

MarkTechPost

Sakana AI Introduces KAME: A Tandem Speech-to-Speech Architecture That Injects LLM Knowledge in Real Time What is Tokenization Drift and How to Fix It? Mistral AI Launches Remote Agents in Vibe and Mistral Medium 3.5 with 77.6% SWE-Bench Verified Score Build a Multi-Agent AI Workflow for Biological Network Modeling, Protein Interactions, Metabolism, and Cell Signaling Simulation A Coding Implementation to Parsing, Analyzing, Visualizing, and Fine-Tuning Agent Reasoning Traces Using the lambda/hermes-agent-reasoning-traces Dataset A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8× Rollout Generation Speedup at 8B and Projects 2.5× End-to-End Speedup at 235B A Coding Implementation of End-to-End Brain Decoding from MEG Signals Using NeuralSet and Deep Learning for Predicting Linguistic Features Meta Introduces Autodata: An Agentic Framework That Turns AI Models into Autonomous Data Scientists for High-Quality Training Data Creation A Coding Guide on LLM Post Training with TRL from Supervised Fine Tuning to DPO and GRPO Reasoning Qwen AI Releases Qwen-Scope: An Open-Source Sparse AutoEncoders (SAE) Suite That Turns LLM Internal Features into Practical Development Tools A Coding Deep Dive into Agentic UI, Generative UI, State Synchronization, and Interrupt-Driven Approval Flows Moonshot AI Open-Sources FlashKDA: CUTLASS Kernels for Kimi Delta Attention with Variable-Length Batching and H20 Benchmarks Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes A Coding Implementation on Pyright Type Checking Covering Generics, Protocols, Strict Mode, Type Narrowing, and Modern Python Typing IBM Releases Two Granite Speech 4.1 2B Models: Autoregressive ASR with Translation and Non-Autoregressive Editing for Fast Inference Cursor Introduces a TypeScript SDK for Building Programmatic Coding Agents With Sandboxed Cloud VMs, Subagents, Hooks, and Token-Based Pricing Top 10 KV Cache Compression Techniques for LLM Inference: Reducing Memory Overhead Across Eviction, Quantization, and Low-Rank Methods Qwen Team Releases FlashQLA: a High-Performance Linear Attention Kernel Library That Achieves Up to 3× Speedup on NVIDIA Hopper GPUs Step by Step Guide to Build a Complete PII Detection and Redaction Pipeline with OpenAI Privacy Filter Meta FAIR Releases NeuralSet: A Python Package for Neuro-AI That Supports fMRI, M/EEG, Spikes, and HuggingFace Embeddings smol-audio: A Colab-Friendly Notebook Collection for Fine-Tuning Whisper, Parakeet, Voxtral, Granite Speech, and Audio Flamingo 3 A Coding Implementation on Document Parsing Benchmarking with LlamaIndex ParseBench Using Python, Hugging Face, and Evaluation Metrics Poolside AI Introduces Laguna XS.2 and M.1: Agentic Coding Models Reaching 68.2% and 72.5% on SWE-bench Verified How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI OpenAI Releases Privacy Filter: A 1.5B-Parameter Open-Source PII Redaction Model with 50M Active Parameters Top 10 Physical AI Models Powering Real-World Robots in 2026 How to Build a Lightweight Vision-Language-Action-Inspired Embodied Agent with Latent World Modeling and Model Predictive Control Meet Talkie-1930: A 13B Open-Weight LLM Trained on Pre-1931 English Text for Historical Reasoning and Generalization Research Build a Reinforcement Learning Powered Agent that Learns to Retrieve Relevant Long-Term Memories for Accurate LLM Question Answering OpenMOSS Releases MOSS-Audio: An Open-Source Foundation Model for Speech, Sound, Music, and Time-Aware Audio Reasoning Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo The LoRA Assumption That Breaks in Production How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama How to Build Smarter Multilingual Text Wrapping with BudouX Through Parsing, HTML Rendering, Model Introspection, and Toy Training Top 7 Benchmarks That Actually Matter for Agentic Reasoning in Large Language Models RAG Without Vectors: How PageIndex Retrieves by Reasoning A Coding Tutorial on Datashader on Rendering Massive Datasets with High-Performance Python Visual Analytics xAI Launches grok-voice-think-fast-1.0: Topping τ-voice Bench at 67.3%, Outperforming Gemini, GPT Realtime, and More A Coding Implementation on kvcached for Elastic KV Cache Memory, Bursty LLM Serving, and Multi-Model GPU Sharing Google DeepMind Introduces Vision Banana: An Instruction-Tuned Image Generator That Beats SAM 3 on Segmentation and Depth Anything V3 on Metric Depth Estimation Meet GitNexus: An Open-Source MCP-Native Knowledge Graph Engine That Gives Claude Code and Cursor Full Codebase Structural Awareness A Coding Implementation on Deepgram Python SDK for Transcription, Text-to-Speech, Async Audio Processing, and Text Intelligence A Coding Implementation on Microsoft’s OpenMementos with Trace Structure Analysis, Context Compression, and Fine-Tuning Data Preparation DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts Google DeepMind Introduces Decoupled DiLoCo: An Asynchronous Training Architecture Achieving 88% Goodput Under High Hardware Failure Rates Mend Releases AI Security Governance Framework: Covering Asset Inventory, Risk Tiering, AI Supply Chain Security, and Maturity Model Mend.io Releases AI Security Governance Framework Covering Asset Inventory, Risk Tiering, AI Supply Chain Security, and Maturity Model OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval A Coding Tutorial on OpenMythos on Recurrent-Depth Transformers with Depth Extrapolation, Adaptive Computation, and Mixture-of-Experts Routing Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures Xiaomi Releases MiMo-V2.5-Pro and MiMo-V2.5: Matching Frontier Model Benchmarks at Significantly Lower Token Cost How to Design a Production-Grade CAMEL Multi-Agent System with Planning, Tool Use, Self-Consistency, and Critique-Driven Refinement Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks A Detailed Implementation on Equinox with JAX Native Modules, Filtered Transforms, Stateful Layers, and End-to-End Training Workflows Next Leap to Harness Engineering: JiuwenClaw Pioneers ‘Coordination Engineering’ Photon Releases Spectrum: An Open-Source TypeScript Framework that Deploys AI Agents Directly to iMessage, WhatsApp, and Telegram OpenAI Open-Sources Euphony: A Browser-Based Visualization Tool for Harmony Chat Data and Codex Session Logs Hugging Face Releases ml-intern: An Open-Source AI Agent that Automates the LLM Post-Training Workflow A Coding Implementation to Build a Conditional Bayesian Hyperparameter Optimization Pipeline with Hyperopt, TPE, and Early Stopping Google Introduces Simula: A Reasoning-First Framework for Generating Controllable, Scalable Synthetic Datasets Across Specialized AI Domains A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps A Coding Implementation on Microsoft’s Phi-4-Mini for Quantized Inference Reasoning Tool Use RAG and LoRA Fine-Tuning OpenAI Scales Trusted Access for Cyber Defense With GPT-5.4-Cyber: a Fine-Tuned Model Built for Verified Security Defenders Moonshot AI and Tsinghua Researchers Propose PrfaaS: A Cross-Datacenter KVCache Architecture that Rethinks How LLMs are Served at Scale Meet OpenMythos: An Open-Source PyTorch Reconstruction of Claude Mythos Where 770M Parameters Match a 1.3B Transformer How TabPFN Leverages In-Context Learning to Achieve Superior Accuracy on Tabular Datasets Compared to Random Forest and CatBoost A Coding Implementation to Build an AI-Powered File Type Detection and Security Analysis Pipeline with Magika and OpenAI NVIDIA Releases Ising: the First Open Quantum AI Model Family for Hybrid Quantum-Classical Systems xAI Launches Standalone Grok Speech-to-Text and Text-to-Speech APIs, Targeting Enterprise Voice Developers A Coding Tutorial for Running PrismML Bonsai 1-Bit LLM on CUDA with GGUF, Benchmarking, Chat, JSON, and RAG A Coding Guide for Property-Based Testing Using Hypothesis with Stateful, Differential, and Metamorphic Test Design Anthropic Releases Claude Opus 4.7: A Major Upgrade for Agentic Coding, High-Resolution Vision, and Long-Horizon Autonomous Tasks Google AI Releases Auto-Diagnose: An Large Language Model LLM-Based System to Diagnose Integration Test Failures at Scale A End-to-End Coding Guide to Running OpenAI GPT-OSS Open-Weight Models with Advanced Inference Workflows Top 19 AI Red Teaming Tools (2026): Secure Your ML Models A Coding Guide to Build a Production-Grade Background Task Processing System Using Huey with SQLite, Scheduling, Retries, Pipelines, and Concurrency Control Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities OpenAI Launches GPT-Rosalind: Its First Life Sciences AI Model Built to Accelerate Drug Discovery and Genomics Research Building Transformer-Based NQS for Frustrated Spin Systems with NetKet UCSD and Together AI Research Introduces Parcae: A Stable Architecture for Looped Language Models That Achieves the Quality of a Transformer Twice the Size How to Build a Universal Long-Term Memory Layer for AI Agents Using Mem0 and OpenAI A Coding Implementation to Build Multi-Agent AI Systems with SmolAgents Using Code Execution, Tool Calling, and Dynamic Orchestration A Technical Deep Dive into the Essential Stages of Modern Large Language Model Training, Alignment, and Deployment Google AI Launches Gemini 3.1 Flash TTS: A New Benchmark in Expressive and Controllable AI Voice Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI Google Launches ‘Skills’ in Chrome: Turning Reusable AI Prompts into One-Click Browser Workflows A Coding Implementation of Crawl4AI for Web Crawling, Markdown Generation, JavaScript Execution, and LLM-Based Structured Extraction TinyFish AI Releases Full Web Infrastructure Platform for AI Agents: Search, Fetch, Browser, and Agent Under One API Key NVIDIA and the University of Maryland Researchers Released Audio Flamingo Next (AF-Next): A Super Powerful and Open Large Audio-Language Model A Hands-On Coding Tutorial for Microsoft VibeVoice Covering Speaker-Aware ASR, Real-Time TTS, and Speech-to-Speech Pipelines Meta AI and KAUST Researchers Propose Neural Computers That Fold Computation, Memory, and I/O Into One Learned Model A Coding Implementation of MolmoAct for Depth-Aware Spatial Reasoning, Visual Trajectory Tracing, and Robotic Action Prediction MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2 Liquid AI Releases LFM2.5-VL-450M: a 450M-Parameter Vision-Language Model with Bounding Box Prediction, Multilingual Support, and Sub-250ms Edge Inference Researchers from MIT, NVIDIA, and Zhejiang University Propose TriAttention: A KV Cache Compression Method That Matches Full Attention at 2.5× Higher Throughput How to Build a Secure Local-First Agent Runtime with OpenClaw Gateway, Skills, and Controlled Tool Execution How Knowledge Distillation Compresses Ensemble Intelligence into a Single Deployable AI Model Alibaba’s Tongyi Lab Releases VimRAG: a Multimodal RAG Framework that Uses a Memory Graph to Navigate Massive Visual Contexts NVIDIA Releases AITune: An Open-Source Inference Toolkit That Automatically Finds the Fastest Inference Backend for Any PyTorch Model
A Coding Guide to Markerless 3D Human Kinematics with Pose2Sim, RTMPose, and OpenSim
2026-04-10 · via MarkTechPost

In this tutorial, we build and run a complete Pose2Sim pipeline on Colab to understand how markerless 3D kinematics works in practice. We begin with environment setup, configure the project for Colab’s headless runtime, and then walk through calibration, 2D pose estimation, synchronization, person association, triangulation, filtering, marker augmentation, and OpenSim-based kinematics. As we progress, we not only execute each stage of the workflow but also inspect the generated outputs, visualize trajectories and joint angles, and learn how each component contributes to converting raw multi-camera videos into meaningful biomechanical motion data.

import subprocess, sys, os


def run(cmd, desc=""):
   """Run a shell command with live output."""
   print(f"\n{'='*60}")
   print(f"  {desc}" if desc else f"  Running: {cmd}")
   print(f"{'='*60}")
   result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
   if result.stdout:
       lines = result.stdout.strip().split('\n')
       for line in lines[-15:]:
           print(line)
   if result.returncode != 0:
       print(f"⚠ Warning (exit code {result.returncode}): {result.stderr[-500:] if result.stderr else 'unknown error'}")
   return result.returncode


run("pip install -q pose2sim", "Installing Pose2Sim (includes RTMPose, filtering, etc.)")


opensim_available = False
try:
   run("pip install -q opensim", "Installing OpenSim Python bindings")
   import opensim
   opensim_available = True
   print(f"✅ OpenSim {opensim.__version__} installed successfully!")
except Exception as e:
   print(f"⚠ OpenSim could not be installed ({e}).")
   print("  The tutorial will run all steps EXCEPT kinematics.")
   print("  For full kinematics, use a local conda environment instead.")


gpu_returncode = run("nvidia-smi --query-gpu=name,memory.total --format=csv,noheader", "Checking GPU availability")
if gpu_returncode != 0:
   print("⚠ No GPU detected. Pose estimation will run on CPU (slower).")
   print("  Tip: Runtime → Change runtime type → T4 GPU")
else:
   print("✅ GPU detected! Pose estimation will be accelerated.")


try:
   import torch
   print(f"✅ PyTorch {torch.__version__} | CUDA available: {torch.cuda.is_available()}")
except ImportError:
   print("ℹ PyTorch not found (pose estimation will use ONNX Runtime directly)")


print("\n🎉 Installation complete!")


import shutil
from pathlib import Path


import Pose2Sim
pkg_path = Path(Pose2Sim.__file__).parent
print(f"📦 Pose2Sim installed at: {pkg_path}")


demo_src = pkg_path / "Demo_SinglePerson"
work_dir = Path("/content/Pose2Sim_Tutorial")


if work_dir.exists():
   shutil.rmtree(work_dir)


shutil.copytree(demo_src, work_dir)
print(f"📂 Demo copied to: {work_dir}")


print("\n🗂  PROJECT STRUCTURE:")
print("=" * 60)
for p in sorted(work_dir.rglob("*")):
   depth = len(p.relative_to(work_dir).parts)
   indent = "  " * depth
   if p.is_dir():
       print(f"{indent}📁 {p.name}/")
   elif depth <= 3:
       size_kb = p.stat().st_size / 1024
       print(f"{indent}📄 {p.name} ({size_kb:.1f} KB)")


print("""
╔══════════════════════════════════════════════════════════╗
║  KEY DIRECTORIES:                                       ║
║  calibration/  → Camera calibration files               ║
║  videos/       → Raw input videos from each camera      ║
║  pose/         → 2D pose estimation results (auto-gen)  ║
║  pose-3d/      → 3D triangulated coordinates (.trc)     ║
║  kinematics/   → OpenSim joint angles (.mot)            ║
║  Config.toml   → ALL pipeline parameters                ║
╚══════════════════════════════════════════════════════════╝
""")


import toml


config_path = work_dir / "Config.toml"
config = toml.load(config_path)


print("🔧 KEY CONFIGURATION PARAMETERS:")
print("=" * 60)


sections_of_interest = ['project', 'pose', 'calibration', 'synchronization',
                       'triangulation', 'filtering', 'markerAugmentation']
for section in sections_of_interest:
   if section in config:
       print(f"\n[{section}]")
       for key, val in list(config[section].items())[:6]:
           print(f"  {key} = {val}")
       if len(config[section]) > 6:
           print(f"  ... and {len(config[section]) - 6} more parameters")


print("\n\n🔧 ADAPTING CONFIG FOR GOOGLE COLAB (headless)...")
print("-" * 60)


modifications = {}


if 'synchronization' in config:
   config['synchronization']['synchronization_gui'] = False
   modifications['synchronization.synchronization_gui'] = False


if 'pose' in config:
   config['pose']['display_detection'] = False
   modifications['pose.display_detection'] = False
   config['pose']['mode'] = 'balanced'
   modifications['pose.mode'] = 'balanced'
   config['pose']['save_video'] = 'none'
   modifications['pose.save_video'] = 'none'
   config['pose']['det_frequency'] = 50
   modifications['pose.det_frequency'] = 50


if 'filtering' in config:
   config['filtering']['display_figures'] = False
   modifications['filtering.display_figures'] = False


if 'triangulation' in config:
   config['triangulation']['undistort_points'] = False
   modifications['triangulation.undistort_points'] = False
   config['triangulation']['handle_LR_swap'] = False
   modifications['triangulation.handle_LR_swap'] = False


if 'kinematics' in config:
   config['kinematics']['use_simple_model'] = True
   modifications['kinematics.use_simple_model'] = True


with open(config_path, 'w') as f:
   toml.dump(config, f)


for k, v in modifications.items():
   print(f"  ✏ {k} → {v}")


print("\n✅ Config.toml updated for Colab execution!")

We set up the full Pose2Sim environment in Google Colab and make sure the required dependencies are available for the pipeline. We install Pose2Sim, optionally test OpenSim support, check GPU availability, and then copy the demo project into a working directory so we can operate on a clean example dataset. We also inspect and modify the configuration file to ensure the workflow runs smoothly in Colab’s headless environment, with settings that balance speed, stability, and usability.

import os
import time
import warnings
import matplotlib
matplotlib.use('Agg')


os.chdir(work_dir)
print(f"📂 Working directory: {os.getcwd()}")


from Pose2Sim import Pose2Sim


step_times = {}


print("\n" + "█" * 60)
print("█  STEP 1/8: CAMERA CALIBRATION")
print("█" * 60)
print("""
 Calibration converts camera parameters (intrinsics + extrinsics)
 into a unified Pose2Sim format.


 The demo uses 'convert' mode (pre-existing Qualisys calibration).
 For your own data, you can:
   - Convert from Qualisys, Vicon, OpenCap, AniPose, etc.
   - Calculate from scratch using a checkerboard
""")


t0 = time.time()
try:
   Pose2Sim.calibration()
   step_times['calibration'] = time.time() - t0
   print(f"\n✅ Calibration complete! ({step_times['calibration']:.1f}s)")


   calib_file = work_dir / "calibration" / "Calib.toml"
   if calib_file.exists():
       calib_data = toml.load(calib_file)
       cam_names = [k for k in calib_data.keys() if k.startswith('cam')]
       print(f"  📸 Cameras found: {len(cam_names)} → {cam_names}")
       for cam in cam_names[:2]:
           print(f"  Camera '{cam}':")
           if 'matrix' in calib_data[cam]:
               print(f"    Intrinsic matrix (3x3): ✓")
           if 'rotation' in calib_data[cam]:
               print(f"    Rotation vector: {calib_data[cam]['rotation']}")
           if 'translation' in calib_data[cam]:
               print(f"    Translation (m): {calib_data[cam]['translation']}")
except Exception as e:
   print(f"⚠ Calibration warning: {e}")
   step_times['calibration'] = time.time() - t0


print("\n" + "█" * 60)
print("█  STEP 2/8: 2D POSE ESTIMATION (RTMPose)")
print("█" * 60)
print("""
 RTMPose detects 2D body keypoints in each video frame.
 - Model: RTMPose (balanced mode) with YOLOX detection
 - Body model: Body_with_feet (HALPE_26 keypoints)
 - Output: JSON files per frame, per camera
 - GPU acceleration used if available


 For custom models (animal, hand, face), set `mode` in Config.toml
 to a dictionary with custom model URLs (see README).
""")


t0 = time.time()
try:
   Pose2Sim.poseEstimation()
   step_times['poseEstimation'] = time.time() - t0
   print(f"\n✅ Pose estimation complete! ({step_times['poseEstimation']:.1f}s)")


   pose_dirs = list((work_dir / "pose").glob("*_json"))
   for pd in pose_dirs:
       n_files = len(list(pd.glob("*.json")))
       print(f"  📁 {pd.name}: {n_files} JSON frames generated")
except Exception as e:
   print(f"⚠ Pose estimation error: {e}")
   step_times['poseEstimation'] = time.time() - t0


print("\n" + "█" * 60)
print("█  STEP 3/8: CAMERA SYNCHRONIZATION")
print("█" * 60)
print("""
 Synchronization aligns frames across cameras by correlating
 vertical keypoint speeds. This finds the time offset between cameras.


 Skip this step if your cameras are hardware-synchronized
 (e.g., via timecode, trigger cable, or GoPro GPS sync).
""")


t0 = time.time()
try:
   Pose2Sim.synchronization()
   step_times['synchronization'] = time.time() - t0
   print(f"\n✅ Synchronization complete! ({step_times['synchronization']:.1f}s)")
except Exception as e:
   print(f"⚠ Synchronization note: {e}")
   step_times['synchronization'] = time.time() - t0


print("\n" + "█" * 60)
print("█  STEP 4/8: PERSON ASSOCIATION")
print("█" * 60)
print("""
 Associates detected persons across different camera views.
 - Single-person mode: picks person with lowest reprojection error
 - Multi-person mode: uses epipolar geometry to match people


 For this single-person demo, it selects the best-matching person.
""")


t0 = time.time()
try:
   Pose2Sim.personAssociation()
   step_times['personAssociation'] = time.time() - t0
   print(f"\n✅ Person association complete! ({step_times['personAssociation']:.1f}s)")
except Exception as e:
   print(f"⚠ Person association note: {e}")
   step_times['personAssociation'] = time.time() - t0

We begin running the core Pose2Sim pipeline step by step to clearly understand what happens at each stage. We start with camera calibration, then perform 2D pose estimation with RTMPose, and continue into camera synchronization and person association to prepare the data for 3D reconstruction. As we execute these stages, we also inspect the outputs and timing information to connect the computation to the files and structures being generated.

print("\n" + "█" * 60)
print("█  STEP 5/8: 3D TRIANGULATION")
print("█" * 60)
print("""
 Triangulates 2D keypoints from multiple cameras into 3D coordinates.
 - Weighted by detection confidence scores
 - Cameras removed if reprojection error exceeds threshold
 - Right/left swap correction (disabled per current recommendation)
 - Missing values interpolated (cubic, bezier, linear, etc.)
 - Output: .trc file (OpenSim-compatible 3D marker format)
""")


t0 = time.time()
try:
   Pose2Sim.triangulation()
   step_times['triangulation'] = time.time() - t0
   print(f"\n✅ Triangulation complete! ({step_times['triangulation']:.1f}s)")


   trc_files = list((work_dir / "pose-3d").glob("*.trc"))
   for trc in trc_files:
       size_kb = trc.stat().st_size / 1024
       print(f"  📄 {trc.name} ({size_kb:.1f} KB)")
except Exception as e:
   print(f"⚠ Triangulation error: {e}")
   step_times['triangulation'] = time.time() - t0


print("\n" + "█" * 60)
print("█  STEP 6/8: 3D COORDINATE FILTERING")
print("█" * 60)
print("""
 Smooths 3D trajectories to reduce noise from triangulation.
 Available filters:
   - Butterworth (low-pass, default)
   - Kalman (predictive smoothing)
   - Butterworth on speed
   - Gaussian, LOESS, Median


 Filter type and parameters are set in Config.toml under [filtering].
""")


t0 = time.time()
try:
   Pose2Sim.filtering()
   step_times['filtering'] = time.time() - t0
   print(f"\n✅ Filtering complete! ({step_times['filtering']:.1f}s)")
except Exception as e:
   print(f"⚠ Filtering note: {e}")
   step_times['filtering'] = time.time() - t0


print("\n" + "█" * 60)
print("█  STEP 7/8: MARKER AUGMENTATION")
print("█" * 60)
print("""
 Uses a Stanford-trained LSTM to predict 47 virtual markers from
 the triangulated keypoints. These additional markers can improve
 OpenSim model scaling and inverse kinematics.


 NOTE: Results are NOT always better with augmentation (~50/50).
 The pipeline lets you compare with and without augmentation.
""")


t0 = time.time()
try:
   Pose2Sim.markerAugmentation()
   step_times['markerAugmentation'] = time.time() - t0
   print(f"\n✅ Marker augmentation complete! ({step_times['markerAugmentation']:.1f}s)")


   aug_files = list((work_dir / "pose-3d").glob("*augmented*.trc"))
   for af in aug_files:
       print(f"  📄 {af.name} ({af.stat().st_size/1024:.1f} KB)")
except Exception as e:
   print(f"⚠ Marker augmentation note: {e}")
   step_times['markerAugmentation'] = time.time() - t0


print("\n" + "█" * 60)
print("█  STEP 8/8: OPENSIM KINEMATICS")
print("█" * 60)
print("""
 Performs automatic model scaling and inverse kinematics:
   1. Scale: Adjusts a generic OpenSim model to participant dimensions
      (based on segment lengths from the slowest, most stable frames)
   2. IK: Computes 3D joint angles by fitting the model to markers
  Output:
   - Scaled .osim model file
   - Joint angles .mot file (can be loaded in OpenSim GUI)
""")


t0 = time.time()
if opensim_available:
   try:
       Pose2Sim.kinematics()
       step_times['kinematics'] = time.time() - t0
       print(f"\n✅ Kinematics complete! ({step_times['kinematics']:.1f}s)")


       kin_dir = work_dir / "kinematics"
       if kin_dir.exists():
           for f in sorted(kin_dir.glob("*")):
               print(f"  📄 {f.name} ({f.stat().st_size/1024:.1f} KB)")
   except Exception as e:
       print(f"⚠ Kinematics error: {e}")
       step_times['kinematics'] = time.time() - t0
else:
   print("  ⏭ Skipped — OpenSim not installed on this Colab runtime.")
   print("  To run kinematics locally, install via:")
   print("    conda install -c opensim-org opensim")
   step_times['kinematics'] = 0


print("\n" + "═" * 60)
print("  ⏱  PIPELINE TIMING SUMMARY")
print("═" * 60)
total = 0
for step, t in step_times.items():
   bar = "█" * int(t / max(step_times.values(), default=1) * 30)
   print(f"  {step:<22s} {t:6.1f}s  {bar}")
   total += t
print(f"  {'TOTAL':<22s} {total:6.1f}s")
print("═" * 60)

We continue the main workflow by converting the associated 2D detections into 3D coordinates via triangulation, then improving the resulting trajectories via filtering. We also run marker augmentation to enrich the marker set for downstream biomechanical analysis and then attempt OpenSim-based kinematics when the environment supports it. By the end of this section, we will summarize the runtime of each major step to evaluate the overall efficiency and flow of the full pipeline.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from pathlib import Path
import re


def parse_trc(trc_path):
   """Parse a .trc file and return marker names, frame data, and metadata."""
   with open(trc_path, 'r') as f:
       lines = f.readlines()


   meta_keys = lines[2].strip().split('\t')
   meta_vals = lines[3].strip().split('\t')
   metadata = dict(zip(meta_keys, meta_vals))


   marker_line = lines[3].strip().split('\t')


   header_lines = 0
   for i, line in enumerate(lines):
       if line.strip() and not line.startswith(('PathFileType', 'DataRate',
                                                 'Frame', '\t')):
           try:
               float(line.strip().split('\t')[0])
               header_lines = i
               break
           except ValueError:
               continue


   raw_markers = lines[3].strip().split('\t')
   markers = [m for m in raw_markers if m and m not in ('Frame#', 'Time')]


   marker_names = []
   for m in markers:
       if m and (not marker_names or m != marker_names[-1]):
           marker_names.append(m)


   data_lines = lines[header_lines:]
   data = []
   for line in data_lines:
       vals = line.strip().split('\t')
       if len(vals) > 2:
           try:
               row = [float(v) if v else np.nan for v in vals]
               data.append(row)
           except ValueError:
               continue


   data = np.array(data)
   return marker_names, data, metadata




trc_files = sorted((work_dir / "pose-3d").glob("*.trc"))
print(f"📊 Found {len(trc_files)} TRC file(s):")
for f in trc_files:
   print(f"   {f.name}")


trc_file = None
for f in trc_files:
   if 'filt' in f.name.lower() and 'augm' not in f.name.lower():
       trc_file = f
       break
if trc_file is None and trc_files:
   trc_file = trc_files[0]


if trc_file:
   print(f"\n📈 Visualizing: {trc_file.name}")
   marker_names, data, metadata = parse_trc(trc_file)
   print(f"   Markers: {len(marker_names)}")
   print(f"   Frames:  {data.shape[0]}")
   print(f"   Marker names: {marker_names[:10]}{'...' if len(marker_names) > 10 else ''}")


   frames = data[:, 0].astype(int) if data.shape[1] > 0 else []
   times = data[:, 1] if data.shape[1] > 1 else []
   coords = data[:, 2:]


   n_markers = len(marker_names)


   mid_frame = len(data) // 2
   fig = plt.figure(figsize=(16, 6))


   ax1 = fig.add_subplot(131, projection='3d')
   xs = coords[mid_frame, 0::3][:n_markers]
   ys = coords[mid_frame, 1::3][:n_markers]
   zs = coords[mid_frame, 2::3][:n_markers]


   ax1.scatter(xs, ys, zs, c='dodgerblue', s=40, alpha=0.8, edgecolors='navy')
   for i, name in enumerate(marker_names[:len(xs)]):
       if i % 3 == 0:
           ax1.text(xs[i], ys[i], zs[i], f' {name}', fontsize=6, alpha=0.7)


   ax1.set_xlabel('X (m)')
   ax1.set_ylabel('Y (m)')
   ax1.set_zlabel('Z (m)')
   ax1.set_title(f'3D Keypoints (Frame {int(frames[mid_frame])})', fontsize=10)


   ax2 = fig.add_subplot(132)
   key_markers = ['RAnkle', 'LAnkle', 'RWrist', 'LWrist', 'Nose']
   colors_map = {'RAnkle': 'red', 'LAnkle': 'blue', 'RWrist': 'orange',
                 'LWrist': 'green', 'Nose': 'purple'}


   for mkr in key_markers:
       if mkr in marker_names:
           idx = marker_names.index(mkr)
           z_col = idx * 3 + 2
           if z_col < coords.shape[1]:
               ax2.plot(times, coords[:, z_col],
                        label=mkr, color=colors_map.get(mkr, 'gray'),
                        linewidth=1.2, alpha=0.8)


   ax2.set_xlabel('Time (s)')
   ax2.set_ylabel('Z position (m)')
   ax2.set_title('Vertical Trajectories', fontsize=10)
   ax2.legend(fontsize=8, loc='best')
   ax2.grid(True, alpha=0.3)


   ax3 = fig.add_subplot(133)
   if len(times) > 1:
       dt = np.diff(times)
       dt[dt == 0] = 1e-6
       for mkr in ['RAnkle', 'RWrist']:
           if mkr in marker_names:
               idx = marker_names.index(mkr)
               x_col, y_col, z_col = idx * 3, idx * 3 + 1, idx * 3 + 2
               if z_col < coords.shape[1]:
                   dx = np.diff(coords[:, x_col])
                   dy = np.diff(coords[:, y_col])
                   dz = np.diff(coords[:, z_col])
                   speed = np.sqrt(dx**2 + dy**2 + dz**2) / dt
                   speed = np.clip(speed, 0, np.nanpercentile(speed, 99))
                   ax3.plot(times[1:], speed, label=mkr,
                            color=colors_map.get(mkr, 'gray'),
                            linewidth=0.8, alpha=0.7)


       ax3.set_xlabel('Time (s)')
       ax3.set_ylabel('Speed (m/s)')
       ax3.set_title('Marker Speeds (Quality Check)', fontsize=10)
       ax3.legend(fontsize=8)
       ax3.grid(True, alpha=0.3)


   plt.tight_layout()
   plt.savefig(work_dir / 'trajectory_analysis.png', dpi=150, bbox_inches='tight')
   plt.show()
   print("✅ Trajectory plots saved to trajectory_analysis.png")


else:
   print("⚠ No TRC file found to visualize.")

We focus on interpreting the generated 3D motion data by parsing the TRC file and extracting marker trajectories in a structured way. We create visualizations of 3D keypoints, the vertical motion of selected body parts, and marker speeds, so we can inspect both movement patterns and signal quality. This helps us move beyond simply running the pipeline and start understanding what the reconstructed motion actually looks like over time.

def parse_mot(mot_path):
   """Parse an OpenSim .mot file and return column names and data."""
   with open(mot_path, 'r') as f:
       lines = f.readlines()


   data_start = 0
   for i, line in enumerate(lines):
       if 'endheader' in line.lower():
           data_start = i + 1
           break


   col_line = lines[data_start].strip().split('\t')
   if not col_line[0].replace('.', '').replace('-', '').replace('e', '').isdigit():
       col_names = col_line
       data_start += 1
   else:
       col_names = lines[data_start - 2].strip().split('\t')


   data = []
   for line in lines[data_start:]:
       vals = line.strip().split('\t')
       try:
           row = [float(v) for v in vals]
           data.append(row)
       except ValueError:
           continue


   return col_names, np.array(data)




mot_files = sorted((work_dir / "kinematics").glob("*.mot")) if (work_dir / "kinematics").exists() else []


if mot_files:
   mot_file = mot_files[0]
   print(f"📈 Visualizing joint angles: {mot_file.name}")


   col_names, jdata = parse_mot(mot_file)
   print(f"   Columns: {len(col_names)}")
   print(f"   Frames:  {jdata.shape[0]}")
   print(f"   Joints:  {col_names[:8]}...")


   fig, axes = plt.subplots(2, 3, figsize=(16, 8))
   fig.suptitle('OpenSim Joint Angles from Pose2Sim', fontsize=14, fontweight='bold')


   joints_to_plot = [
       ('hip_flexion_r', 'Right Hip Flexion'),
       ('hip_flexion_l', 'Left Hip Flexion'),
       ('knee_angle_r', 'Right Knee Angle'),
       ('knee_angle_l', 'Left Knee Angle'),
       ('ankle_angle_r', 'Right Ankle Angle'),
       ('ankle_angle_l', 'Left Ankle Angle'),
   ]


   time_col = jdata[:, 0] if 'time' in col_names[0].lower() else np.arange(jdata.shape[0])


   for ax, (joint_key, title) in zip(axes.flat, joints_to_plot):
       col_idx = None
       for i, cn in enumerate(col_names):
           if joint_key.lower() in cn.lower():
               col_idx = i
               break


       if col_idx is not None and col_idx < jdata.shape[1]:
           ax.plot(time_col, jdata[:, col_idx], 'b-', linewidth=1.2)
           ax.set_title(title, fontsize=10)
           ax.set_xlabel('Time (s)')
           ax.set_ylabel('Angle (°)')
           ax.grid(True, alpha=0.3)
       else:
           ax.text(0.5, 0.5, f'{title}\n(not found)', ha='center',
                   va='center', transform=ax.transAxes, fontsize=10, color='gray')
           ax.set_title(title, fontsize=10, color='gray')


   plt.tight_layout()
   plt.savefig(work_dir / 'joint_angles.png', dpi=150, bbox_inches='tight')
   plt.show()
   print("✅ Joint angle plots saved to joint_angles.png")


else:
   print("ℹ No .mot files found (kinematics step was skipped or OpenSim unavailable).")
   print("  Joint angle visualization will be available when running locally with OpenSim.")


print("📊 DATA QUALITY ANALYSIS")
print("=" * 60)


if trc_file:
   marker_names, data, metadata = parse_trc(trc_file)
   coords = data[:, 2:]


   print("\n🔍 Per-Marker Data Completeness:")
   print("-" * 45)
   n_frames = data.shape[0]
   for i, mkr in enumerate(marker_names):
       x_col = i * 3
       if x_col < coords.shape[1]:
           valid = np.sum(~np.isnan(coords[:, x_col]))
           pct = 100 * valid / n_frames
           bar = "█" * int(pct / 5) + "░" * (20 - int(pct / 5))
           status = "✅" if pct > 90 else "⚠" if pct > 50 else "❌"
           print(f"  {status} {mkr:<20s} {bar} {pct:5.1f}% ({valid}/{n_frames})")


   print("\n🔍 Trajectory Smoothness (lower = better):")
   print("-" * 45)
   for mkr in marker_names[:10]:
       idx = marker_names.index(mkr)
       cols = [idx * 3, idx * 3 + 1, idx * 3 + 2]
       if cols[2] < coords.shape[1]:
           xyz = coords[:, cols]
           accel = np.diff(xyz, axis=0, n=2)
           jitter = np.nanmean(np.abs(accel))
           bar = "█" * min(int(jitter * 500), 20)
           print(f"  {mkr:<20s} jitter={jitter:.4f}  {bar}")

We further analyze the kinematic results by parsing the MOT file and plotting key joint angles after the OpenSim stage completes successfully. We then evaluate overall data quality by measuring marker completeness and estimating smoothness using inter-frame jitter, which provides a better sense of how stable and reliable the reconstructed trajectories are. This section helps us assess whether the motion output is ready for meaningful biomechanical interpretation or if additional tuning is needed.

print("\n🔧 ADVANCED: PROGRAMMATIC CONFIGURATION")
print("=" * 60)
print("""
You can modify ANY Config.toml parameter programmatically.
This is useful for batch experiments or parameter sweeps.
Below are common modifications for different scenarios.
""")


import toml


config = toml.load(work_dir / "Config.toml")


print("Example 1: Change filter type to Kalman")
example_config_1 = {
   'filtering': {
       'type': 'kalman',
   }
}
print(f"  config['filtering']['type'] = 'kalman'")


print("\nExample 2: Tighten triangulation quality thresholds")
example_config_2 = {
   'triangulation': {
       'likelihood_threshold': 0.4,
       'reproj_error_threshold_triangulation': 15,
       'min_cameras_for_triangulation': 3,
   }
}
for k, v in example_config_2['triangulation'].items():
   print(f"  config['triangulation']['{k}'] = {v}")


print("\nExample 3: Use lightweight mode for faster processing")
print("  config['pose']['mode'] = 'lightweight'")
print("  config['pose']['mode'] = 'balanced', 'performance', or custom dict")


print("\nExample 4: Enable multi-person analysis")
print("  config['project']['multi_person'] = True")
print("  config['project']['participant_height'] = [1.72, 1.65]")
print("  config['project']['participant_mass'] = [70.0, 55.0]")


print("\nExample 5: Process only specific frames")
print("  config['project']['frame_range'] = [50, 200]")


print("\n💡 To apply changes, modify and save Config.toml, then re-run the pipeline:")
print("""
   import toml
   config = toml.load('Config.toml')
   config['filtering']['type'] = 'kalman'
   with open('Config.toml', 'w') as f:
       toml.dump(config, f)
  
   from Pose2Sim import Pose2Sim
   Pose2Sim.filtering()
""")


print("\n🚀 ALTERNATIVE: ONE-LINE PIPELINE EXECUTION")
print("=" * 60)
print("""
Instead of calling each step individually, use:


   from Pose2Sim import Pose2Sim
   Pose2Sim.runAll()


Or with selective steps:


   Pose2Sim.runAll(
       do_calibration=True,
       do_poseEstimation=True,
       do_synchronization=True,
       do_personAssociation=True,
       do_triangulation=True,
       do_filtering=True,
       do_markerAugmentation=False,
       do_kinematics=True
   )


This is equivalent to calling each step in sequence, but more concise
for production workflows.
""")


print("\n📦 ALL GENERATED OUTPUT FILES")
print("=" * 60)


output_dirs = ['pose', 'pose-3d', 'kinematics']
for dirname in output_dirs:
   d = work_dir / dirname
   if d.exists():
       files = list(d.rglob("*"))
       files = [f for f in files if f.is_file()]
       total_size = sum(f.stat().st_size for f in files) / (1024 * 1024)
       print(f"\n📁 {dirname}/ ({len(files)} files, {total_size:.1f} MB total)")
       for f in sorted(files)[:15]:
           rel = f.relative_to(d)
           size = f.stat().st_size / 1024
           print(f"   {'📄'} {rel} ({size:.1f} KB)")
       if len(files) > 15:
           print(f"   ... and {len(files) - 15} more files")


print("\n" + "=" * 60)
print("📥 TO DOWNLOAD RESULTS IN GOOGLE COLAB:")
print("=" * 60)
print("""
 from google.colab import files
 files.download('/content/Pose2Sim_Tutorial/pose-3d/YOUR_FILE.trc')


 !zip -r /content/pose2sim_results.zip /content/Pose2Sim_Tutorial/pose-3d /content/Pose2Sim_Tutorial/kinematics
 files.download('/content/pose2sim_results.zip')
""")


print("""
╔══════════════════════════════════════════════════════════════════════════════╗
║                         TUTORIAL COMPLETE! 🎉                              ║
╠══════════════════════════════════════════════════════════════════════════════╣
║                                                                            ║
║  WHAT YOU ACHIEVED:                                                        ║
║  ✅ Installed Pose2Sim on Google Colab                                     ║
║  ✅ Ran camera calibration (conversion mode)                               ║
║  ✅ Performed 2D pose estimation with RTMPose                              ║
║  ✅ Synchronized multi-camera views                                        ║
║  ✅ Associated persons across cameras                                      ║
║  ✅ Triangulated 2D→3D keypoints                                           ║
║  ✅ Filtered 3D trajectories                                               ║
║  ✅ Augmented markers with Stanford LSTM                                   ║
║  ✅ Computed joint angles with OpenSim IK (if available)                   ║
║  ✅ Visualized trajectories, joint angles, and data quality                ║
║                                                                            ║
║  NEXT STEPS:                                                               ║
║  • Try with YOUR OWN multi-camera videos                                   ║
║  • Experiment with 'performance' mode for higher accuracy                  ║
║  • Try multi-person mode (Demo_MultiPerson)                                ║
║  • Visualize in OpenSim GUI or Blender (Pose2Sim_Blender add-on)          ║
║  • Use custom DeepLabCut models for animal tracking                        ║
║  • Run batch processing over multiple trials                               ║
║                                                                            ║
║  CITATION:                                                                 ║
║  Pagnon, D., Domalain, M., & Reveret, L. (2022).                         ║
║                                                                            ║
║  GitHub:  https://github.com/perfanalytics/pose2sim                        ║
║  Discord: https://discord.com/invite/4mXUdSFjmt                           ║
║  Docs:    https://perfanalytics.github.io/pose2sim/                        ║
╚══════════════════════════════════════════════════════════════════════════════╝
""")

We explore more advanced usage patterns by demonstrating how to modify the Pose2Sim configuration programmatically for experiments and alternative processing setups. We also review how to run the entire pipeline in one call, inspect all generated output files, and prepare the results for download directly from Colab. Finally, we wrap up the tutorial with a summary of what we accomplished and highlight the next steps to extend the workflow to our own datasets and research goals.

In conclusion, we gained a full hands-on understanding of how Pose2Sim transforms multi-view videos into 3D motion trajectories and biomechanical outputs within a practical Colab workflow. We saw how each stage of the pipeline connects to the next, from extracting reliable 2D keypoints to reconstructing filtered 3D coordinates and, when available, estimating joint kinematics through OpenSim. We also went beyond basic execution by visualizing results, checking data quality, and exploring programmatic configuration changes for more advanced experimentation. At the end, we have a reusable, adaptable pipeline that we can extend to our own datasets, refine for greater accuracy, and use as a foundation for deeper motion analysis and markerless biomechanics research.


Check out the FULL CODES hereAlso, feel free to follow us on Twitter and don’t forget to join our 120k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.