SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning
Amogh Joshi, Adarsh Kumar Kosta, Kaushik Roy · 2024-09-16 · via cs.LG updates on arXiv.org

The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning (Deep RL) has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, Deep RL suffers from poor sample efficiency, i.e., it requires a large number of environmental interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q-Learning and Soft Actor-Critic attempt to remedy this shortcoming but cannot provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.
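The abstract does not say how the PGM is connected to the training pipeline, so the following is only a minimal sketch of one plausible reading, not the method from the paper. It assumes a cart-pole-like task, a hypothetical two-parent Bayesian-network fragment (`INTUITION_CPT`) whose probabilities are invented for illustration, and a hypothetical auxiliary term (`intuition_loss`) that is added to an ordinary RL loss with an assumed weight.

```python
import math

# Hypothetical conditional probability table P(action | angle_sign, velocity_sign).
# It encodes the human rule "push toward the side the pole is falling to".
# All probabilities are illustrative assumptions, not values from the paper.
INTUITION_CPT = {
    ("right", "right"): {"push_left": 0.1, "push_right": 0.9},
    ("left", "left"): {"push_left": 0.9, "push_right": 0.1},
    ("right", "left"): {"push_left": 0.5, "push_right": 0.5},
    ("left", "right"): {"push_left": 0.5, "push_right": 0.5},
}


def sign_label(value):
    """Discretize a continuous observation into the parent-node states."""
    return "right" if value >= 0.0 else "left"


def intuition_distribution(angle, angular_velocity):
    """Look up the action distribution implied by the encoded intuition."""
    parents = (sign_label(angle), sign_label(angular_velocity))
    return INTUITION_CPT[parents]


def intuition_loss(policy_probs, angle, angular_velocity):
    """Cross-entropy of the policy's action distribution against the intuition."""
    target = intuition_distribution(angle, angular_velocity)
    return -sum(
        target[action] * math.log(max(policy_probs[action], 1e-8))
        for action in target
    )


def total_loss(rl_loss, policy_probs, angle, angular_velocity, weight=0.1):
    """Assumed combination: standard RL loss plus a weighted intuition term."""
    return rl_loss + weight * intuition_loss(policy_probs, angle, angular_velocity)


if __name__ == "__main__":
    agreeing = {"push_left": 0.1, "push_right": 0.9}
    disagreeing = {"push_left": 0.9, "push_right": 0.1}
    # Pole leaning right and still falling right.
    print(total_loss(1.0, agreeing, 0.05, 0.2))
    print(total_loss(1.0, disagreeing, 0.05, 0.2))
```

In this sketch a policy that agrees with the encoded rule incurs a smaller auxiliary penalty than one that contradicts it, which is one way such a prior could steer early exploration. Where the rule is uninformative (the 0.5/0.5 rows), the cross-entropy is lowest when the policy is also 0.5/0.5, so those rows still pull the policy toward indifference; a sketch that wanted to leave the policy free in those states would have to skip the term there.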