Statistical Advantages of Oblique Randomized Decision Trees and Forests
Eliza O'Reilly · 2024-07-03 · via stat updates on arXiv.org

This work studies the statistical implications of using features comprised of general linear combinations of covariates to partition the data in randomized decision tree and forest regression algorithms. Using random tessellation theory in stochastic geometry, we provide a theoretical analysis of a class of efficiently generated random tree and forest estimators that allow for oblique splits along such features. We call these estimators oblique Mondrian trees and forests, as the trees are generated by first selecting a set of features from linear combinations of the covariates and then running a Mondrian process that hierarchically partitions the data along these features. Quadratic risk bounds and convergence rates are obtained for the flexible function class of multi-index models for dimension reduction, where the output is assumed to depend on a low-dimensional relevant feature subspace of the input domain. The results highlight how the risk of these estimators depends on the choice of features and quantify how robust the risk is with respect to error between the selected features along which the data is split and the true relevant feature subspace. The asymptotic analysis also provides conditions on the convergence rate a set of estimated relevant features must satisfy for oblique Mondrian estimators to obtain minimax optimal rates of convergence with respect to the dimension of the relevant feature subspace. Additionally, a lower bound on the risk of axis-aligned Mondrian trees (where features are restricted to the set of covariates) is obtained, proving that these estimators are suboptimal for general ridge functions, no matter how the distribution over the covariates used to divide the data at each tree node is weighted.
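The construction described in the abstract (select features that are linear combinations of the covariates, then run a Mondrian process along them) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `sample_mondrian`, `ObliqueMondrianTree`, and `forest_predict`, the feature matrix `A`, and the `lifetime` parameter are all hypothetical, and the sketch assumes the Mondrian process is run on the bounding box of the projected training data with leaf-wise averaging as the predictor.

```python
import numpy as np


def sample_mondrian(lower, upper, lifetime, rng, t=0.0):
    """Sample a Mondrian partition of the box [lower, upper] as a nested dict."""
    sides = upper - lower
    rate = sides.sum()
    if rate <= 0:
        return {"leaf": True}
    # Waiting time until this cell splits: exponential with rate = sum of side lengths.
    t_split = t + rng.exponential(1.0 / rate)
    if t_split > lifetime:
        return {"leaf": True}
    # Split dimension is chosen proportionally to side length; location is uniform.
    dim = int(rng.choice(len(sides), p=sides / rate))
    loc = rng.uniform(lower[dim], upper[dim])
    left_upper, right_lower = upper.copy(), lower.copy()
    left_upper[dim] = loc
    right_lower[dim] = loc
    return {
        "leaf": False, "dim": dim, "loc": loc,
        "left": sample_mondrian(lower, left_upper, lifetime, rng, t_split),
        "right": sample_mondrian(right_lower, upper, lifetime, rng, t_split),
    }


def leaf_path(tree, z):
    """Return the left/right path that identifies the leaf containing z."""
    path, node = [], tree
    while not node["leaf"]:
        go_right = bool(z[node["dim"]] > node["loc"])
        path.append(go_right)
        node = node["right"] if go_right else node["left"]
    return tuple(path)


class ObliqueMondrianTree:
    """Mondrian tree on projected features Z = X @ A (A is d x k; assumed given)."""

    def __init__(self, A, lifetime, rng):
        self.A, self.lifetime, self.rng = np.asarray(A, dtype=float), lifetime, rng

    def fit(self, X, y):
        Z = X @ self.A
        # Assumption: partition the bounding box of the projected training data.
        self.tree = sample_mondrian(Z.min(axis=0), Z.max(axis=0), self.lifetime, self.rng)
        self.default = float(np.mean(y))  # fallback for leaves with no training points
        sums = {}
        for z, yi in zip(Z, y):
            key = leaf_path(self.tree, z)
            s, c = sums.get(key, (0.0, 0))
            sums[key] = (s + yi, c + 1)
        self.means = {k: s / c for k, (s, c) in sums.items()}
        return self

    def predict(self, X):
        Z = X @ self.A
        return np.array([self.means.get(leaf_path(self.tree, z), self.default) for z in Z])


def forest_predict(A, lifetime, X, y, X_new, n_trees=20, seed=0):
    """Average the predictions of independently sampled trees."""
    rng = np.random.default_rng(seed)
    trees = [ObliqueMondrianTree(A, lifetime, rng).fit(X, y) for _ in range(n_trees)]
    return np.mean([tree.predict(X_new) for tree in trees], axis=0)
```

Passing `A = np.eye(d)` restricts the splits to the original covariates, which corresponds to the axis-aligned case discussed at the end of the abstract. Passing a `d x k` matrix whose columns span an estimate of the relevant feature subspace gives oblique splits. How `A` is estimated is outside the scope of this sketch.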