















Abstract: In many classification settings, the class of primary interest is underrepresented, leading to imbalanced data problems that arise in applications such as rare disease detection and fraud identification. In these contexts, identifying a potential positive instance typically triggers costly follow-up actions, such as medical imaging or detailed transaction inspection, which are subject to limited operational capacity. Motivated by this setting, we consider classification problems where data may arrive sequentially and decisions must be made under constraints on the number of instances that can be selected for further analysis. We propose a classification framework that explicitly controls the rate of positive predictions, enforcing a user-defined bound on the proportion of observations classified as belonging to the minority class while maximizing detection performance. The approach can be implemented using standard learning methods and naturally extends to online settings, where decisions are taken in real time. We show that incorporating capacity constraints leads to substantial improvements over classical approaches, including resampling techniques such as SMOTE, which do not directly control the selection rate.
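The abstract does not spell out how the bound on positive predictions is enforced, so the following is only an illustrative sketch of one simple way such a constraint could be met, not the authors' procedure. It assumes some already-trained model produces a real-valued score per observation (higher meaning more likely minority class) and picks a cutoff from held-out scores so that at most a fraction `alpha` of them are flagged. The function names `capacity_threshold` and `flag_positives` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def capacity_threshold(calibration_scores, alpha):
    """Return a cutoff t such that at most a fraction alpha of the
    calibration scores are strictly greater than t.

    Assumption: scores come from any standard classifier; this is a
    plain empirical-quantile rule, not the method of the paper.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = scores.size
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k = int(np.floor(alpha * n))  # how many instances capacity allows
    if k >= n:
        return -np.inf  # capacity covers everything: flag all
    # Only the k largest scores can exceed the (n - k)-th smallest one;
    # ties with the cutoff are not flagged, so the bound still holds.
    return scores[n - k - 1]


def flag_positives(scores, threshold):
    """Flag observations whose score strictly exceeds the cutoff."""
    return np.asarray(scores, dtype=float) > threshold
```

In a sequential setting, the same cutoff could be applied to each arriving score as `flag_positives([score], t)[0]`. The realized selection rate on new data would then match `alpha` only approximately, and only insofar as the score distribution stays close to the one seen at calibration time.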
| Subjects: | Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST) |
| MSC classes: | 62H30, 68T05, 62C20, 62G20 |
| Cite as: | arXiv:2605.03289 [stat.ML] (or arXiv:2605.03289v1 [stat.ML] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.03289 (arXiv-issued DOI via DataCite, pending registration) |
From: Daniel Fraiman
[v1] Tue, 5 May 2026 02:21:01 UTC (293 KB)