Rare-Event Prediction in Imbalanced Data: A Unified Evaluation and Optimization Framework for High-Risk Systems
Abstract
Rare events, outcomes that occur infrequently but often carry high stakes, present a major challenge for predictive modeling because of extreme class imbalance. When the majority class vastly outnumbers the minority class, standard machine learning models can achieve deceptively high overall accuracy simply by predicting the common outcome. This imbalance can mask poor performance on the rare event of interest. For example, in a dataset with 0.1% event prevalence, a trivial classifier that predicts "no event" for every case attains ~99.9% accuracy yet detects no true events. To address this, researchers have developed a spectrum of techniques for rare-event prediction. These include data-level resampling (oversampling the minority class, undersampling the majority class, and generating synthetic data), algorithm-level methods such as cost-sensitive learning and adjusted decision thresholds, and ensemble approaches tailored to imbalance. Evaluating rare-event models also requires special care. Overall accuracy is insufficient, and metrics that emphasize the minority class are more informative: precision, recall, F1-score, area under the precision-recall curve (AUPRC), and the Matthews correlation coefficient (MCC). This review synthesizes recent advances in rare-event prediction across diverse domains, from healthcare and industrial safety to finance, cybersecurity, and transportation. We discuss the challenges posed by imbalanced safety and health datasets, compare strategies for mitigating class imbalance, examine appropriate evaluation metrics, and highlight case studies from multiple fields. Drawing on best practices from the literature, we propose a unified framework for evaluating rare-event prediction models that can guide machine learning researchers, public health experts, and safety engineers in developing robust, generalizable models for low-frequency yet critical outcomes.
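The minimal sketch below makes the accuracy paradox concrete. It uses only the Python standard library to compute accuracy and several minority-focused metrics from confusion-matrix counts. Several elements are hypothetical and chosen purely for illustration: the dataset (100,000 cases, 100 events), the counts for the "detector," and the helper names. By assumption, undefined ratios (such as precision when nothing is flagged) are reported as 0.0. AUPRC is omitted because it requires ranked scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks the rare event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    # Assumed convention: report 0.0 when a ratio is undefined (zero denominator).
    return num / den if den else 0.0


def rare_event_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy plus minority-class metrics from hard labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Matthews correlation coefficient from the four confusion-matrix cells.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical dataset: 100,000 cases at 0.1% prevalence, i.e. 100 events.
n, n_events = 100_000, 100
y_true = [1] * n_events + [0] * (n - n_events)

# Trivial classifier: always predicts "no event".
y_trivial = [0] * n

# Hypothetical detector: flags 80 of the 100 events plus 200 false alarms.
y_detector = [1] * 80 + [0] * 20 + [1] * 200 + [0] * (n - 300)

print("trivial: ", rare_event_metrics(y_true, y_trivial))
print("detector:", rare_event_metrics(y_true, y_detector))
```

The trivial classifier scores 0.999 accuracy but zero recall, F1, and MCC. The hypothetical detector has slightly lower accuracy (0.9978) yet finds 80% of the events. Minority-focused metrics expose exactly this gap.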



