BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: A Quadratic Mean based Supervised Learning Model for Managing Data Skewness

Citation Type: Miscellaneous

Publication Year: 2011

Abstract: In this paper, we study the problem of data skewness. A data set is skewed/imbalanced if its dependent variable is asymmetrically distributed. Dealing with skewed data sets has been identified as one of the ten most challenging problems in data mining research. We address the problem of class skewness for supervised learning models which are based on optimizing a regularized empirical risk function. These include both classification and regression models for discrete and continuous dependent variables. Classical empirical risk minimization is akin to minimizing the arithmetic mean of prediction errors, in which approach the induction process is biased towards the majority class for skewed data. To overcome this drawback, we propose a quadratic mean based learning framework (QMLearn) that is robust and insensitive to class skewness. We will note that minimizing the quadratic mean is a convex optimization problem and hence can be efficiently solved for large and high dimensional data. Comprehensive experiments demonstrate that the QMLearn model significantly outperforms existing statistical learners including logistic regression, support vector machines, linear regression, support vector regression and quantile regression etc.

Url: https://epubs.siam.org/doi/pdf/10.1137/1.9781611972818.17

User Submitted?: No

Authors: Liu, Wei; Chawla, Sanjay

Publisher: SIAM

Data Collections: IPUMS USA

Topics: Population Data Science

Countries: United States
