Full Citation
Title: A Quadratic Mean based Supervised Learning Model for Managing Data Skewness
Citation Type: Miscellaneous
Publication Year: 2011
ISBN:
ISSN:
DOI: 10.1137/1.9781611972818.17
NSFID:
PMCID:
PMID:
Abstract: In this paper, we study the problem of data skewness. A data set is skewed/imbalanced if its dependent variable is asymmetrically distributed. Dealing with skewed data sets has been identified as one of the ten most challenging problems in data mining research. We address the problem of class skewness for supervised learning models which are based on optimizing a regularized empirical risk function. These include both classification and regression models for discrete and continuous dependent variables. Classical empirical risk minimization is akin to minimizing the arithmetic mean of prediction errors, in which approach the induction process is biased towards the majority class for skewed data. To overcome this drawback, we propose a quadratic mean based learning framework (QMLearn) that is robust and insensitive to class skewness. We will note that minimizing the quadratic mean is a convex optimization problem and hence can be efficiently solved for large and high dimensional data. Comprehensive experiments demonstrate that the QMLearn model significantly outperforms existing statistical learners including logistic regression, support vector machines, linear regression, support vector regression and quantile regression etc.
Url: https://epubs.siam.org/doi/pdf/10.1137/1.9781611972818.17
User Submitted?: No
Authors: Liu, Wei; Chawla, Sanjay
Publisher: SIAM
Data Collections: IPUMS USA
Topics: Population Data Science
Countries: United States