IPUMS.org Home Page

BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Fast Mining of Distance-based Outliers in High-dimensional Datasets

Citation Type: Journal Article

Publication Year: 2008

Abstract: Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of the existing distance-based outlier detection algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms are unable to deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that we outperform the state-of-the-art algorithm, often by an order of magnitude.

User Submitted?: No

Authors: Parthasarathy, Srinivasan; Otey, Matthew Eric; Ghoting, Amol

Periodical (Full): Data Mining and Knowledge Discovery

Issue: 3

Volume: 16

Pages: 349-364

Data Collections: IPUMS USA

Topics: Methodology and Data Collection, Other

Countries:

IPUMS NHGIS NAPP IHIS ATUS Terrapop