BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Fast Mining of Distance-Based Outliers in High Dimensional Datasets

Citation Type: Miscellaneous

Publication Year: 2005

Abstract: Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of these efforts report log-linear time performance as a function of the number of data points on many real-life, low-dimensional datasets. However, these same algorithms are unable to obtain the same level of performance on high-dimensional datasets, since the scaling behavior is exponential in the number of dimensions. In this paper we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP is expected to scale log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation verifies this expectation, and furthermore we demonstrate that our approach consistently outperforms the state of the art, sometimes by an order of magnitude, on several real and synthetic datasets.

Url: ftp://ftp.cse.ohio-state.edu/pub/tech-report/2005/TR71.pdf

User Submitted?: No

Authors: Ghoting, Amol; Parthasarathy, Srinivasan; Otey, Matthew Eric

Publisher: The Ohio State University

Data Collections: IPUMS USA

Topics: Methodology and Data Collection, Other

