Full Citation
Title: Fast Mining of Distance-Based Outliers in High Dimensional Datasets
Citation Type: Miscellaneous
Publication Year: 2005
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of these efforts report log-linear time performance as a function of the number of data points on many real life low dimensional datasets. However, these same algorithms are unable to obtain the same level of performance on high dimensional data sets since the scaling behavior is exponential in the number of dimensions. In this paper we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high dimensional data sets. RBRP is expected to scale log-linearly, as a function of the number of data points and scales linearly as a function of the number of dimensions. Our empirical evaluation verifies this expectancy and furthermore we demonstrate that our approach consistently outperforms the state-of-the-art, sometimes by an order of magnitude, on several real and synthetic datasets.
Url: ftp://ftp.cse.ohio-state.edu/pub/tech-report/2005/TR71.pdf
User Submitted?: No
Authors: Ghoting, Amol; Parthasarathy, Srinivasan; Otey, Matthew Eric
Publisher: The Ohio State University
Data Collections: IPUMS USA
Topics: Methodology and Data Collection, Other
Countries: