Full Citation
Title: Near Linear Time Detection of Distance-Based Outliers and Applications to Security
Citation Type: Conference Paper
Publication Year: 2003
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: Many automated systems for detecting threats are based on matching a new database record to known attack types. However, this approach can only spot known threats, and thus researchers have also begun to use unsupervised approaches based on detecting outliers or anomalous examples. A popular method of finding these outliers is to use the distance to an example's k nearest neighbors as a measure of unusualness. However, existing algorithms for finding distance-based outliers have poor scaling properties, making it difficult to apply them to the large datasets typically available in security domains. In this paper, we propose modifications to a simple, but quadratic, algorithm for finding distance-based outliers, and show that it achieves near linear time scaling, allowing it to be applied to real datasets with millions of examples and many features.
User Submitted?: No
Authors: Schwabacher, Mark; Bay, Stephen D.
Conference Name: Workshop on Data Mining for Counter Terrorism and Security
Publisher Location: San Francisco, CA
Data Collections: IPUMS USA
Topics: Methodology and Data Collection, Other
Countries: