
BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Discovering Significant Patterns

Citation Type: Journal Article

Publication Year: 2007

DOI: 10.1007/s10994-007-5006-x

Abstract: Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some user-specified constraints. Due to the large number of patterns considered, they suffer from an extreme risk of type-1 error, that is, of finding patterns that appear due to chance alone to satisfy the constraints on the sample data. This paper proposes techniques to overcome this problem by applying well-established statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard pattern discovery techniques can discover numerous spurious patterns when applied to random data and when applied to real-world data result in large numbers of patterns that are rejected when subjected to sound statistical evaluation. They also reveal that a number of pragmatic choices about how such tests are performed can greatly affect their power.

Url: http://link.springer.com/10.1007/s10994-007-5006-x

User Submitted?: No

Authors: Webb, Geoffrey I.

Periodical (Full): Machine Learning

Issue: 1

Volume: 68

Pages: 1-33

Data Collections: IPUMS USA

Topics: Population Data Science

Countries: United States
