IPUMS.org Home Page

BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Sampling Strategies for Finding Frequent Sets

Citation Type: Miscellaneous

Publication Year: 2002

Abstract: There are many deterministic algorithms that perform the task of finding frequent itemsets in a database. All of them depend somehow on the size of the database, and when this size increases, the computational cost of the algorithms becomes higher. Sampling strategies are a way to overcome this problem. The classical probabilistic approaches for finding frequent sets have been using Chernoff bounds and equivalents to calculate the size of the sample although the result always overestimates. One of the proposed alternatives is online sampling, which adaptatively calculates the size of the sample for each itemset. But this technique has been proved difficult to implement using the current algorithmic schemes for finding frequent itemsets. In this paper we present different implementations of the online scheme using our previous best-first proposal, which meets the requirements needed by online sampling. We compare these proposals with the batch one. Our final approach shows a stable behaviour in several different situations.

Url: http://users-cs.au.dk/gerth/alcom-ft/TR/ALCOMFT-TR-02-6.ps.gz

User Submitted?: No

Authors: Baixeries, J; Casas Garriga, G

Publisher: Departament de LSI

Data Collections: IPUMS USA

Topics: Population Data Science

Countries: United States

IPUMS NHGIS NAPP IHIS ATUS Terrapop