Full Citation
Title: Sampling Strategies for Finding Frequent Sets
Citation Type: Miscellaneous
Publication Year: 2002
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: There are many deterministic algorithms that perform the task of finding frequent itemsets in a database. All of them depend somehow on the size of the database, and when this size increases, the computational cost of the algorithms becomes higher. Sampling strategies are a way to overcome this problem. The classical probabilistic approaches for finding frequent sets have been using Chernoff bounds and equivalents to calculate the size of the sample although the result always overestimates. One of the proposed alternatives is online sampling, which adaptatively calculates the size of the sample for each itemset. But this technique has been proved difficult to implement using the current algorithmic schemes for finding frequent itemsets. In this paper we present different implementations of the online scheme using our previous best-first proposal, which meets the requirements needed by online sampling. We compare these proposals with the batch one. Our final approach shows a stable behaviour in several different situations.
Url: http://users-cs.au.dk/gerth/alcom-ft/TR/ALCOMFT-TR-02-6.ps.gz
User Submitted?: No
Authors: Baixeries, J; Casas Garriga, G
Publisher: Departament de LSI
Data Collections: IPUMS USA
Topics: Population Data Science
Countries: United States