Full Citation
Title: Discovering Interesting Interrelationships with Undiscretized Quantitative Attributes in Large, Dense Databases
Citation Type: Dissertation/Thesis
Publication Year: 2005
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: Exploratory rule discovery is widely employed in real-world data mining, because of the flexibility in selecting applicable models. Nevertheless, two problems coexist with the merits of exploratory rule discovery. One of these drawbacks is how to limit within reasonable bounds the number of resulting models. The other problem is how to improve the efficiency of rule discovery by eliminating unnecessary computation and I/O. Techniques for tackling these issues have been studied extensively in the context of exploratory rule discovery with qualitative attributes. However, databases processed often involve quantitative attributes. Some researchers strive to introduce quantitative attributes into exploratory rule discovery by discretization, with which information loss is unavoidable. Such techniques are not optimal for mining inter-relationships between quantitative attributes and qualitative attributes. A special class of exploratory rule discovery has been proposed for mining rules with consequents being one or more undiscretized target quantitative variables. Characteristics of the selected quantitative variables are described using distributional statistics. However, previous techniques for mining exploratory rules with undiscretized quantitative targets cannot efficiently search for rules in very large, dense databases. Rule pruning techniques in this context are also limited. The only investigation was the pruning of insignificant quantitative association rules proposed by Aumann and Lindell (1999). Efficiency is one of the critical issues for such techniques. Accordingly, we propose techniques for pruning rules with undiscretized quantitative attributes. We call these techniques the derivative extended rule filter and the derivative partial rule filter. The derivative extended rule filter is an efficient variant of the existing insignificant quantitative association rule pruning proposed by Aumann and Lindell (1999).
The derivative partial rule filter is able to remove potentially uninteresting rules that remain after the derivative extended rule filter is applied. We also discovered severe efficiency problems in existing rule pruning techniques with undiscretized quantitative attributes. The triviality filter is then suggested as a complement for the derivative extended rule filter, whose antimonotonicity can be utilized for more powerful search space pruning. We also propose the difference set statistics derivation and the circular intersection approaches for lessening the redundancies of data accesses and computations in our original implementation of derivative rule filters. Detailed experimental evaluations are conducted to back up our arguments for desirable performance expectations with the above techniques.
User Submitted?: No
Authors: Huang, Shiying
Institution: Monash University
Department: Computer Science and Software Engineering
Advisor: Geoffrey I. Webb
Degree: Master of Information Technology
Publisher Location: Australia
Pages:
Data Collections: IPUMS USA
Topics: Methodology and Data Collection
Countries: