BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: A Comparative Study of Discretization Methods for Naive-Bayes Classifiers

Citation Type: Conference Paper

Publication Year: 2002

Abstract: Discretization is a popular approach to handling numeric attributes in machine learning. We argue that the requirements for effective discretization differ between naive-Bayes learning and many other learning algorithms. We evaluate nine discretization methods by their effectiveness with naive-Bayes classifiers: equal width discretization (EWD), equal frequency discretization (EFD), fuzzy discretization (FD), entropy minimization discretization (EMD), iterative discretization (ID), proportional k-interval discretization (PKID), lazy discretization (LD), non-disjoint discretization (NDD) and weighted proportional k-interval discretization (WPKID). We find that, in general, naive-Bayes classifiers trained on data preprocessed by LD, NDD or WPKID achieve lower classification error than those trained on data preprocessed by the other discretization methods. However, LD cannot scale to large data. This study leads to a new discretization method, weighted non-disjoint discretization (WNDD), which combines the advantages of WPKID and NDD. Our experiments show that, among all the rival discretization methods, WNDD best helps naive-Bayes classifiers reduce average classification error.
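The abstract names the methods without defining them. The sketch below illustrates the two simplest baselines, EWD and EFD, plus a PKID-style rule, assuming their commonly described definitions: EWD splits the observed range into k equal-width intervals, EFD puts roughly n/k training values in each interval, and PKID (an assumption here, based on its usual description) uses equal-frequency intervals with both interval size and interval number set to about sqrt(n). The function names are hypothetical, and this is not the authors' code. The resulting interval index is the discrete value a naive-Bayes classifier would then count per class.

```python
import bisect
import math
from collections import Counter

def ewd_cuts(values, k):
    """Equal width discretization: k intervals of equal width over [min, max].
    Returns the k - 1 interior cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def efd_cuts(values, k):
    """Equal frequency discretization: about len(values) / k values per interval.
    Cut points are taken from the sorted values; ties can make intervals uneven."""
    xs = sorted(values)
    n = len(xs)
    return [xs[i * n // k] for i in range(1, k)]

def pkid_cuts(values):
    """PKID-style sketch (assumed rule): equal frequency with about sqrt(n)
    intervals, so interval size and interval number grow together with n."""
    k = max(1, math.isqrt(len(values)))
    return efd_cuts(values, k)

def discretize(value, cuts):
    """Map a numeric value to an interval index; a value equal to a cut
    point is assigned to the upper interval."""
    return bisect.bisect_right(cuts, value)

def interval_counts(values, cuts):
    """Number of values falling in each interval, lowest interval first."""
    counts = Counter(discretize(v, cuts) for v in values)
    return [counts[i] for i in range(len(cuts) + 1)]

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical skewed attribute
    print(interval_counts(data, ewd_cuts(data, 2)))  # [9, 1]
    print(interval_counts(data, efd_cuts(data, 2)))  # [5, 5]
    print(interval_counts(data, pkid_cuts(data)))    # [3, 3, 4]
```

On skewed data, EWD can leave one interval with almost all of the training values, while EFD and the PKID-style rule keep interval frequencies balanced, which matters for the per-interval probability estimates naive Bayes relies on.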

Url: http://users.monash.edu/~webb/Files/YangWebb02a.pdf

User Submitted?: No

Authors: Yang, Ying; Webb, Geoffrey I.

Conference Name: Pacific Rim Knowledge Acquisition Workshop

Publisher Location: Tokyo

Data Collections: IPUMS USA

Topics: Other

Countries: United States