Full Citation
Title: Learning From Query-Answers: A Scalable Approach to Belief Updating and Parameter Learning
Citation Type: Journal Article
Publication Year: 2018
ISBN:
ISSN:
DOI: 10.1145/3277503
NSFID:
PMCID:
PMID:
Abstract: Tuple-independent and disjoint-independent probabilistic databases (TI-and DI-PDBs) represent uncertain data in a factorized form as a product of independent random variables that represent either tuples (TI-PDBs) or sets of tuples (DI-PDBs). When the user submits a query, the database derives the marginal probabilities of each output-tuple, exploiting the underlying assumptions of statistical independence. While query processing in TI-and DI-PDBs has been studied extensively, limited research has been dedicated to the problems of updating or deriving the parameters from observations of query results. Addressing this problem is the main focus of this paper. We first introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. The key idea of B-PDBs is to treat each parameter as a latent, Beta-distributed random variable. We show how this simple expedient enables both belief updating and parameter learning in a principled way, without imposing any burden on regular query processing. Building on B-PDBs, we then introduce Dirichlet Probabilistic Databases (D-PDBs), a generalization of DI-PDBs with similar properties. We provide the following key contributions for both Band D-PDBs: (i) we study the complexity of performing Bayesian belief updates and devise efficient algorithms for certain tractable classes of queries; (ii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iii) we present an algorithm for efficiently computing conditional probabilities, allowing us to efficiently implement Band D-PDBs via a standard relational engine; and (iv) we support our conclusions with extensive experimental results.
Url: https://doi.org/10.1145/3277503
User Submitted?: No
Authors: Meneghetti, Niccolò; Kennedy, Oliver; Gatterbauer, Wolfgang
Periodical (Full): ACM Transactions on Database Systems
Issue: 4
Volume: 43
Pages: 41
Data Collections: IPUMS USA
Topics: Methodology and Data Collection, Other
Countries: