BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: A Dataset Search Engine for the Research Document Corpus

Citation Type: Miscellaneous

Publication Year: 2012

Abstract: A key step in validating a proposed idea or system is to evaluate it on a suitable dataset. However, to date there have been no useful tools that help researchers understand which datasets have been used for what purpose, or in what prior work. Instead, researchers have to manually browse through papers to find suitable datasets and their URLs, which is laborious and inefficient. To aid the data discovery process and provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are identifying datasets and discovering the association between a dataset and the URLs where it can be accessed. Based on this framework, we have built a user-friendly web-based search interface that lets users conveniently explore dataset-paper relationships and find relevant datasets and their properties.

Url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.4565&rep=rep1&type=pdf

User Submitted?: No

Authors: Lu, Meiyu; Bangalore, Srinivas; Cormode, Graham; Hadjieleftheriou, Marios; Srivastava, Divesh

Publisher: National University of Singapore

Data Collections: IPUMS USA

Topics: Population Data Science

Countries: United States
