Publications, working papers, and other research using data resources from IPUMS.

Eldawy, Ahmed; Haynes, David; Niu, Lyuye; Su, Zhiba 2017. Large Scale Analytics of Vector+Raster Big Spatial Data.

Significant increases in the volume of big spatial data have driven researchers and practitioners to build specialized systems to process and analyze this data. Existing efforts focus on either big raster data, e.g., remote sensing data or medical images, or big vector data, e.g., geotagged tweets or trajectories. However, when raster and vector data mix, one dataset must be converted to the other representation requiring vector-to-raster or raster-to-vector transformation before processing, which is extremely inefficient for large datasets. In this paper, we advocate a third approach that mixes the raw representations of both vector and raster data in the query processor. As a case study, we apply this to the zonal statistics problem, which computes the statistics over a raster layer for each polygon in a vector layer. We propose a novel method, called Scanline method, which does not require a conversion between raster and vector. Experimental evaluation on real datasets as large as 840 billion pixels shows up to three orders of magnitude speedup over the baseline methods.