BIBLIOGRAPHY

Publications, working papers, and other research using data resources from IPUMS.

Full Citation

Title: Learning Analysis Patterns using a Contextual Edit Distance

Citation Type: Working Paper

Publication Year: 2020

Abstract: This paper presents a proposal for learning users’ behavior patterns when they interactively analyse data. Users’ explorations (sequences of queries) are compared to find subsequences of common actions or operations performed by the users during data analysis. We use a hierarchical clustering algorithm to retrieve groups of similar explorations. The main difficulty is devising a measure suited to comparing sequences of human actions. We propose to use a Contextual Edit Distance (CED), a generalization of Edit Distance that handles context-dependent edit costs. CED compares two users’ explorations with special emphasis on the similarity of each query to nearby queries in the exploration, which together form a local context. We test our approach on three workloads of real users’ explorations, extracting common analysis patterns from explorations devised both by students and by expert analysts. We also experiment on an artificial workload, generated with CubeLoad [19], showing that our approach is able to identify the patterns imposed by the generator. To the best of our knowledge, this is the first attempt to characterize human analysis behavior in workloads of data explorations.
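The abstract does not define the CED cost functions, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It shows an edit distance whose substitution cost may look at neighbouring elements (a "local context"), followed by an agglomerative clustering step. The names `generalized_edit_distance`, `window_sub_cost` and `average_linkage`, the window-based discount of 0.5, the choice of average linkage, and the toy operation labels are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: cost of aligning a[i] with b[j], given both full sequences.
SubCost = Callable[[Sequence[str], int, Sequence[str], int], float]


def generalized_edit_distance(a: Sequence[str], b: Sequence[str],
                              sub_cost: SubCost, indel_cost: float = 1.0) -> float:
    """Levenshtein-style dynamic programming in which the substitution cost may
    inspect positions i, j and therefore the elements around them."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                         # delete a[i-1]
                d[i][j - 1] + indel_cost,                         # insert b[j-1]
                d[i - 1][j - 1] + sub_cost(a, i - 1, b, j - 1),   # match / substitute
            )
    return d[n][m]


def window_sub_cost(a: Sequence[str], i: int, b: Sequence[str], j: int,
                    window: int = 1) -> float:
    """Assumed context-aware cost: 0 for an exact match, 0.5 if either element
    occurs within `window` positions of the other's position, else 1."""
    if a[i] == b[j]:
        return 0.0
    near_b = b[max(0, j - window): j + window + 1]
    near_a = a[max(0, i - window): i + window + 1]
    return 0.5 if (a[i] in near_b or b[j] in near_a) else 1.0


def average_linkage(items: List[Sequence[str]],
                    dist: Callable[[Sequence[str], Sequence[str]], float],
                    threshold: float) -> List[List[int]]:
    """Naive agglomerative clustering: merge the two clusters with the smallest
    average pairwise distance until that distance exceeds `threshold`."""
    pair = {(p, q): dist(items[p], items[q])
            for p in range(len(items)) for q in range(p + 1, len(items))}

    def avg(c1: List[int], c2: List[int]) -> float:
        total = sum(pair[(min(p, q), max(p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [[k] for k in range(len(items))]
    while len(clusters) > 1:
        score, x, y = min((avg(clusters[x], clusters[y]), x, y)
                          for x in range(len(clusters))
                          for y in range(x + 1, len(clusters)))
        if score > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters


def ced(s: Sequence[str], t: Sequence[str]) -> float:
    return generalized_edit_distance(s, t, window_sub_cost)


# Toy explorations (hypothetical operation labels).
explorations = [
    ["filter", "group", "drill", "export"],
    ["filter", "drill", "group", "export"],
    ["slice", "pivot", "pivot", "chart"],
]
print(ced(explorations[0], explorations[1]))           # 1.0
print(average_linkage(explorations, ced, threshold=2.0))  # [[0, 1], [2]]
```

With the assumed window cost, swapping two adjacent operations costs 1.0 instead of the 2.0 that plain Levenshtein distance would assign, so the two reordered explorations end up in the same cluster.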

Url: http://ceur-ws.org/Vol-2572/paper17.pdf

User Submitted?: No

Authors: Moreau, Clement; Peralta, Veronika; Marcel, Patrick; Chanson, Alexandre; Devogele, Thomas

Series Title: DOLAP Working Paper Series

Publication Number: 17

Institution: DOLAP

Pages:

Publisher Location:

Data Collections: IPUMS USA

Topics: Other, Population Data Science

Countries:
