Full Citation
Title: Learning Analysis Patterns using a Contextual Edit Distance
Citation Type: Working Paper
Publication Year: 2020
ISBN:
ISSN:
DOI:
NSFID:
PMCID:
PMID:
Abstract: This paper presents a proposal for learning users’ behavior patterns when they interactively analyse data. Users’ explorations (sequences of queries) are compared to find subsequences of common actions or operations performed during data analysis. We use a hierarchical clustering algorithm to retrieve groups of similar explorations. The main difficulty is devising a similarity measure suited to sequences of human actions. We propose a Contextual Edit Distance (CED), a generalization of edit distance that handles context-dependent edit costs. CED compares two users’ explorations, placing special emphasis on the similarity of each query to nearby queries in the exploration, which determine a local context. We test our approach on three workloads of real users’ explorations, extracting common analysis patterns from explorations devised by both students and expert analysts. We also experiment on an artificial workload generated with CubeLoad [19], showing that our approach is able to identify the patterns imposed by the generator. To the best of our knowledge, this is the first attempt to characterize human analysis behavior in workloads of data explorations.
Url: http://ceur-ws.org/Vol-2572/paper17.pdf
User Submitted?: No
Authors: Moreau, Clement; Peralta, Veronika; Marcel, Patrick; Chanson, Alexandre; Devogele, Thomas
Series Title: DOLAP Working Paper Series
Publication Number: 17
Institution: DOLAP
Pages:
Publisher Location:
Data Collections: IPUMS USA
Topics: Other, Population Data Science
Countries: