Visualizing the correlations between structured data features is central to effective and efficient data analysis and decision-making. In this paper, we present a new unsupervised, feature-based tool for interactive visualization of semi-structured data, titled “mirrored dendrograms”. It accepts semi-structured, multi-featured data as input and lets the user select the target features to be visualized and mapped against each other, along with their relative impacts (weights) on the visualization process. It then invokes a hierarchical clustering process to cluster the data according to the user-chosen features, and produces a dendrogram structure for each combination of target features. The dendrograms are mirrored against each other by mapping their nodes through a transportation optimization problem. Unlike existing solutions such as tanglegrams and cluster heatmaps, mirrored dendrograms offer three main contributions: (i) connecting the dendrograms through their internal nodes to describe their structural relationships (instead of connecting their leaf nodes only), (ii) allowing the user to zoom in and out of the data to show their relationships at different granularity levels (compared with existing static solutions), and (iii) identifying the best zooming level between the two dendrograms, which highlights the maximum correlation with the minimal amount of detail presented to the user (acquiring the most value out of the data while viewing the least amount of data). We have evaluated our solution using multiple use case scenarios, including Electronic Health Records (EHRs), IMDB publications, IMDB movie entries, and Semantic SVG Graph (SSG) instances. Sixty testers participated in quantitative and qualitative evaluations to assess the data visualization tool against existing solutions, namely tanglegrams and cluster heatmaps.
Visual quality was evaluated by measuring (i) the time needed by a user to identify the matching features between two data entries, and (ii) the accuracy of the mapped features identified by the user. Two-sample t-tests were conducted to verify the statistical significance of the differences between the sample data groups being compared. A qualitative survey was also conducted to evaluate the tools’ usability, interactivity, and data zooming quality. Results are promising and highlight the tool’s quality and potential compared with its alternatives.
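To make the statistical comparison concrete, the following is a minimal sketch of a two-sample t-test statistic. The abstract does not state which variant was used; Welch's unequal-variance form is assumed here, and the function name `welch_t` and the timing values are invented for illustration only (they are not results from the study).

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Synthetic task-completion times in seconds (assumption: made-up numbers).
times_tool_a = [10.0, 12.0, 11.0, 13.0, 14.0]
times_tool_b = [15.0, 17.0, 16.0, 18.0, 19.0]
t_stat, dof = welch_t(times_tool_a, times_tool_b)
```

The resulting statistic would then be compared against the t distribution with `dof` degrees of freedom to obtain a p-value, for example through a statistics package.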
Moufarrej et al. (Fri,) studied this question.
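The clustering and mirroring steps described in the abstract can be sketched as follows. This is an illustrative approximation under stated assumptions, not the authors' implementation: average linkage, a weighted Euclidean distance, stopping at `k` clusters as a stand-in for one zoom level, and a Jaccard-distance cost between nodes are all choices made here. The record fields (`age`, `dose`) and all function names are hypothetical.

```python
import math
from itertools import combinations

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance over the user-selected features."""
    return math.sqrt(sum(w * (x[f] - y[f]) ** 2 for f, w in weights.items()))

def cluster(records, weights, k):
    """Average-linkage agglomerative clustering, stopped at k clusters.

    Stopping at k stands in for cutting a dendrogram at one zoom level.
    Returns clusters as frozensets of record indices, ordered by smallest index.
    """
    clusters = [frozenset([i]) for i in range(len(records))]

    def linkage(a, b):
        total = sum(weighted_distance(records[i], records[j], weights)
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(clusters, key=min)

def transport(supply, demand, cost):
    """Balanced transportation problem with integer supplies and demands,
    solved as min-cost flow by successive shortest paths (Bellman-Ford)."""
    m, n = len(supply), len(demand)
    src, snk = m + n, m + n + 1
    # Residual edges are stored as [target, capacity, cost, reverse_index].
    graph = [[] for _ in range(m + n + 2)]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(m):
        add_edge(src, i, supply[i], 0.0)
        for j in range(n):
            add_edge(i, m + j, supply[i], cost[i][j])
    for j in range(n):
        add_edge(m + j, snk, demand[j], 0.0)

    while True:
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            for u, edges in enumerate(graph):
                for idx, (v, cap, c, _rev) in enumerate(edges):
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + c, (u, idx)
        if prev[snk] is None:
            break  # no augmenting path left
        path, v = [], snk
        while v != src:
            path.append(prev[v])
            v = prev[v][0]
        push = min(graph[u][idx][1] for u, idx in path)
        for u, idx in path:
            edge = graph[u][idx]
            edge[1] -= push
            graph[edge[0]][edge[3]][1] += push

    # Flow on edge i -> j equals the capacity accumulated on its reverse edge.
    flow = [[0] * n for _ in range(m)]
    for i in range(m):
        for v, _cap, _c, rev in graph[i]:
            if m <= v < m + n:
                flow[i][v - m] = graph[v][rev][1]
    return flow

# Hypothetical records and feature names, invented purely for illustration.
records = [
    {"age": 20, "dose": 1.0}, {"age": 22, "dose": 2.0}, {"age": 21, "dose": 9.0},
    {"age": 60, "dose": 10.0}, {"age": 62, "dose": 11.0}, {"age": 61, "dose": 1.5},
]
left = cluster(records, {"age": 1.0}, k=2)    # first dendrogram, cut at 2 clusters
right = cluster(records, {"dose": 1.0}, k=2)  # second dendrogram, cut at 2 clusters

# Assumed cost: Jaccard distance between the record sets of two nodes.
cost = [[1 - len(a & b) / len(a | b) for b in right] for a in left]
mapping = transport([len(a) for a in left], [len(b) for b in right], cost)
```

Here `mapping[i][j]` is the number of records routed from node `i` of the first clustering to node `j` of the second. Repeating the computation for several values of `k` would give one mapping per zoom level; how the tool selects the best level is not specified in the abstract and is not modeled here.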