Quantitative Methods Forum

When:
November 18, 2013 @ 10:15 AM – 11:45 AM
Where:
Norm Endler Seminar Room (BSB 164)
Cost:
Free

Speaker: Dr. Wayne Oldford, University of Waterloo
         Department of Statistics & Actuarial Science

Title: Visual exploration of high-dimensional data by interactive navigation of low-dimensional data spaces.

Abstract: The structure of a set of high-dimensional data objects (e.g. images, documents, molecules, genetic expressions) is notoriously difficult to visualize. In contrast, lower-dimensional structure (especially 3 or fewer dimensions) is natural to us and easy to visualize. A not unreasonable approach, then, might be to explore one low-dimensional visualization after another in the hope that, together, these will shed light on the higher-dimensional structure.
          In this talk, I will introduce some graph-theoretic structures which have low-dimensional spaces as nodes/vertices and transitions from one space to another as edges. To be concrete, suppose that each node is a 2-d scatterplot of the data and that an edge exists between nodes whose corresponding scatterplots share a variable. In this case, travel along an edge amounts to a 3-d transition effected by rotating one 2-d scatterplot into the next. More generally, imagine a user moving a "You are here" circle, or "bullet", from one node to another along defined edges, causing one data visualization to be smoothly morphed into the other. A walk on the graph represents a low-dimensional trajectory through the higher-dimensional space. Of interest are walks along these graphs that reveal meaningful structure in the displayed data.
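          The graph described in the preceding paragraph can be illustrated with a minimal sketch. This is not the RnavGraph implementation or its API: the function names `scatterplot_graph` and `transition_frame` and the variable labels "A" to "D" are hypothetical, chosen only for illustration. The sketch assumes that nodes are unordered pairs of variables, that an edge joins two pairs sharing exactly one variable, and that a transition keeps the shared variable on the horizontal axis while the vertical axis rotates from the old variable to the new one.

```python
import math
from itertools import combinations


def scatterplot_graph(variables):
    """Build a graph whose nodes are 2-d scatterplots (variable pairs).

    Hypothetical helper: two nodes are joined by an edge when their
    scatterplots share exactly one variable.
    """
    nodes = list(combinations(variables, 2))
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if len(set(a) & set(b)) == 1
    ]
    return nodes, edges


def transition_frame(points, theta):
    """Return one frame of the 3-d transition between two scatterplots.

    Hypothetical helper: each point is (shared, old, new). The shared
    variable stays on the horizontal axis while the vertical axis
    rotates from `old` (theta = 0) to `new` (theta = pi / 2).
    """
    return [
        (s, math.cos(theta) * old + math.sin(theta) * new)
        for s, old, new in points
    ]


# Illustrative variable labels, not taken from any data set in the talk.
nodes, edges = scatterplot_graph(["A", "B", "C", "D"])
print(len(nodes), "scatterplots,", len(edges), "transitions")
```

          With four variables this construction gives 6 scatterplot nodes and 12 edges; with eight variables, as in the cluster analysis mentioned in the abstract, it gives 28 nodes and 168 edges, which suggests why methods for selecting interesting subgraphs are useful.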
         These ideas will be demonstrated by undertaking a visual cluster analysis of an 8-dimensional data set, as well as exploration of image data (560 dimensions) after some dimensionality reduction methods have been applied.
         Methods for constructing these navigation graphs and for identifying interesting subgraphs will also be described and demonstrated. Some dimensionality reduction (manifold learning) methods will also be used to constrain the size of the graph.
        All methods are implemented in an interactive R package called RnavGraph, available for download from the CRAN website (http://cran.r-project.org/). Should time and interest permit, some live demonstration will be given of RnavGraph.

Suggested Readings:        
        Hurley, C. B., & Oldford, R. W. (2011). Eulerian tour algorithms for data visualization and the PairViz package. Computational Statistics, 26(4), 613-633.        
        Hurley, C. B., & Oldford, R. W. (2011). Graphs as navigational infrastructure for high dimensional data spaces. Computational Statistics, 26(4), 585-612.
        Hurley, C. B., & Oldford, R. W. (2010). Pairwise display of high-dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics, 19, 861-866.
        Oldford, R. W., & Waddell, A. (2011). Visual Clustering of High-dimensional Data by Navigating Low-dimensional Spaces. In 58th Congress of the International Statistical Institute, Special Topics Session (Vol. 57).         
        Waddell, A. & Oldford, W. (n.d.) Visual clustering of high-dimensional data. Poster.        
        Wilkinson, L., Anand, A., & Grossman, R. L. (2005, October). Graph-Theoretic Scagnostics. In INFOVIS (Vol. 5, p. 21).

Installation instructions for the RnavGraph package are available here