JSS Journal of Statistical Software
May 2012, Volume 48, Issue 4.

qgraph: Network Visualizations of Relationships in Psychometric Data

Sacha Epskamp, University of Amsterdam
Angélique O. J. Cramer, University of Amsterdam
Lourens J. Waldorp, University of Amsterdam
Verena D. Schmittmann, University of Amsterdam
Denny Borsboom, University of Amsterdam

Abstract

We present the qgraph package for R, which provides an interface to visualize data through network modeling techniques. For instance, a correlation matrix can be represented as a network in which each variable is a node and each correlation an edge; by varying the width of the edges according to the magnitude of the correlation, the structure of the correlation matrix can be visualized. A wide variety of matrices that are used in statistics can be represented in this fashion, for example matrices that contain (implied) covariances, factor loadings, regression parameters and p values. qgraph can also be used as a psychometric tool, as it performs exploratory and confirmatory factor analysis using sem and lavaan; the output of these packages is automatically visualized in qgraph, which may aid the interpretation of results. In this article, we introduce qgraph by applying the package functions to data from the NEO-PI-R, a widely used personality questionnaire.

Keywords: R, networks, correlations, data visualization, factor analysis, graph theory.

1. Introduction

The human visual system is capable of processing high-dimensional information naturally. For instance, we can immediately spot suggestive patterns in a scatterplot, while the same patterns are invisible when the data are represented numerically in a matrix. We present qgraph, a package for R (R Development Core Team 2012), available from the Comprehensive R Archive Network (CRAN). qgraph caters to this capacity for spotting patterns by visualizing data in a novel way: through networks.
Networks consist of nodes (also called vertices) that are connected by edges (Harary 1969). Each edge has a certain weight, indicating the strength of the relevant connection, and in addition edges may or may not be directed. In most applications of network modeling, nodes represent entities (e.g., people in social networks, or genes in gene networks). However, in statistical analysis it is natural to represent variables as nodes. This representation has a longstanding tradition in econometrics and psychometrics (e.g., see Bollen and Lennox 1991; Edwards and Bagozzi 2000), and was a driving force behind the development of graphical models for causal analysis (Spirtes et al. 2000; Pearl 2000). By representing relationships between variables (e.g., correlations) as weighted edges, important structures can be detected that are hard to extract by other means. In general, qgraph enables the researcher to represent complex statistical patterns in clear pictures, without the need for data reduction methods.

qgraph was developed in the context of network approaches to psychometrics (Cramer et al. 2010; Borsboom 2008; Schmittmann et al. 2012), in which theoretical constructs in psychology are hypothesized to be networks of causally coupled variables. In particular, qgraph automates the production of graphs such as those proposed in Cramer et al. (2010). However, the techniques in the package have also proved useful as a more general tool for visualizing data, and include methods to visualize output from several psychometric packages such as sem (Fox 2006; Fox et al. 2012) and lavaan (Rosseel 2012). A number of R packages can be used for the visualization and analysis of networks, e.g., network (Butts et al. 2012; Butts 2008a), statnet (Handcock et al. 2008) and igraph (Csardi and Nepusz 2006).
In visualizing graphs, qgraph distinguishes itself by being specifically aimed at the visualization of statistical information. This usually leads to a special type of graph: a non-sparse weighted graph. Such graphs typically contain many edges (e.g., a fully connected directed network with 50 nodes has 50 × 49 = 2450 edges, and even its undirected counterpart has 1225), which makes the graph hard to interpret and inflates the file size of vector image files (e.g., PDF, SVG, EPS). qgraph is specifically aimed at presenting such graphs in a meaningful way (e.g., by using automatic scaling of color and width, cutoff scores and ordered plotting of edges) and at minimizing the file size of the output in vector image files (e.g., by minimizing the number of polygons needed). Furthermore, qgraph is designed to be usable by researchers new to R, while at the same time offering more advanced customization options for experienced R users. qgraph is not designed for numerical analysis of graphs (Boccaletti et al. 2006), but can be used to compute the node centrality measures of weighted graphs proposed by Opsahl et al. (2010). Other R packages as well as external software can be used for more detailed analyses. qgraph facilitates such analyses by accepting commonly used formats as input. In particular, its input is the same as that used in the igraph package for R, which can be used for many different analyses.

In this article we introduce qgraph using an included dataset on personality traits. We describe how to visualize correlational structures, factor loadings and structural equation models, and how these visualizations should be interpreted. Finally, we show how qgraph can be used as a simple unified interface to perform several exploratory and confirmatory factor analysis routines available in R.

2. Creating graphs

Throughout this article we will be working with a dataset concerning the five factor model of personality (Benet-Martinez and John 1998; Digman 1989; Goldberg 1990, 1998; McCrae and Costa 1997). This is a model in which correlations between responses to personality items (i.e., questions of the type "do you like parties?" and "do you enjoy working hard?") are explained by individual differences in five personality traits: neuroticism, extraversion, agreeableness, openness to experience and conscientiousness. These traits are also known as the Big Five. We use an existing dataset in which the Dutch translation of a commonly used personality test, the NEO-PI-R (Costa and McCrae 1992; Hoekstra et al. 2003), was administered to 500 first-year psychology students (Dolan et al. 2009). The NEO-PI-R consists of 240 items designed to measure the five personality factors, with items that cover six facets per factor.[1] The scores of each subject on each item are included in qgraph, as well as information on the factor each item is designed to measure (this information is in the column names). All graphs in this paper were made using R version … and qgraph version … . First, we load qgraph and the NEO-PI-R dataset:

R> library("qgraph")
R> data("big5")

2.1. Input modes

The main function of qgraph is called qgraph(), and its first argument is used as input for making the graph. This is the only mandatory argument and can be either a weights matrix, an edge list or an object of class "qgraph", "loadings" and "factanal" (stats; R Development Core Team 2012), "principal" (psych; Revelle 2012), "sem" and "semmod" (sem; Fox et al. 2012), "lavaan" (lavaan; Rosseel 2012), "graphNEL" (Rgraphviz; Gentry et al. 2012) or "pcAlgo" (pcalg; Kalisch et al. 2012). In this article we focus mainly on weights matrices; information on other input modes can be found in the documentation.
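To make the relation between two of these input modes concrete, the following minimal base R sketch converts an edge list into the equivalent weights matrix. The helper name edgelist_to_weights() and the three-column (from, to, weight) layout of the edge list are assumptions made for illustration; they are not part of qgraph, and the weights used below are made up.

```r
# Hypothetical helper (not part of qgraph): turn an edge list with one row
# per edge (from, to, weight) into an n by n weights matrix.
edgelist_to_weights <- function(edges, n = max(edges[, 1:2])) {
  W <- matrix(0, nrow = n, ncol = n)           # zero means "no connection"
  W[edges[, 1:2, drop = FALSE]] <- edges[, 3]  # element (i, j) = weight of edge i -> j
  W
}

# Made-up weights for the edges 1 -> 2, 1 -> 3 and 2 -> 3
edges <- rbind(c(1, 2, 1),
               c(1, 3, 2),
               c(2, 3, 3))
W <- edgelist_to_weights(edges)
W
isSymmetric(W)  # FALSE: an asymmetric matrix describes a directed graph
```

A matrix built this way could be passed directly as the first argument of qgraph(), e.g., qgraph(W); because it is not symmetric, the default rule for weights matrices would draw it as a directed graph.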
[1] A facet is a subdomain of the personality factor; e.g., the factor neuroticism has depression and anxiety among its subdomains.

Figure 1: A directed graph based on a 3 by 3 weights matrix with three edges of different strengths.

Figure 2: A visualization of the correlation matrix of the NEO-PI-R dataset (legend: Neuroticism, Extraversion, Openness, Agreeableness, Conscientiousness). Each node represents an item and each edge represents a correlation between two items. Green edges indicate positive correlations, red edges indicate negative correlations, and the width and color of the edges correspond to the absolute value of the correlations: the higher the correlation, the thicker and more saturated the edge.

A weights matrix codes the connectivity structure between nodes in a network in matrix form. For a graph with n nodes, its weights matrix A is a square n by n matrix in which element $a_{ij}$ represents the strength of the connection, or weight, from node i to node j. Any value can be used as a weight as long as (a) the value zero represents the absence of a connection, and (b) the strength of connections is symmetric around zero (so that equal positive and negative values are comparable in strength). By default, if A is symmetric an undirected graph is plotted; otherwise a directed graph is plotted. In the special case where all edge weights are either 0 or 1, the weights matrix is interpreted as an adjacency matrix and an unweighted graph is made. For example, consider the following weights matrix:

This matrix represents a graph with 3 nodes with weighted edges from node 1 to nodes 2 and 3, and from node 2 to node 3. The resulting graph is presented in Figure 1. Many statistics follow these rules and can be used as edge weights (e.g., correlations, covariances, regression parameters, factor loadings, log odds). Weights matrices themselves also
occur naturally (e.g., as a correlation matrix) or can easily be computed. Taking a correlation matrix as the argument of the function qgraph() is a good start to get acquainted with the package. With the NEO-PI-R dataset, the correlation matrix can be plotted with:

R> qgraph(cor(big5))

This returns the most basic graph, in which the nodes are placed in a circle. The edges between nodes are colored according to the sign of the correlation (green for positive correlations, and red for negative correlations), and the thickness of the edges represents the absolute magnitude of the correlation (i.e., thicker edges represent higher correlations).

Visualizations that aid data interpretation (e.g., are items that supposedly measure the same construct closely connected?) can be obtained either by using the groups argument, which groups nodes according to a criterion (e.g., being in the same psychometric subtest), or by using a layout that is sensitive to the correlation structure. First, the groups argument can be used to specify which nodes belong together (e.g., are designed to measure the same factor). Nodes belonging together have the same color, and are placed together in smaller circles. The groups argument can be used in two ways. First, it can be a list in which each element is a vector containing the numbers of the nodes belonging together. Second, it can be a factor in which nodes sharing a level belong together. The names of the elements in the list or the levels of the factor are used in a legend if requested. For the Big 5 dataset, the grouping of the variables according to the NEO-PI-R manual is included in the package. Using the groups argument yields a network representation that readily facilitates interpretation in terms of the five personality factors:

R> data("big5groups")
R> Q <- qgraph(cor(big5), groups = big5groups)

Note that we saved the graph in the object Q, to avoid specifying these arguments again in future runs.
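The two forms of the groups argument carry the same information, which a short base R sketch can show. Because the exact column-name format of big5 is not given here, the item names below (a factor letter followed by an item number) and the simulated responses are hypothetical stand-ins for the real data.

```r
# Hypothetical item names: a factor letter followed by an item number
# (a stand-in, not the exact column format of big5).
items <- c("N1", "E1", "O1", "N2", "E2", "O2")

# Simulated responses for 500 subjects, used only to obtain a correlation matrix
set.seed(1)
X <- matrix(rnorm(500 * length(items)), ncol = length(items),
            dimnames = list(NULL, items))
cmat <- cor(X)
isSymmetric(cmat)  # TRUE, so by default the graph would be undirected

# groups as a factor: nodes sharing a level belong together
groups_factor <- factor(substr(items, 1, 1), levels = c("N", "E", "O"),
                        labels = c("Neuroticism", "Extraversion", "Openness"))

# groups as a list: each element holds the node numbers of one group
groups_list <- split(seq_along(items), groups_factor)
groups_list
```

Either object could then be supplied as the groups argument, e.g., qgraph(cmat, groups = groups_list); the element names (or factor levels) are what would appear in the legend.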
It is easy to subsequently add other arguments: for instance, we may further optimize the representation by using the minimum argument to omit correlations we are not interested in (e.g., very weak correlations), borders to omit borders around the nodes, and vsize to make the nodes smaller:

R> Q <- qgraph(Q, minimum = 0.25, borders = FALSE, vsize = 2)

The resulting graph is represented in Figure 2.

2.2. Layout modes

Instead of predefined circles (as were used in Figure 2), an alternative way of facilitating interpretations of correlation matrices is to let the placement of the nodes be a function of the pattern of correlations. Placement of the nodes can be controlled with the layout argument. If layout is assigned "circular", then the nodes are placed clockwise in a circle, or in smaller circles if the groups argument is specified (as in Figure 2). If the nodes are placed such that the length of the edges depends on the strength of the edge weights (i.e., shorter edges for stronger weights), then a picture can be generated that shows how variables cluster. This is a powerful exploratory tool that may be used as a visual analogue to factor analysis. To make the length of edges correspond directly to the edge weights, a high-dimensional space would be needed, but a good alternative is the use of force-embedded algorithms (Di Battista et al. 1994) that iteratively compute the layout in two-dimensional space. A modified version of a force-embedded algorithm that was proposed by Fruchterman and Reingold (1991) is included in qgraph. This is a C function that was ported from the sna package (Butts 2010, 2008b). A modification of this algorithm for weighted graphs was taken from igraph (Csardi and Nepusz 2006). This algorithm uses an iterative process to compute a layout in which the length of edges depends on the absolute weight of the edges.
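The idea behind a weighted force-embedded layout can be sketched in a few lines of base R: all nodes repel each other, connected nodes attract in proportion to their absolute edge weight, and a cooling "temperature" caps how far a node may move per iteration. This is a simplified illustration under stated assumptions (a 2 by 2 drawing area, a linear cooling schedule, attraction multiplied by the rescaled absolute weight); it is not the C routine that qgraph ported from sna, and the function name fr_layout() is hypothetical.

```r
# Minimal sketch of a weighted Fruchterman-Reingold layout in base R.
# Illustrative only; not qgraph's own implementation.
# W is a symmetric weights matrix; edge signs are ignored.
fr_layout <- function(W, niter = 500, seed = 1) {
  n <- nrow(W)
  A <- abs(W)
  diag(A) <- 0
  A <- A / max(A)                       # rescale absolute weights to [0, 1]
  set.seed(seed)
  pos <- matrix(runif(2 * n, -1, 1), ncol = 2)  # random start in a 2 x 2 square
  k <- sqrt(4 / n)                      # "ideal" edge length for a 2 x 2 area
  temp0 <- 0.2                          # initial maximum move per iteration
  for (iter in seq_len(niter)) {
    temp <- temp0 * (1 - iter / niter)  # linear cooling schedule
    disp <- matrix(0, n, 2)
    for (i in seq_len(n)) {
      delta <- matrix(pos[i, ], n, 2, byrow = TRUE) - pos  # row j: pos_i - pos_j
      d <- pmax(sqrt(rowSums(delta^2)), 1e-9)              # avoid division by zero
      repulse <- k^2 / d                # every pair of nodes repels
      attract <- A[i, ] * d^2 / k       # connected nodes attract, scaled by weight
      disp[i, ] <- colSums(delta / d * (repulse - attract))
    }
    len <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    pos <- pos + disp / len * pmin(len, temp)  # move each node at most `temp`
  }
  pos
}

# Usage: two clusters of three nodes, with no edges between the clusters
W <- matrix(0, 6, 6)
W[1:3, 1:3] <- 1
W[4:6, 4:6] <- 1
diag(W) <- 0
L <- fr_layout(W)
round(as.matrix(dist(L)), 2)  # expect short within-cluster distances
```

Unlike the original algorithm, this sketch does not confine nodes to a frame, so unconnected clusters simply drift apart until the temperature reaches zero.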
To use the Fruchterman-Reingold algorithm in qgraph(), the layout argument needs to be set to "spring". We can do this for the NEO-PI-R dataset, using the graph object Q that we defined earlier, and omitting the legend:

R> qgraph(Q, layout = "spring", legend = FALSE)

Figure 3 shows the correlation matrix of the Big Five dataset with the nodes placed according to the Fruchterman-Reingold algorithm. This allows us to inspect the clustering of the variables. The figure shows interesting structures that are far harder to detect with conventional analyses. For instance, neuroticism items (i.e., red nodes) cluster to a greater extent than items of the other traits; openness in particular is less strongly organized than the other factors. In addition, agreeableness and extraversion items are literally intertwined, which offers a suggestive way of thinking about the well-known correlation between these traits.

The placement of the nodes can also be specified manually by assigning the layout argument a matrix containing the coordinates of each node. For a graph of n nodes this would be an n by 2 matrix in which the first column contains the x coordinates and the second column contains the y coordinates. These coordinates can be on any scale and will be rescaled to fit the graph by default. For example, the following matrix describes the coordinates of the graph in Figure 1:

This method of specifying the layout of a graph is identical to the one used in the igraph package (Csardi and Nepusz 2006), and thus any layout obtained through igraph can be used.[2]

One might be interested in creating not one graph but an animation of several graphs that are similar to each other, for example to illustrate the growth of a network over time or to investigate the change in correlational structure in repeated measures. For such similar but unequal graphs, the Fruchterman-Reingold algorithm might return completely different layouts, which would make the animation unnecessarily hard to interpret.
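Layout matrices and the animation problem can be illustrated together. The sketch below takes two n by 2 layout matrices (previous and new frame) and moves each node from its previous position toward its new one, but by no more than a fixed step. The helper constrain_moves() and the step size are hypothetical; this only illustrates the idea of restricting node movement and is not the code of qgraph.animate().

```r
# Hypothetical helper: move each node from its previous position toward
# its new position, but by at most max_step (Euclidean distance).
constrain_moves <- function(prev, target, max_step) {
  step <- target - prev                      # desired displacement per node
  len <- sqrt(rowSums(step^2))
  scale <- ifelse(len > max_step, max_step / len, 1)  # shrink long moves only
  prev + step * scale                        # scale is recycled row by row
}

# Layout matrices: one row per node, columns are x and y coordinates
old_layout <- rbind(c(0, 0), c(1, 0), c(0, 1))
new_layout <- rbind(c(3, 4), c(1, 0.5), c(0, 1))
constrain_moves(old_layout, new_layout, max_step = 1)
# rows become (0.6, 0.8), (1, 0.5) and (0, 1)
```

Node 1 wanted to move 5 units and is moved only 1 unit in the same direction, while the short move of node 2 and the unchanged node 3 are left alone, so consecutive frames stay visually comparable.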
This problem can be solved by limiting the distance a node may move in each iteration. The function qgraph.animate() automates this process and can be used for various types of animations.

[2] To do this, first create an igraph object by calling graph.adjacency() on the weights matrix with the argument weighted = TRUE. Then, use one of the layout functions (e.g., layout.spring()) on the igraph object. This returns the matrix with coordinates, which can be used in qgraph().

Figure 3: A graph of the correlation matrix of the NEO-PI-R dataset in which the nodes are placed by the Fruchterman-Reingold algorithm. The specification of the nodes and edges is identical to Figure 2.

2.3. Output modes

To save the graphs, any output device in R can be used to obtain high-resolution, publication-ready image files. Some devices can be called directly by qgraph() through the filetype argument, which must be assigned a string indicating which device should be used. Currently filetype can be "R" or "x11"[3] to open a new plot in R, the raster types "tiff", "png" and "jpg", the vector types "eps", "pdf" and "svg", and "tex". A PDF file is advised, which can thus be created with qgraph(..., filetype = "pdf").

[3] RStudio users are advised to use filetype = "x11" to plot in R.

With many nodes, it can be hard to track which variables are represented by which nodes. To address this problem, one can define mouseover tooltips for each node, so that the name of the corresponding variable (e.g., the item content in the Big Five graph) is shown when one mouses over the relevant node. In qgraph, mouseover tooltips can be placed on the nodes in two ways. The "svg" filetype creates an SVG image using the RSVGTipsDevice package (Plate and Luciani 2011).[4] This file can be opened in most browsers (best viewed in Firefox) and can include mouseover tooltips on the node labels.
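Because filetype is a shortcut, the same result can be obtained by opening a graphics device explicitly around the plotting call. The sketch below uses base R's pdf() and dev.off(); to keep it runnable without qgraph, a few base graphics calls stand in for a qgraph() call, and the file name network.pdf is arbitrary.

```r
# Write a figure to a PDF file by opening a graphics device explicitly.
# Base graphics stand in for a qgraph() call so the sketch runs on its own.
out <- file.path(tempdir(), "network.pdf")
pdf(out, width = 7, height = 7)                # open the PDF device (inches)
plot(c(0, 1, 0.5), c(0, 0, 0.8), pch = 19, cex = 3,
     axes = FALSE, xlab = "", ylab = "")        # three stand-in "nodes"
segments(0, 0, 1, 0, lwd = 3, col = "green")    # one stand-in positive "edge"
invisible(dev.off())                            # close the device to write the file
file.exists(out)
```

In practice, the plotting calls between pdf() and dev.off() would be replaced by a single qgraph() call; the same pattern works with png(), svg() or any other device.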
The tooltips argument can be given a vector containing the tooltip for each node. Another option for mouseover tooltips is to use the "tex" filetype. This uses the tikzDevice package (Sharpsteen and Bracken 2012) to create a .tex file that contains the graph,[5] which can then be built using pdflatex. The tooltips argument can also be used here to create mouseover tooltips in a PDF file.[6]

2.4. Standard visual parameters

In weighted graphs, green edges indicate positive weights and red edges indicate negative weights.[7] The color saturation and the width of the edges correspond to the absolute weight and scale relative to the strongest weight in the graph (i.e., the edge with the highest absolute weight will have full color saturation and be the widest). It is possible to control this behavior by using the maximum argument: when maximum is set to a value above any absolute weight in the graph, the color and width will scale to the value of maximum instead.[8] Edges with an absolute value under the minimum argument are omitted, which is useful to keep file sizes from inflating in very large graphs.

In larger graphs, the above edge settings can become hard to interpret. With the cut argument, a cutoff value can be set which splits the scaling of color and width. This makes the graphs much easier to interpret, because important edges and general trends can be seen in the same picture. Edges with absolute weights under the cutoff score will have the smallest width and become more colorful as they approach the cutoff score, and edges with absolute weights over the cutoff score will be fully saturated red or green and become wider the stronger they are.

In addition to these standard arguments, there are several arguments that can be used to graphically enhance the graphs, for example to change the size and shape of nodes, to add a background or a Venn-diagram-like overlay, and to visualize the test scores of a subject on the graph.
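The scaling rules for minimum, maximum and cut can be summarized in a small hypothetical function. The linear mappings, the width range of 1 to 5 and the name edge_style() are assumptions chosen for illustration; qgraph's own internal formulas may differ, and only the qualitative behavior follows the description of these arguments.

```r
# Hypothetical sketch of the edge-scaling rules; not qgraph's internal code.
edge_style <- function(w, minimum = 0, maximum = 0, cut = NULL,
                       min_width = 1, max_width = 5) {
  a <- abs(w)
  M <- max(a, maximum)  # scale to the strongest weight, or to `maximum` if larger
  if (is.null(cut)) {
    saturation <- a / M                              # full colour only at M
    width <- min_width + (max_width - min_width) * a / M
  } else {
    below <- a < cut
    saturation <- ifelse(below, a / cut, 1)          # colour fades in up to the cutoff
    width <- ifelse(below, min_width,                # thinnest below the cutoff,
                    min_width + (max_width - min_width) *
                      (a - cut) / max(M - cut, .Machine$double.eps))  # wider above
  }
  data.frame(weight = w,
             color = ifelse(w > 0, "green", "red"),  # the sign sets the hue
             saturation = saturation,
             width = width,
             drawn = a >= minimum & w != 0)          # edges under `minimum` are omitted
}

# With a cutoff: widths 1, 1, 2 and 5; the 0.1 edge is not drawn
edge_style(c(0.1, -0.3, 0.6, 0.9), minimum = 0.2, cut = 0.5)

# Without a cutoff, scaling relative to maximum = 2 instead of the largest weight
edge_style(c(0.5, -1), maximum = 2)
```

Splitting the scale at the cutoff spends the colour range on weak edges and the width range on strong ones, which is why both general trends and important edges remain visible.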
The documentation of the qgraph() function has detailed instructions and examples on how these additional arguments can be used.

3. Visualizing statistics as graphs

3.1. Correlation matrices

In addition to the representations in Figures 2 and 3, qgraph offers various other possibilities for visualizing association structures. If a correlation matrix is used as input, the graph

[4] RSVGTipsDevice is only available for 32-bit versions of R.
[5] Note that this will load the tikzDevice package, which upon loading checks for a LaTeX compiler. If none is available, the package might fail to load.
[6] We would like to thank Charlie Sharpsteen for supplying the tikz code for these tooltips.
[7] The edge colors can currently