I was first introduced to corrgrams in this post by Tal Galili on the Cross Validated site. Corrgrams are a family of displays developed by Michael Friendly for visualizing large correlation matrices. I have developed a few examples using SPSS base graphics to mimic some of the corrgrams Friendly presents, in particular a heat-map and a proportionally sized dot plot. I’ve posted the syntax to produce these graphics at the SPSS developer forum in this thread.

Some other extensions could be made in base graphics fairly easily, such as the diagonal hashings in the heat-map, but others would take more thought (such as plotting different graphics in the lower and upper triangles of the matrix, or sorting the elements in the matrix by some other criterion). I think this is a good start though, and I particularly like the ability to superimpose the actual correlations as labels on the chart, as is done in this example on Cross Validated. It should satisfy both the graph people and the table people! See this other brief article by Michael Friendly and Ernest Kwan (2011) (which was written in response to Gelman, 2011) and this post by Stephen Few to see what I am talking about.

One of the limitations of these visualizations is that they simply plot the bi-variate correlation. Friendly has one obvious extension in the corrgram paper, where he plots the bi-variate ellipses and a loess smoother line. Other approaches that go beyond correlations include examining scagnostic characteristics of distributions (Wilkinson & Wills, 2008) or utilizing other metrics that capture non-linear associations, such as the recent MIC statistic proposed in Reshef et al. (2011). All of these, though, are still only applicable to bi-variate associations.
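To make the limitation concrete, here is a minimal sketch in Python (not SPSS syntax, and not part of the examples I posted) using made-up data. The helper `pearson_r` and the two toy series are assumptions for illustration only. It shows that a perfectly deterministic but non-linear relationship can have a Pearson correlation of essentially zero, so it would show up as a blank cell in a corrgram.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data: x is symmetric around zero.
x = [i / 10 for i in range(-20, 21)]
y_linear = [2 * v + 1 for v in x]   # exact linear relationship
y_quadratic = [v ** 2 for v in x]   # exact, but non-linear, relationship

r_linear = pearson_r(x, y_linear)
r_quadratic = pearson_r(x, y_quadratic)

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

The quadratic series is completely determined by `x`, yet its correlation with `x` is approximately zero because the positive and negative halves cancel out. That is the kind of association the ellipses-plus-smoother panels, scagnostics, or MIC are meant to pick up.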

**Citations:**

- Friendly, Michael. 2002. Corrgrams. *The American Statistician* 56(4): 316-324. PDF
- Friendly, Michael & Ernest Kwan. 2011. Comment on Why tables are really much better than graphs. *Journal of Computational and Graphical Statistics* 20(1): 18-27. PDF available from publisher
- Gelman, Andrew. 2011. Why tables are really much better than graphs. *Journal of Computational and Graphical Statistics* 20(1): 3-7. PDF available from publisher
- Reshef, David N., Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher & Pardis C. Sabeti. 2011. Detecting novel associations in large data sets. *Science* 334(6062): 1518-1524. See this Cross Validated question for links to ungated documents, other resources and interesting discussion.
- Wilkinson, Leland & Graham Wills. 2008. Scagnostic Distributions. *Journal of Computational and Graphical Statistics* 17(2): 473-491. This PDF is not available online, but another one that introduces the concept is (Wilkinson et al., 2005)

## Bruce Weaver

/ January 15, 2014

Neat stuff, Andy. Have you figured out a way to make a corrgram with pac-man figures (as in the Friendly article)? With pac-man figures, one could potentially add a couple of radii (perhaps as dashed lines) to display the confidence interval for rho.

Cheers!

Bruce

## apwheele

/ January 15, 2014

Hi Bruce – thanks for the comment,

With those examples you wouldn’t be able to do that (currently) in SPSS. The simplest way to build the little pies would be to build custom glyphs that corresponded to certain pie shapes (e.g. they would be similar to choosing whether to use a circle or a square symbol for a scatterplot). See Funny Faces: Visualizing Many Variables for an example of making custom glyphs. I don’t expect such customization to be available for SPSS graphs though anytime soon.

If I changed the examples to use small multiple graphics, then it would be possible to build up the geometries to make the pies, but it would be a chore. I’ve developed some code to do the ellipses, but I think maybe a better way to simplify the plots would be to make a line and error bands.

I have mixed feelings, though, about whether error bands are useful for these types of charts. Since the correlation matrix is typically estimated from a single dataset, the standard errors (and so the widths of the intervals) should be pretty similar for all of the estimates (ignoring the most extreme correlations). So the intervals basically end up being redundant with displaying the correlation itself.
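As a rough illustration of that point, here is a minimal Python sketch (again not SPSS syntax). It assumes the usual Fisher z approximation for the confidence interval of a Pearson correlation, a hypothetical sample size of 100, and a helper name, `fisher_ci`, that I made up for the example.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via Fisher's z transform.

    On the z scale the standard error is 1 / sqrt(n - 3), which depends only
    on the sample size and not on the correlation itself.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper

n = 100  # hypothetical sample size, the same for every cell of the matrix
widths = {}
for r in (0.0, 0.3, 0.6, 0.9):
    lower, upper = fisher_ci(r, n)
    widths[r] = upper - lower
    print(f"r = {r:.1f}  CI = [{lower:.3f}, {upper:.3f}]  width = {widths[r]:.3f}")
```

With the same `n` for every pair, the interval width is driven almost entirely by the correlation itself: it changes little for small to moderate correlations and only narrows noticeably for the most extreme ones. So once the reader knows the sample size, the intervals add little beyond what the correlation already shows.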

One way to incorporate a confidence interval in the example on the right would be to draw two circles, one for the lower limit and one for the upper limit – although that adds a layer of complexity to a plot that can get a bit complicated already.