Plotting panel data with many lines in SPSS

A quick blog post – so you all are not continually assaulted by my mug shot on the front page of the blog!

Panel data is complicated. When conducting univariate time series analysis, pretty much everyone plots the series. I presume people do not do this as often for panel data because the charts tend to be messier and less informative. But transparency and small multiple plots are easy places to start unpacking the information. Here I am going to show these using plots of arrest rates from 1970 through 2014 in New York state counties. The data and code can be downloaded here, and that zip file contains info on where the original data came from. It is all publicly available, but mashing up the historical census data for the population counts by county is a bit of a pain.

So I will start with grabbing my created dataset, and then making a default plot of all the lines. The Y axis is the arrest rate per 1,000 population, and the X axis is the year.

*Grab the dataset.
FILE HANDLE data /NAME = "!!Your File Handle Here!!!".
GET FILE = "data\Arrest_WPop.sav".
DATASET NAME ArrestRates.

*All county lines over time in a single chart - default plot.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Year Total_Rate County 
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Year=col(source(s), name("Year"))
  DATA: Total_Rate=col(source(s), name("Total_Rate"))
  DATA: County=col(source(s), name("County"), unit.category())
  GUIDE: axis(dim(1), label("Year"))
  GUIDE: axis(dim(2), label("Total Arrest Rate per 1,000"))
  ELEMENT: line(position(Year*Total_Rate), split(County))
END GPL.

That is not too bad, but we can do slightly better by making the lines thinner and semi-transparent (the same advice applies to dense scatterplots):

*Make them transparent and smaller.
FORMATS Total_Rate (F2.0).
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Year Total_Rate County 
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Year=col(source(s), name("Year"))
  DATA: Total_Rate=col(source(s), name("Total_Rate"))
  DATA: County=col(source(s), name("County"), unit.category())
  GUIDE: axis(dim(1), label("Year"))
  GUIDE: axis(dim(2), label("Total Arrest Rate per 1,000"))
  SCALE: linear(dim(1), min(1970), max(2014))
  ELEMENT: line(position(Year*Total_Rate), split(County), transparency(transparency."0.7"), size(size."0.7"))
END GPL.

This helps disentangle the many lines bunched up. There appear to be two outliers, and basically the rest of the pack.
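
If you want a quick numeric check on which lines sit above the pack before looking at every county, one option is to aggregate to the peak rate per county and list the top few. This is only a sketch: the dataset name PeakRates and the variable PeakRate are names I made up for illustration, and listing the top five is an arbitrary choice.

```spss
*Sketch only - PeakRates and PeakRate are hypothetical names.
DATASET DECLARE PeakRates.
AGGREGATE OUTFILE='PeakRates'
  /BREAK=County
  /PeakRate = MAX(Total_Rate).
DATASET ACTIVATE PeakRates.
*Highest peak rates first.
SORT CASES BY PeakRate (D).
LIST VARIABLES=County PeakRate /CASES=FROM 1 TO 5.
*Go back to the original panel dataset for the rest of the charts.
DATASET ACTIVATE ArrestRates.
```

The peak is just one summary; the mean or the last year's rate would work the same way by swapping the aggregate function.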

A quick way to check out each individual line is then to make small multiples. Here I wrap the panels, and make the plot size bigger. I also set the X and Y axes to null, which hides them. This is ok though, as I am just focusing on the shape of the trend, not the absolute level.

*Small multiples.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Year Total_Rate County 
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  PAGE: begin(scale(1000px,1000px))
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Year=col(source(s), name("Year"))
  DATA: Total_Rate=col(source(s), name("Total_Rate"))
  DATA: County=col(source(s), name("County"), unit.category())
  COORD: rect(dim(1,2), wrap())
  GUIDE: axis(dim(1), null())
  GUIDE: axis(dim(2), null())
  GUIDE: axis(dim(3), opposite())
  SCALE: linear(dim(1), min(1970), max(2014))
  ELEMENT: line(position(Year*Total_Rate*County))
  PAGE: end()
END GPL.
*Manually edited to make less space between panels.

There are a total of 62 counties in New York, so this is feasible. With panel sets of many more lines, you can either split the small multiple into more graphs, or cluster the lines based on the overall shape of the trend into different panels.
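
As a rough sketch of the first option (splitting the small multiples into more graphs), you can assign each series to a page and let SPLIT FILE draw one wrapped chart per page. The variables CountyNum and Page and the cutoff of 16 panels per page are hypothetical choices for illustration, not part of the original code. It also assumes CountyNum does not already exist in the dataset.

```spss
*Sketch only - CountyNum, Page, and 16 panels per page are hypothetical.
AUTORECODE VARIABLES=County /INTO CountyNum.
COMPUTE Page = TRUNC((CountyNum - 1)/16) + 1.
FORMATS Page (F2.0).
SORT CASES BY Page County Year.
*One chart is produced for each value of Page.
SPLIT FILE BY Page.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Year Total_Rate County
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  PAGE: begin(scale(1000px,1000px))
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Year=col(source(s), name("Year"))
  DATA: Total_Rate=col(source(s), name("Total_Rate"))
  DATA: County=col(source(s), name("County"), unit.category())
  COORD: rect(dim(1,2), wrap())
  GUIDE: axis(dim(1), null())
  GUIDE: axis(dim(2), null())
  GUIDE: axis(dim(3), opposite())
  SCALE: linear(dim(1), min(1970), max(2014))
  ELEMENT: line(position(Year*Total_Rate*County))
  PAGE: end()
END GPL.
SPLIT FILE OFF.
```

With 62 counties this gives four charts. Pages here are just in AUTORECODE order; if you cluster the lines by shape instead, the cluster assignment could take the place of Page.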

In the small multiples you can see that the outliers are New York county (Manhattan) and Bronx county. Bronx is a pretty straight upward trend (which mirrors many other counties), but Manhattan’s trajectory is unlike the others and has a higher variance than most other places in the state. Also you can see Sullivan county has quite a high rate compared to most other upstate counties (upstate is New York talk for everything not in New York City). But it leveled off fairly early in the time series.

This dataset also has arrest rates broken down by different categories: felony (drug, violent, dwi, other) and misdemeanor (drug, dwi, property, other). It is interesting to see that arrest rates have been increasing in most places over this long time period, even though crime rates have been going down since the 1990s. The category rates all appear to be pretty correlated, but let me know if you use this dataset to do some more digging. (It appears index crime totals can be found going back to 1990 here.)
