Paper: The Effect of 311 Calls for Service on Crime in D.C. at Microplaces published

My paper, The Effect of 311 Calls for Service on Crime in D.C. at Microplaces, was published online first at Crime & Delinquency. Here is the link to the published paper. If you do not have access to a library where you can get the paper, always feel free to email and I will send an off-print. But I also have the pre-print posted on SSRN. Often the only difference between my pre-prints and the finished version is that the published paper is shorter!

As a note, I’ve also posted all of the data and code to replicate my findings. The note is unfortunately buried at the end of the paper, instead of the beginning.

This was the first paper published from my dissertation. I have pre-prints out for two others, What we can learn from small units and Local and Spatial Effect of Bars. Hopefully you will see those two in print in the near future as well!

Testing the equality of coefficients – Same Independent, Different Dependent variables

As promised earlier, here is one example of testing coefficient equalities in SPSS, Stata, and R.

Here we have different dependent variables, but the same independent variables. This is taken from Dallas survey data (original data link, survey instrument link), and they asked about fear of crime, and split up the questions between fear of property victimization and violent victimization. Here I want to see if the effect of income is the same between the two. People in more poverty tend to be at higher risk of victimization, but you may also expect people with fewer items to steal to be less worried. Here I also control for the race and the age of the respondent.

The dataset has missing data, so I illustrate how to select out for complete case analysis, then I estimate the model. The fear of crime variables are coded as Likert items with a scale of 1-5, (higher values are more safe) but I predict them using linear regression (see the Stata code at the end though for combining ordinal logistic equations using suest). Race is of course nominal, and income and age are binned as well, but I treat the income bins as a linear effect. I pasted the codebook for all of the items at the end of the post.

These models with multiple dependent variables go by different names: economists call them seemingly unrelated regression, psychologists will often just call them multivariate models, and those familiar with structural equation modeling can get the same results by allowing residual covariances between the two outcomes. They will all result in the same coefficient estimates in the end.

SPSS

In SPSS we can use the GLM procedure to estimate the model. Simultaneously we can specify particular contrasts to test whether the income coefficient is different for the two outcomes.

*Grab the online data.
SPSSINC GETURI DATA URI="https://dl.dropbox.com/s/r98nnidl5rnq5ni/MissingData_DallasSurv16.sav?dl=0" FILETYPE=SAV DATASET=MissData.

*Conducting complete case analysis.
COUNT MisComplete = Safety_Violent Safety_Prop Gender Race Income Age (MISSING).
COMPUTE CompleteCase = (MisComplete = 0).
FILTER BY CompleteCase.


*This treats the different income categories as a continuous variable.
*Can use GLM to estimate seemingly unrelated regression in SPSS and test.
*equality of the two coefficients.
GLM Safety_Violent Safety_Prop BY Race Age WITH Income
  /DESIGN=Income Race Age
  /PRINT PARAMETER
  /LMATRIX Income 1
  /MMATRIX ALL 1 -1.

FILTER OFF.  

In the output you can see the coefficient estimates for the two equations. The income effect for violent crime is 0.168 (0.023) and for property crime is 0.114 (0.022).

And then you get a separate table for the contrast estimates.

You can see that the contrast estimate, 0.054, equals 0.168 – 0.114. The standard error in this output (0.016) takes into account the covariance between the two estimates. Here you would reject the null that the effects are equal across the two equations, and the effect of income is larger for violent crime. Because higher values on these Likert scales mean a person feels more safe, both positive coefficients say that those with higher incomes feel safer, but the gain is smaller for property crime. So this is evidence that, relative to violent victimization, those with higher incomes are more likely to be fearful of property victimization, controlling for age and race.

Unfortunately the different matrix contrasts are not available in all the different types of regression models in SPSS. You may ask whether you can fit two separate regressions and do this same test. The answer is you can, but that makes assumptions about how the two models are independent — it is typically more efficient to estimate them at once, and here it allows you to have the software handle the Wald test instead of constructing it yourself.
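To make the role of the covariance concrete, here is a minimal Python sketch (not part of the original analysis) that redoes the arithmetic using only the rounded figures reported in the SPSS output above. The function name wald_diff and the back-calculated covariance are my own illustration, and because the inputs are rounded the results are only approximate.

```python
import math

def wald_diff(b1, se1, b2, se2, cov=0.0):
    """Return (difference, standard error, Wald z) for b1 - b2."""
    diff = b1 - b2
    se = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)
    return diff, se, diff / se

# Rounded values as reported in the SPSS output
b_viol, se_viol = 0.168, 0.023
b_prop, se_prop = 0.114, 0.022
se_contrast = 0.016

# Covariance implied by the reported contrast standard error (approximate,
# back-calculated here for illustration only)
implied_cov = (se_viol ** 2 + se_prop ** 2 - se_contrast ** 2) / 2.0

# Joint test uses the covariance; the naive test treats the estimates as independent
joint = wald_diff(b_viol, se_viol, b_prop, se_prop, cov=implied_cov)
naive = wald_diff(b_viol, se_viol, b_prop, se_prop)

print("joint: diff=%.3f se=%.3f z=%.2f" % joint)
print("naive: diff=%.3f se=%.3f z=%.2f" % naive)
```

With these rounded inputs the naive standard error is about twice as large as the joint one, so ignoring the positive covariance between the two estimates would make the test far less powerful in this example.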

R

As I stated previously, seemingly unrelated regression is another name for these multivariate models. So we can use the R library systemfit to estimate our seemingly unrelated regression model, and then use the library multcomp to test the coefficient contrast. This does not result in the exact same coefficients as SPSS, but devilishly close. You can download the csv file of the data here.

library(systemfit) #for seemingly unrelated regression
library(multcomp)  #for hypothesis tests of models coefficients

#read in CSV file
SurvData <- read.csv(file="MissingData_DallasSurvey.csv",header=TRUE)
names(SurvData)[1] <- "Safety_Violent" #name gets messed up because of BOM

#Need to recode the missing values in R, use NA
NineMis <- c("Safety_Violent","Safety_Prop","Race","Income","Age")
#summary(SurvData[,NineMis])
for (i in NineMis){
  SurvData[SurvData[,i]==9,i] <- NA
}

#Making a complete case dataset
SurvComplete <- SurvData[complete.cases(SurvData),NineMis]
#Now changing race and age to factor variables, keep income as linear
SurvComplete$Race <- factor(SurvComplete$Race, levels=c(1,2,3,4), labels=c("Black","White","Hispanic","Other"))
SurvComplete$Age <- factor(SurvComplete$Age, levels=1:5, labels=c("18-24","25-34","35-44","45-54","55+"))
summary(SurvComplete)

#now fitting seemingly unrelated regression
viol <- Safety_Violent ~ Income + Race + Age
prop <- Safety_Prop ~ Income + Race + Age
fitsur <- systemfit(list(violreg = viol, propreg= prop), data=SurvComplete, method="SUR")
summary(fitsur)

#testing whether income effect is equivalent for both models
viol_more_prop <- glht(fitsur,linfct = c("violreg_Income - propreg_Income = 0"))
summary(viol_more_prop) 

Here is a screenshot of the results then:

This is also the same as estimating a structural equation model in which the residuals for the two regressions are allowed to covary. We can do that in R with the lavaan library.

library(lavaan)

#for this need to convert factors into dummy variables for lavaan
DumVars <- data.frame(model.matrix(~Race+Age-1,data=SurvComplete))
names(DumVars) <- c("Black","White","Hispanic","Other","Age2","Age3","Age4","Age5")

SurvComplete <- cbind(SurvComplete,DumVars)

model <- '
    #regressions
     Safety_Prop    ~ Income + Black + Hispanic + Other + Age2 + Age3 + Age4 + Age5
     Safety_Violent ~ Income + Black + Hispanic + Other + Age2 + Age3 + Age4 + Age5
    #residual covariances
     Safety_Violent ~~ Safety_Prop
     Safety_Violent ~~ Safety_Violent
     Safety_Prop ~~ Safety_Prop
'

fit <- sem(model, data=SurvComplete)
summary(fit)

I’m not sure offhand though if there is an easy way to test the coefficient differences with a lavaan object, but we can do it manually by grabbing the variance and the covariances. You can then see that the differences and the standard errors are equal to the prior output provided by the glht function in multcomp.

#Grab the coefficients I want, and test the difference
PCov <- inspect(fit,what="vcov")
PEst <- inspect(fit,what="list")
Diff <- PEst[9,'est'] - PEst[1,'est']
SE <- sqrt( PEst[1,'se']^2 + PEst[9,'se']^2 - 2*PCov[9,1] )
Diff;SE

Stata

In Stata we can replicate the same prior analyses. Here is some code to simply replicate the prior results, using Stata’s postestimation commands (additional examples using postestimation commands here). Again you can download the csv file used here. The results here are exactly the same as the R results.

*Load in the csv file
import delimited MissingData_DallasSurvey.csv, clear

*BOM problem again
rename ïsafety_violent safety_violent

*we need to specify the missing data fields.
*for Stata, set missing data to ".", not the named missing value types.
foreach i of varlist safety_violent-ownhome {
    tab `i'
}

*dont specify district
mvdecode safety_violent-race income-age ownhome, mv(9=.)
mvdecode yearsdallas, mv(999=.)

*making a variable to identify the number of missing observations
egen miscomplete = rmiss(safety_violent safety_prop race income age)
tab miscomplete
*even though any individual question is small, in total it is around 20% of the cases

*lets conduct a complete case analysis
preserve 
keep if miscomplete==0

*Now can estimate multivariate regression, same as GLM in SPSS
mvreg safety_violent safety_prop = income i.race i.age

*test income coefficient is equal across the two equations
lincom _b[safety_violent:income] - _b[safety_prop:income]

*same results as seemingly unrelated regression
sureg (safety_violent income i.race i.age)(safety_prop income i.race i.age)

*To use lincom it is the exact same code as with mvreg
lincom _b[safety_violent:income] - _b[safety_prop:income]

*with sem model
tabulate race, generate(r)
tabulate age, generate(a)
sem (safety_violent <- income r2 r3 r4 a2 a3 a4 a5)(safety_prop <- income r2 r3 r4 a2 a3 a4 a5), cov(e.safety_violent*e.safety_prop) 

*can use the same as mvreg and sureg
lincom _b[safety_violent:income] - _b[safety_prop:income]

You will notice here it is the exact same post-estimation lincom command to test the coefficient equality across all three models. (Stata makes this the easiest of the three programs IMO.)

Stata also allows us to estimate seemingly unrelated regressions combining different generalized outcomes. Here I treat the outcome as ordinal, and then combine the models using seemingly unrelated regression.

*Combining generalized linear models with suest
ologit safety_violent income i.race i.age
est store viol

ologit safety_prop income i.race i.age
est store prop

suest viol prop

*lincom again!
lincom _b[viol_safety_violent:income] - _b[prop_safety_prop:income]

An application in spatial criminology is when you are estimating the effect of something on different crime types. If you are predicting the number of crimes in a spatial area, you might estimate separate Poisson regression models for assaults and robberies, and this is one way to estimate the models jointly. Cory Haberman and Jerry Ratcliffe have an application of this as well, where they estimate effects at different times of day, e.g. the effect of bars in the afternoon versus the effect of bars at nighttime.

Codebook

Here is the codebook for each of the variables in the database.

Safety_Violent  
    1   Very Unsafe
    2   Unsafe
    3   Neither Safe or Unsafe
    4   Safe
    5   Very Safe
    9   Do not know or Missing
Safety_Prop 
    1   Very Unsafe
    2   Unsafe
    3   Neither Safe or Unsafe
    4   Safe
    5   Very Safe
    9   Do not know or Missing
Gender  
    1   Male
    2   Female
    9   Missing
Race    
    1   Black
    2   White
    3   Hispanic
    4   Other
    9   Missing
Income  
    1   Less than 25k
    2   25k to 50k
    3   50k to 75k
    4   75k to 100k
    5   over 100k
    9   Missing
Edu 
    1   Less than High School
    2   High School
    3   Some above High School
    9   Missing
Age 
    1   18-24
    2   25-34
    3   35-44
    4   45-54
    5   55+
    9   Missing
OwnHome 
    1   Own
    2   Rent
    9   Missing
YearsDallas
    999 Missing

New working paper – Monitoring volatile homicide trends across U.S. cities

I have a new working paper out — Monitoring volatile homicide trends across U.S. cities, with one of my colleagues Tomislav Kovandzic. You can grab the pre-print on SSRN, and the paper has links to code to replicate the charts and models in the paper.

Here I look at homicide rates in U.S. cities and use funnel charts and fan charts to show the typical volatility in homicide rates between cities and within cities over time. As I’ve written previously, I think much of the media narrative around homicide increases is hyperbolic, and outlets often cherry pick reasons why they think homicides are going up.

I’ve shown examples of funnel charts on this blog before, so I will use a different image as the tease. To generate the prediction intervals for fan charts I estimate binomial random effect models. Below is an example for New Orleans (homicide rate per 100,000 population):

As always, if you have feedback feel free to send me an email.

Communities and Crime

This was my first semester teaching undergrads at UT Dallas. I taught the Communities and Crime undergrad course. I thought it went very well, and I was impressed with the undergrads here. For the course I had students do a bunch of different prediction assignments based on open data in Dallas, such as predicting what neighborhood has the most crime, or which specific bar has the most assaults. The idea being they would use the theories I discussed in the prior lecture to make the best predictions.

For their final assignment, I had students predict an arbitrary area to capture the most robberies in 2016 (up to that point they had only been predicting crimes in 2015). I used the same metric that NIJ is using in their crime forecasting challenge – the predictive accuracy index. This is simply % crime/% area, so students who give larger areas are more penalized. This ended up producing a pretty neat capstone to the end of the semester.
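As a rough illustration of the metric, here is a small Python sketch of the predictive accuracy index as described above (% crime divided by % area). The function name and the numbers are hypothetical and are not taken from the class data or the NIJ challenge.

```python
def predictive_accuracy_index(crimes_in_area, total_crimes, area, total_area):
    """PAI = (share of crime captured) / (share of total area covered)."""
    if total_crimes <= 0 or total_area <= 0 or area <= 0:
        raise ValueError("total_crimes, total_area and area must be positive")
    pct_crime = crimes_in_area / total_crimes
    pct_area = area / total_area
    return pct_crime / pct_area

# Hypothetical predictions: a small area versus a much larger one
small = predictive_accuracy_index(50, 1000, 1.0, 100.0)    # 5% of crime in 1% of the area
large = predictive_accuracy_index(200, 1000, 10.0, 100.0)  # 20% of crime in 10% of the area
print(small, large)
```

In this made-up example the larger area captures four times as many crimes but scores lower, which is the sense in which bigger areas are penalized.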

Below is a screen shot of the map, and here is a link to an interactive version. (WordPress.com sites only allow specific types of iframe sources, so my dropbox src link to the interactive Leaflet map gets stripped.)

Look forward to teaching this class again (as of now it seems I will regularly offer it every spring).

More news on classes to come soon. I am teaching GIS applications in Criminology online over the summer. For a quick idea about the content, it will be almost the same as the GIS course in criminal justice I previously taught at SUNY.

In short, if you think maps rock then you should take my classes 😉

SPSS Statistics for Data Analysis and Visualization – book chapter on Geospatial Analytics

A book I made contributions to, SPSS Statistics for Data Analysis and Visualization, is currently out. Keith and Jesus are the main authors of the book, but I contributed one chapter and Jon Peck contributed a few.

The book is a guided tour through many of the advanced statistical procedures and data visualizations in SPSS. Jon also contributed a few chapters towards using syntax, python, and using extension commands. It is a very friendly walkthrough, and we have all contributed data files for you to be able to follow along through the chapters.

So there is a lot of content, but I wanted to give more specific details on my chapter, as I think it will be of greater interest to crime analysts and criminologists. I provide two case studies. The first uses geospatial association rules to identify areas of high crime plus high 311 disorder complaints in DC (using data from my dissertation). In the second I give an example of spatio-temporal forecasting of ShotSpotter data at the weekly level in DC using both prior shootings as well as other prior Part 1 crimes.

Geospatial Association Rules

Geospatial association rules are a technique for high dimensional contingency tables to find particular combinations among categories that are more prevalent. I show examples of finding that thefts from motor vehicles tend to occur in places near graffiti incidents.

And that assaults tend to be around locations with more garbage complaints (and as you can see each has a very different spatial patterning).

I consider this to be a useful exploratory data analysis type technique. It is very similar in application to conjunctive analysis, which has had very similar prior crime mapping applications in risk terrain modeling (see Caplan et al., 2017).

Spatio-Temporal Prediction

The second example case study is forecasting weekly shootings in fairly small areas (500 meter grid cells) using ShotSpotter data in DC. I also use the prior week’s reported Part 1 crime types (Assault, Burglary, Robbery, etc.), so it is similar to the leading indicators forecasting model advocated by Wilpen Gorr and colleagues. I show that prior shootings predict future shootings up to 5 lags prior (so over a month), and that the prior crimes do have an effect on future shootings (e.g. robberies in the prior week contribute to more shootings in the subsequent week).

If you have questions about the analyses, or are a crime analyst and want to apply similar techniques to your data, always feel free to send me an email.

My solution for grade inflation

It is the end of the semester and grades are upon us! Continual grade inflation in higher education is a well known problem. I don’t help any — and it is relatively easy to tell you why. There are zero incentives for me to grade harshly, as giving harsh grades is the best way to get more critical student appraisals. I probably earned myself a few more critical comments in just the past few days when giving students feedback on their end of semester final papers.

Now, don’t take this as I trivially give out grades. In my courses I have come to the style of having students do many different homeworks over the semester, instead of one big project or final exam that counts towards the majority of their grade. This helps more mediocre students, as they have more opportunities to make mistakes but still get a decent grade for the course. I think in terms of pedagogy this is better than cramming for a final or pouring everything into a paper written in haste, but I have no empirical evidence to back that up.

Before giving my solution though to grade inflation, we need to step back and say what is the point of grades? For individuals externally viewing someone’s grades, they accomplish two things:

  • provide an indication of competency in some topical area, e.g. Billy can drive a car because he passed his drivers exam.
  • provide a signal to prospective employers as to the relative merits of two students, e.g. Angela is a better candidate than Billy, because Angela’s GPA is 3.7 and Billy’s is 2.9.

In terms of helping students learn, the grade itself does not help them learn, but getting critical feedback does. E.g. me telling you that you got a B on your final doesn’t help you learn any more, but me telling you specifically what answers you got wrong and right does. So I only consider grades here as necessary for external use by others to judge students.

My solution to grade inflation is simple and accomplishes both of my bullet points. We should give each student a pass/fail, and then we should give each student a relative, within class ranking. Specifically, on a student’s transcript they should have a number that says 1/30 if they were the top student out of 30, or 15/30 if they were the 15th ranked student out of 30, etc. for each course that they took.

Pass/Fail is for the ultimate competency point. Grade inflation currently makes letter grades and GPA essentially meaningless, since everyone who passes has a high grade. Minimum GPA requirements for certain degrees effectively enforce this anyway. Most schools currently have things to try to make students stand out: Honors students, Dean’s list, cum laude or whatever. But those are subject to the same grade inflation problems, as they use grades to meet the cut-offs. Our system is essentially pass/fail already.

The relative ranking though is a bit more novel, but also accomplishes the signal to employers part about the relative merits of two students, at least those who take the same courses. It does so in a dimensionless way though, unlike GPA or letter grades. Grade inflation currently hurts the really good students the most, as the top part of the distribution is censored by having an upper limit of an A. Assigning a relative ranking for each course allows those students to come to the top though. Even if the entire class passes, there will still be students who rank in the top part and the bottom part of the class. (It also has the added benefit of mostly eliminating grade complaining by students – I have no control of your relative ranking.)

Both of these are easily accomplished with the way courses are currently structured. Professors need not change anything essentially. There would be some specific details to work out for relative ranking (ties, and combining rankings for different sized classes for the overall ranking equivalent of GPA) but those aren’t insurmountable. Pass/Fail is already a part of the system, so that obviously takes no additional work.
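As a sketch of how little machinery the within-class ranking needs, here is a short Python example that turns final scores into the rank/class size figures described above. The names and scores are hypothetical, and the tie rule (tied students share the best rank in their group) is just one possible choice that I am assuming for illustration.

```python
def class_rankings(scores):
    """Map each student to a 'rank/class size' string; a higher score is better.

    Tied students share the best rank in the tied group (competition ranking).
    This tie rule is an assumption of the sketch, not a settled design.
    """
    n = len(scores)
    ordered = sorted(scores.values(), reverse=True)
    # The first position at which a score appears is the rank shared by all ties
    first_pos = {}
    for pos, score in enumerate(ordered, start=1):
        first_pos.setdefault(score, pos)
    return {name: "%d/%d" % (first_pos[score], n) for name, score in scores.items()}

# Hypothetical final course scores
scores = {"Angela": 96.5, "Billy": 81.0, "Carla": 88.0, "Dev": 88.0}
ranks = class_rankings(scores)
print(ranks)
```

Combining rankings across courses of different sizes into an overall figure is a separate problem that this sketch does not attempt.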

Currently getting a relative ranking for an individual class already provides much more information than letter grades do. It has some of the same flaws as letter grades, comparisons across schools or degrees or time are much harder to make, but it is no worse than letter grades in this regard. One critique could be that if you have a good cohort you will be lower in relative rankings, but that is a good thing when considering the signal perspective from an employer, as you should be judged against your peers on the job market, not against different cohorts.

There are similar programs in place, such as those schools publishing entire grade distributions (UNC was going to do this, but I’m not sure if it ever materialized). One of my professors (who received his degree not in the US) said his institution had real curved grades, e.g. the top 30% in the course got an A, the next 30% a B, etc. This works on the same principle as my relative rankings, but you have an ultimate judgment of pass/fail, instead of having the letter grade determine the pass/fail competency. Also only having a limited set of letter grades hurts the really good students. These tend to not be popular though based on the argument that all of the students could be good. The separate pass/fail judgment splits the two goals of grades, so makes this point moot. The complaint about different professors having different grading thresholds is still a problem for the ultimate pass/fail, but is entirely eliminated with relative rankings.

I can’t be the first one to think of this — let me know in the comments if some institution is already doing this! The scatterplot blog posts by Andrew Perrin suggest that UNC tried to do something like this with an Achievement Index, but that was still based on grades and seems much more complicated than what I am suggesting offhand.

Paper on Roadblocks in Buffalo published

My paper with Scott Phillips, A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY, has just been published online first in the Security Journal. Springer gifts me a special link in which you can read the paper. Previously when I have been given links like that from the publisher they have a time limit, but the email for this one said nothing. But even if that goes bad you can always read my pre-print of the article I posted on SSRN.


Title: A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY

Abstract:

This article evaluates the effectiveness of a hot spots policing strategy: using automated license plate readers at roadblocks in Buffalo, NY. Different roadblock locations were chosen by the Buffalo Police Department every day over a two-month period. We use propensity score matching to identify a set of control locations based on prior counts of crime and demographic factors. We find modest reductions in Part 1 violent crimes (10 over all roadblock locations and over the two months) using t tests of mean differences. We find a 20% reduction in traffic accidents using fixed effects negative binomial regression models. Both results are sensitive to the model used though, and the fixed effects models predict increases in crimes due to the intervention. We suggest that the limited intervention at one time may be less effective than focusing on a single location multiple times over an extended period.

And here is Figure 2 from the paper, showing the units of analysis (street midpoints and intersections) and how the treatment locations were assigned.

Much ado about nothing: Overinterpreting volatility in homicide rates

I’m not much of a macro criminologist, but being asked questions by my dad (about Richard Rosenfeld and the Ferguson effect) and the dentist yesterday (asking about some of Trump’s comments about rising crime trends) has prompted me to jump into it and give my opinion. Long story short: many sources I believe are overinterpreting short term fluctuations as more meaningful than they are.

First I will tackle national crime rates. So if you have happened to walk by a TV playing CNN the past few days, you may have heard Donald Trump being criticized for his statements on crime rates. This is partially a conflation of overall levels of crime with changes in crime over time. Basically crime is currently low compared to historical patterns, but homicide rates have been rising in the past two years. This is easier to show in a chart than to explain in words. So here is the national estimated homicide rate per 100,000 individuals since 1960.[1]

2016 is not official and is still an estimate, but basically the pattern is this: crime has been falling generally across the country since the early 1990’s. Crime rates in just the past few years have finally dropped below levels in the 1960’s, but for the past two years homicides have been increasing. So some have pointed to the increase in the past two years and have claimed the sky is falling. To say this they say the rate of change is the largest in the past 40 years. There are better charts to show rates of change (a semi-log chart), but the overall look is basically the same.

You have to really squint to see that the change from 2014 to 2015 is a larger jump than any of the changes over the entire period, so arguments based on the size of recent changes in the homicide rate are hyperbole (either on a linear scale or a logarithmic scale). And even if you take the recent increases over the past two years as evidence of a more general rising trend, for a broader term pattern we still have homicide rates close to a low point in the past 50 years.

For a bit of general advice — any source that gives you a percent change you always want to see the base numbers and any longer term historical trends. Any media source that cites recent increases in homicides without providing this graph of long term historical crime trends is simply misleading. I’ve seen this done in many places, see this example from the New York Times or this recent note from the Economist. So this isn’t something specific to the President.

Now, macro criminologists don’t really have any better track record explaining these patterns than macro economists have in explaining economic trends. Basically we have a bunch of patch work theories that make sense for parts of the trend, but not the entire time frame: changes in routine activities in the 1960’s, increases in incarceration, the decline of crack use, ease of calling 911 with cell-phones, lead use, abortion (just to name a few). And academics come up with new theories all the time, the most recent being the Ferguson effect, which is simply another term for de-policing.

Now a bit on trends for specific cities. How this ties in with the national trend is that some articles have been pointing out that some cities have seen increases and some have not. That is fine to point out (albeit trivial), but then the articles frequently go on to generate stories about why crime is rising in those specific places. Those on the left cite civil unrest and police brutality as possible reasons (Milwaukee, St. Louis, Chicago, Baltimore), while those on the right cite the deleterious effects of police departments not being as proactive (stops in Chicago, arrests in Baltimore).

While any of these explanations may turn out reasonable in the end, I’m pretty sure most of these articles severely underappreciate the volatility in homicide rates. Take an example with St. Louis, with a city population of just over 300,000. A homicide rate of 50 individuals per 100,000 means a total of 150 murders. A homicide rate of 40 per 100,000 means 120 murders. So we are only talking about a change of 30 murders overall. Fluctuations of around 10 in the murder rate would not be unexpected for a city with a population of 300,000 individuals. The 95% confidence interval for a rate of 150 murders per 300,000 individuals is 126 to 176 murders.[2]
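For readers without R at hand, the exact Clopper-Pearson interval can be sketched in plain Python by inverting the binomial distribution with bisection. This is my own illustration rather than how the figure was originally computed (that was R's binom.test), and the function names are made up. It should land close to the interval quoted above, give or take rounding.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def solve_decreasing(f, target, iters=100):
    """Bisection for f(p) = target on (0, 1), where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    # Lower limit: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    # Upper limit: P(X <= k | p) = alpha/2
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return lower, upper

population = 300000
lower, upper = clopper_pearson(150, population)
print(lower * population, upper * population)  # interval on the count scale
```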

Even that though understates the typical volatility in homicide rates, as it basically assumes the proportion does not change over time. In reality crime statistics are more bursty, and show wilder fluctuations in different places.[3] To show this for many cities, I use the data from the Economist article mentioned earlier, and create a motion chart of the changes in homicide rates over time. The idea behind this chart is a funnel chart. Cities with lower populations will show higher variance, and subsequently those dots on the left hand side of the chart will jump around a lot more. The population figures are current and not varying, so the dots just move up and down on the Y axis.
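To give a feel for why the funnel narrows as population grows, here is a minimal Python sketch that computes approximate limits around a fixed benchmark rate for cities of different sizes. The benchmark of 10 homicides per 100,000, the populations, and the use of a simple normal approximation to the binomial are all assumptions for illustration. Since real homicide counts are burstier than a binomial, treat these limits as too narrow if anything.

```python
import math

def funnel_limits(rate_per_100k, population, z=1.96):
    """Approximate limits for an observed rate per 100,000 when the true rate
    equals rate_per_100k (normal approximation to the binomial)."""
    p = rate_per_100k / 100000.0
    se = math.sqrt(p * (1.0 - p) / population)
    lower = max(0.0, p - z * se) * 100000.0
    upper = (p + z * se) * 100000.0
    return lower, upper

# Hypothetical benchmark rate and city sizes
benchmark = 10.0
for pop in (100000, 400000, 1600000, 6400000):
    lo, hi = funnel_limits(benchmark, pop)
    print(pop, round(lo, 1), round(hi, 1))
```

Each fourfold increase in population halves the width of the limits, which is the square root relationship behind the funnel shape.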

For best viewing, make the X axis on the log scale, and size the points according to the population of the city. If you are at a desktop computer, you can open up a bigger version of the chart here.

Selecting individual points and then letting the animation run though illustrates the typical variability of crime over time. Here is the trace of St. Louis over the 36 year period.

New Orleans is another good example, we have fluctuations from under 30 to over 90 in the time period.

And here is Chicago, which shows less fluctuation than the smaller cities (as expected) but still has a range of homicide rates around 20 over the time period.

Howard Wainer has previously pointed this relationship out, and called it The Most Dangerous Equation. Basically, if you look you will be able to find some upward crime trends, especially in smaller cities. You need to look at it in the long term though and understand typical fluctuations to make a reasonable decision as to whether crime is increasing or if it is just typical year to year variation. The majority of news articles on the topic are just chock full of post hoc ergo propter hoc for particular cherry picked cities, and they often don’t make sense in explaining crime patterns over the past decade in those particular cities, let alone make sense for different cities experiencing similar conditions but not having rising homicide rates.



  1. For my notes about data sources, generally the data have come from the FBI UCR data tool (for the 1960 through 2014 data). 2015 data have come from the FBI web page for the 2015 UCR report. The 2016 projections come from this Economist article as well as the 50 cities data for the google motion chart.
  2. Calculated in R via (binom.test(150,300000)$conf.int[1:2])*300000. This is the exact Clopper-Pearson confidence interval.
  3. So even though this 538 article does a better job of acknowledging volatility, whatever test they use to determine statistically significant increases is likely to have too many false positives.

New undergrad course – Communities and Crime

This semester I am teaching a new undergrad course, communities and crime. Still a few seats left if you are a UT Dallas student and still interested. (You can also audit the course even if you are not a UT Dallas student.)

You can see the syllabus from the linked page, but compared to other syllabi I’ve found floating around, (see Dan O’Brien or Elizabeth Groff for two undergrad examples) I focus more on micro places than others. Some syllabi I’ve found spend basically the whole semester on social disorganization, which I think is excessive.

One experiment I am going to try for this course is to use Dallas Open crime data, and then have the students make predictions. For example, for their first assignment they are supposed to predict, based on social disorganization theory, what neighborhood has the most crime in Dallas from this neighborhood map in Dallas. (Fusion table embedding not working in my WordPress post at the moment for some reason!)

These neighborhoods were obtained from Jane Massey, a researcher for the Dallas area Habitat for Humanity. Hence why the flood plain is its own neighborhood. It is the most reasonable source I’ve seen so far. Most generally agree (see Dallas Magazine for one example), but that data is not very tidy. See this web app to draw your own neighborhood in Dallas as well. And of course for students interested, part of the discussion will be about how you define a neighborhood.

Blogging in Review – 2016

The site has continued to grow in 2016. Looking back over the prior years it has looked pretty linear the whole time.

I take a hit in December, but I almost managed on average 200 site views per day in November. I topped the 100,000 cumulative site views for the entire blog’s existence in November of this year.

Despite moving from Albany to Texas, I still managed to publish 40 new pages this year, which I am pretty happy with. I don’t set myself any hard expectations, but I like to publish something at least once every two to four weeks.

While some of my initial traffic is bursty, e.g. gets shared on a popular site and you get a couple hundred views in a day, most of my traffic is a slow trickle of referrals from google. Here is a plot of my pages by average views per day, broken down by some of my main categories. Posts colored in red have an SPSS tag, and so the Python and R columns can also be posts on SPSS. (So most of my python posts are calling python from SPSS.)

So even my most popular posts do not average more than a few views per day, and most do not get any appreciable traffic at all. Here are the labels in that dot plot to show what posts they are.

Don’t ask me why some end up being more popular than others (who knew Venn diagrams in R?). I wrote a few more blog posts on using various google maps APIs with python in response to the google places post being popular. The google street view post is doing pretty well, the others not so much though.

My motivation for posts though is more in line with an academic journal/notebook/diary – I post on some project I am working on essentially, I don’t go and research specific topics just for the blog. I am happy with the extra exposure though – and I’m sure there is more value added to a tutorial blog post than there is for a stuffy academic paper that is read by two dozen individuals (even if that is what counts towards my tenure)!