Digg reader is shutting down, giving Twitter a try

I’ve used RSS feeds for quite awhile now to keep up with blogs I enjoy. I also use it to follow scholarly journals of interest. Unfortunately, my current feed reader of choice (Digg Reader) is shutting down.

This is the second time my feed reader has shuttered (I used Google Reader before that shut down as well). Another particular problem I never really solved was link rot. Google reader has some metrics where you could see old feeds that have not had any new posts for awhile. Digg had no such service, and I tried my hand at writing python code to do this myself, but that code never quite worked out.

To partially replace this service instead of migrating to another feed reading service I will give Twitter a shot. Twitter is a bit chaotic from what I can tell — I much prefer the spreadsheet like listing of just a title to peruse news and events of interest in the morning. I had been using Google+ and like it (yes, I know I’m one of those nerds), but it is a bit of a ghost-town. So I will migrate entirely over to Twitter and give it a shot.


Paper published: The effect of housing demolitions on crime in Buffalo, New York

I have a new paper published with a few of my colleagues up in Buffalo, Dae-Young Kim and Scott Phillips. This work looks at the crime reduction effects of widespread demolitions in Buffalo, is titled The Effect of Housing Demolitions on Crime in Buffalo, New York, and was published at the Journal of Research in Crime & Delinquency. In short, at the micro level there is very strong evidence that demolitions reduce crime — the neighborhood level the evidence is not as strong. This is likely partly due to the neighborhood level analysis being underpowered, as several of the estimates between the two are very similar overall.

If you cannot get access to that published article, you can always send me an email for a copy, or you can download a pre-print version from SSRN.

Below is one of the images from the paper, a set of small-multiple maps showing demographic characteristics of Buffalo census tracts:

Someone could surely replicate this micro level result in other cities that have experienced widespread demolitions (like Detroit). But for long term city planners I would consider more rigorous designs that incorporate not only selective demolition, but other neighborhood investment strategies to improve neighborhoods over long term. That is, this research is good evidence of the near-term crime reduction effects of demolitions, but for the long haul leaving empty lots is not going to greatly improve neighborhoods.

New working paper: Modeling the Spatial Patterns of Intra-Day Crime Trends

I have a new working paper out with Cory Haberman, Modeling the Spatial Patterns of Intra-Day Crime Trends. Below is the abstract:

Several prior studies have found that despite theoretical expectations otherwise, facilities (such as on-premise alcohol outlets) have consistent effects on crime regardless of time of the day (Bernasco et al., 2017; Haberman & Ratcliffe, 2015). We explain these results by failure to account for the regular background wave of crime, which results from ubiquitous patterns of human routine activities. Using eight years of data on assaults and robberies in Seattle (WA), we demonstrate the regularity of the within-day crime wave for all areas of the city. Then using models to predict when a crime will most likely occur, we demonstrate how schools and on-premise alcohol outlets cause bumps in the background wave at particular times of the day, such as when school dismisses. But those bumps dissipate quite rapidly in space, and are relatively small compared to the amplitude of the regular background wave of crime. Although facilities have theoretical times in which they should have a greater influence on crime patterns, they are situated within a community of other human activity uses, making it difficult to uniquely identify their effects separately from other aspects of the built environment.

And here is a joyplot showing the changes in the hour of day wave depending on how close robberies are to a public high school or middle school:

You can see bumps very nearby schools at 7 am, then around noon and throughout the later afternoon, but are smoothed out when you get to around 2,000 feet away from schools.

The idea behind this paper is that several recent articles have not found much of a conditional relationship between crime generators and time of day. For example you would think bars only effect crime at nighttime when most people are at the bar, but several recent articles found the time of day does not make much of a difference (Bernasco et al., 2017; Haberman & Ratcliffe, 2015). We hypothesize this is because of the background wave of crime per hour of the day is much larger in magnitude than any local factor. An intuitive reason for this is that a place never has just a bar in isolation, there are other local land uses nearby that influence criminal patterns. You can see places nearby crime generators cause slight bumps in the background wave, but they are tiny compared to the overall amplitude of the general within day crime wave.

The article has a link to data and code to reproduce the findings. As always if you have feedback I am all ears.

Paper published: Evaluating Community Prosecution Code Enforcement in Dallas, Texas

Some work John Worrall and I collaborated on was just published in Justice Quarterly, Evaluating Community Prosecution Code Enforcement in Dallas, Texas. I have two links to share:

If you need access to the article always feel free to email.

Below is the abstract:

We evaluated a community prosecution program in Dallas, Texas. City attorneys, who in Dallas are the chief prosecutors for specified misdemeanors, were paired with code enforcement officers to improve property conditions in a number of proactive focus areas, or PFAs, throughout the city. We conducted a panel data analysis, focusing on the effects of PFA activity on crime in 19 PFAs over a six-year period (monthly observations from 2010 to 2015). Control areas with similar levels of pre-intervention crime were also included. Statistical analyses controlled for pre-existing crime trends, seasonality effects, and other law enforcement activities. With and without dosage data, the total crime rate decreased in PFA areas relative to control areas. City attorney/code enforcement teams, by seeking the voluntary or court-ordered abatement of code violations and criminal activity at residential and commercial properties, apparently improved public safety in targeted areas.

This was a neat program, as PFAs are near equivalents of hot spots that police focus on. So for the evaluation we drew control areas from Dallas PD’s Target Area Action Grid (TAAG) Areas:

New preprint: The accuracy of the violent offender identification directive (VOID) tool to predict future gun violence

I have a new preprint out, The accuracy of the violent offender identification directive (VOID) tool to predict future gun violence. This is work with Rob Worden and Jasmine Silver from our time at the Finn Institute. Below is the abstract:

We evaluate the Violent Offender Identification Directive (VOID) tool, a risk assessment instrument implemented within a police department to prospectively identify offenders likely to be involved with future gun violence. The tool uses a variety of static measures of prior criminal history that are readily available in police records management systems. The VOID tool is assessed for predictive accuracy by taking a historical sample and calculating scores for over 200,000 individuals known to the police at the end of 2012, and predicting 103 individuals involved with gun violence (either as a shooter or a victim) during 2013. Despite weights for the instrument being determined in an ad-hoc manner by crime analysts, the VOID tool does very well in predicting involvement with gun violence compared to an optimized logistic regression and generalized boosted models. We discuss theoretical reasons why such ad-hoc instruments are likely to perform well in identifying chronic offenders for all police departments.

There were just slightly over 100 violent gun offenders we were trying to pick out of over 200,000. The VOID tool did really well! Here is a graph comparing how many of those offenders VOID captured compared to a generalized boosted model (GBM), and two different logistic regression equations.

I have some of my thoughts in this article as to why a simple tool does just as well as more complicated regression and machine learning techniques, which is a common finding in recidivism studies as well. My elevator pitch for why that is is because most offenders are generalists, and for example you can basically swap prior arrests for robbery with prior arrests for motor vehicle theft — they both provide essentially the same signal for future potential criminality. See also discussion of this on Dan Simpson’s post on the Stat Modeling, Causal Inference and Social Science blog, which in turn makes me think the idea behind simple models can be readily applied to many decision points in the criminal justice field.

The simple takeaway from this for crime analysts making chronic offender lists is that don’t let the perfect be the enemy of the good. Analysts can likely create an ad-hoc weighting to prioritize chronic offenders and it will do quite well compared to fancier models.

I will be presenting this work at the ACJS conference in New Orleans on Saturday 2/17/18. It is a great session, with YongJei Lee, Jerry Ratcliffe, Bryanna Fox, and Stacy Sechrist (see session 384 in the ACJS program), so stop on by. If you want to catch up with me in New Orleans just send me an email. And as always if you have feedback on the draft I am all ears.

New preprint: Testing for Similarity in Area-Based Spatial Patterns: Alternative Methods to Andresen’s Spatial Point Pattern Test

I just posted another pre-print to SSRN, Testing for Similarity in Area-Based Spatial Patterns: Alternative Methods to Andresen’s Spatial Point Pattern Test. This is work with Wouter Steenbeek and Martin Andresen. Below is the abstract:

Andresen’s spatial point pattern test (SPPT) compares two spatial point patterns on defined areal units: it identifies areas where the spatial point patterns diverge and aggregates these local (dis)similarities to one global measure. We discuss the limitations of the SPPT and provide two alternative methods to calculate differences in the point patterns. In the first approach we use differences in proportions tests corrected for multiple comparisons. We show how the size of differences matter, as with large point patterns many areas will be identified by SPPT as statistically different, even if those differences are substantively trivial. The second approach uses multinomial logistic regression, which can be extended to identify differences in proportions over continuous time. We demonstrate these methods on identifying areas where pedestrian stops by the New York City Police Department are different from violent crimes from 2006 through 2016.

And here is an example map using our proportion differences test and graduated circles to identify places with larger differences in the percentages:

This is opposed to the traditional SPPT output, which just identifies whether two areas are different and does not focus on the size of the difference, like below:

You can see with a large sample size, basically everything is statistically different! (This uses over 4 million stops and over 800,000 violent crimes). Focusing on the magnitude of the differences gives a much clear indication of patterns.

The paper includes a dropbox link to download the data and code used to estimate the different techniques (it includes code in SPSS, R, and Stata). If you have any feedback as always let me know. This was submitted as a GISScience presentation for the 2018 ESRI User conference in July in San Diego, so I should have news about that presentation in the near future as well.

New preprint: A Gentle Introduction to Creating Optimal Patrol Areas

I have a new preprint posted, A Gentle Introduction to Creating Optimal Patrol Areas. Below is the abstract:

Models to create optimal patrol areas have been in existence for over 45 years, but police departments still regularly construct patrol areas in an ad-hoc fashion. This essay walks the reader through formulating an integer linear program to create a set number of patrol areas that have near equal call load and that are contiguous using simple examples. Then the technique is illustrated using a case study in Carrollton, TX. Creating optimal patrol areas not only have the potential to improve efficiency in response times, but can also encourage hot spots policing. Applications of linear programming can additionally be applied to a wide variety of problems within criminal justice agencies, and this essay provides a gentle introduction to understanding the mathematical notation of linear programming.

In this paper I introduce a very simple integer linear program to create patrol beats, and then build up the complexity into the fuller p-median problem with additional constraints applicable to making patrol areas. The constraints on making the call load equal that I introduce in the paper are the only real novel aspect of the paper (although no doubt someone else has done something similar previously), but I was a bit frustrated reading other linear programs to create patrol areas. Most work was concentrated in operations research journals and in my opinion was totally inaccessible to a typical crime analyst. So I frame the paper as an introduction to integer linear programs, walk though some simplified examples, and then apply that full model in Carrollton. I also provide an extensive walkthrough using the python program PuLP so others can replicate the work with their own data in the supplementary materials.

Here is my end example map of the optimal patrol areas in Carrollton.

You can see that my areas are not as nice and convex, although most applications of correcting for that introduce multiple objective functions and/or non-linear functions, making the problem much harder to estimate in practice (which was part of my pet-peeve with prior publications – none provided code to estimate the models described, with the exception of some of the work of Kevin Curtin).

Part of the reason I tackled this problem is that it comes up all the time on the IACA list-serve — how to make new patrol areas. If you are an analyst interested in applying this in your jurisdiction and would like help always feel free to contact me.

Remaking a clustered bar chart

Thomas Lumley on his blog had a recent example of remaking a clustered bar chart that I thought was a good idea. Here is a screenshot of the clustered bar chart (the original is here):

And here is Lumley’s remake:

In the original bar chart it is hard to know what is the current value (2017) and what are the past values. Also the bar chart goes to zero on the Y axis, which makes any changes seem quite small, since the values only range from 70% to 84%. Lumley’s remake clearly shows the change from 2016 to 2017, as well as the historical range from 2011 through 2016.

I like Lumley’s remake quite alot, so I made some code in SPSS syntax to show how to make a similar chart. The grammar of graphics I always thought is alittle confusing when using clustering, so this will be a good example demonstration. Instead of worrying about the legend I just added in text annotations to show what the particular elements were.

One additional remake is instead of offsetting the points and using a slope chart (this is an ok use, but see my general critique of slopegraphs here) is to use a simpler dotplot showing before and after.

One reason I do not like the slopes is that slope itself is dictated by the distance from 16 to 17 in the chart (which is arbitrary). If you squeeze them closer together the slope gets higher. The slope itself does not encode the data you want, you want to calculate the difference from beginning to end. But it is not a big difference here (my main complaints for slopegraphs are when you superimpose many different slopes that cross one another, in those cases I think a scatterplot is a better choice).

Jonathan Schwabish on his blog often has similar charts (see this one example).

Pretty much all clustered bar charts can be remade into either a dotplot or a line graph. I won’t go as far as saying you should always do this, but I think dot plots or line graphs would be a better choice than a clustered bar graph for most examples I have seen.

Here like Lumley said instead of showing the ranges likely a better chart would just be a line chart over time of the individual years, that would give a better since of both trends as well as typical year-to-year changes. But these alternatives to a clustered bar chart I do not think turned out too shabby.

SPSS Code to replicate the charts. I added in the labels for the elements manually.

*data from https://www.stuff.co.nz/national/education/100638126/how-hard-was-that-ncea-level-1-maths-exam.
*Motivation from Thomas Lumley, see https://www.statschat.org.nz/2018/01/18/better-or-worse/.

DATA LIST FREE / Type (A10) Low Y2017 Y2016 High (4F3.1).
Tables 78.1 71.2 80.5 84
Geo 71.5 73.5 72 75.6
Chance 74.7 78.4 80.2 80.2
Algebra 72.2 78.3 82 82
  'Tables' 'Tables, equations, and graphs'
  'Geo' 'Geometric Reasoning'
  'Chance' 'Chance and data'
  'Algebra' 'Algebraic procedures'
FORMATS Low Y2017 Y2016 High (F3.0).

*In this format I can make a dot plot.
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y2017 Y2016 Low High Type 
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y2017=col(source(s), name("Y2017"))
  DATA: Y2016=col(source(s), name("Y2016"))
  DATA: Low=col(source(s), name("Low"))
  DATA: High=col(source(s), name("High"))
  DATA: Type=col(source(s), name("Type"), unit.category())
  GUIDE: axis(dim(1), delta(1), start(70))
  GUIDE: axis(dim(1), label("Percent Students with a grade of 'Achieved' or better"), opposite(), delta(100), start(60))
  GUIDE: axis(dim(2))
  SCALE: cat(dim(2), include("Algebra", "Chance", "Geo", "Tables"))
  ELEMENT: edge(position((Low+High)*Type), size(size."30"), color.interior(color.grey), 
  ELEMENT: edge(position((Y2016+Y2017)*Type), shape(shape.arrow), color(color.black), size(size."2"))
  ELEMENT: point(position(Y2016*Type), color.interior(color.black), shape(shape.square), size(size."10"))

*Now trying a clustered bar graph.
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Y2017 Y2016 Low High Type 
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Y2017=col(source(s), name("Y2017"))
  DATA: Y2016=col(source(s), name("Y2016"))
  DATA: Low=col(source(s), name("Low"))
  DATA: High=col(source(s), name("High"))
  DATA: Type=col(source(s), name("Type"), unit.category())
  TRANS: Y17 = eval("2017")
  TRANS: Y16 = eval("2016")
  COORD: rect(dim(1,2), cluster(3,0))
  GUIDE: axis(dim(3))
  GUIDE: axis(dim(2), label("% Achieved"), delta(1), start(70))
  ELEMENT: edge(position(Y16*(Low+High)*Type), size(size."30"), color.interior(color.grey), 
  ELEMENT: edge(position((Y16*Y2016*Type)+(Y17*Y2017*Type)), shape(shape.arrow), color(color.black), size(size."2"))
  ELEMENT: point(position(Y16*Y2016*Type), color.interior(color.black), shape(shape.square), size(size."10"))

*This can get tedious if you need to make a line for many different years.
*Reshape to make a clustered chart in a less tedious way (but cannot use arrows this way).
COMPUTE Year = Year + 2015.
DO IF Year = 2017.

  SOURCE: s=userSource(id("graphdataset"))
  DATA: Type=col(source(s), name("Type"), unit.category())
  DATA: Perc=col(source(s), name("Perc"))
  DATA: Low=col(source(s), name("Low"))
  DATA: High=col(source(s), name("High"))
  DATA: Year=col(source(s), name("Year"), unit.category())
  COORD: rect(dim(1,2), cluster(3,0))
  GUIDE: axis(dim(3))
  GUIDE: axis(dim(2), label("% Achieved"), delta(1), start(70))
  SCALE: cat(dim(3), include("Algebra", "Chance", "Geo", "Tables"))
  ELEMENT: edge(position(Year*(Low+High)*Type), color.interior(color.grey), size(size."20"), transparency.interior(transparency."0.5"))
  ELEMENT: path(position(Year*Perc*Type), split(Type))  
  ELEMENT: point(position(Year*Perc*Type), size(size."8"), color.interior(color.black), color.exterior(color.white))


Come work with me! And advice on applying to grad school

Applications to apply for the PhD program in Criminology & Criminal Justice at UT Dallas are due soon, January 15th is the deadline. Are you interested in crime mapping, crime analysis, evaluating policing interventions intended to reduce crime? Well then apply for a PhD at UTD and come work with me!

For general advice, the PhD program is more like a mentor/mentee relationship than your prior schooling. The expectation is that you will become an independent scholar, and the best way to do that is to actually be involved with research projects of your advisor and subsequently conduct independent research for your dissertation.

So wherever you apply, you should have a good idea about what topics you want to pursue and who on the faculty you want to work with. And that focus should be reflected in your cover letter. That working relationship with a professor should be a stronger deciding factor than the rank of the program — if no one on the faculty conducts research in the area you want to pursue you will have a hard slog at pursuing that topical area for your dissertation.

If you are interested in working with a specific professor, feel free to send them an email about your interests and if they have time to advise another student. Not everyone does, and if you do not get response by email I would take that as a red flag that professor may not have enough time to effectively mentor you. See this scatterplot blog post for advice when emailing potential advisers.



My Year Blogging in Review – 2017

So the blog has continued to show linear growth in terms of views over time, I take a good hit though in December.

I only ended up writing 35 new posts in 2017 (that includes things that are not blog posts, like pages I created for new classes). For comparison in 2015 I wrote 50, and 2016 I wrote 40. I’ve managed to be pretty consistent though over time, here is the cumulative total over time.

That is more or less what I aim for, to just have some content every few weeks.

There is not much to say in terms of popular posts on the site for the year. My most popular posts are ones I’ve written in previous years. I did not have any post this year gain a large number of viewers when it was first written. It is just a slow accumulation of around 200 views per day, mostly people being referred via Google searches.

I wanted to analyze the topics I’ve written about over time, so I grabbed all of the tags I’ve placed on posts. I collapsed both categories and tags, as I don’t really make much of a distinction when I pick them. Here is a graph of the number of posts that have that tag and the page views (this will double count page views, for example a post could have both SPSS and Data Visualization). None refers to pages that are not blog posts, like my home page and pages I created for class syllabi.

If we look at the ratio though, you can see my scholarly posts are mostly ignored, only in total do they accumulate much viewing.

My posts on showing how to use various google maps services with python must be reasonably high in Google searches, as I get a slow trickle of hits for them every day. The high uncertainty is driven by my ratios need to be plotted on log scales post.

I tried to analyze whether the content of my posts substantively has changed over time. I suspected since I took my job at Dallas my posts swayed more towards paper/scholarly (the tag I use for academic related things) and more away from technical computing stuff. I have too few of posts though (and too many tags) to easily make sense of it. Taking only tags that are included on 30 or more posts, here are the counts of those tags over time.

About the only clear trend is that scholarly has risen with SPSS dropping, the other frequent categories though look to me to be fairly consistent. I could spend more time grouping the tags into thematic content, but I have too many other things I need to do (including writing other blog posts)!

Happy New Year!