New working paper: The effect of housing demolitions on crime in Buffalo, New York

I have a new working paper up, The effect of housing demolitions on crime in Buffalo, New York. This is in conjunction with my colleagues Dae-Young Kim and Scott Phillips, who are at SUNY Buffalo. Below is the abstract.

Objectives: From 2010 through 2015, the city of Buffalo demolished over 2,000 residences. This study examines whether those demolitions resulted in crime reductions.

Methods: Analysis was conducted at micro places matching demolished parcels to comparable control parcels with similar levels of crime. In addition, spatial panel regression models were estimated at the census tract and quarterly level, taking into account demographic characteristics of neighborhoods.

Results: We find that at the micro place level, demolitions cause a steep drop in reported crime at the exact parcel, and result in additional crime decreases at buffers of up to 1,000 feet away. At the census tract level, results indicated that demolitions reduced Part 1 crimes, but the effect was not statistically significant across different models.

Conclusions: While concerns over crime and disorder are common for vacant houses, the evidence that housing demolitions are an effective crime reduction solution is only partially supported by the analyses here. Future research should compare demolitions in reference to other neighborhood revitalization processes.

As always, if you have feedback/comments let me know.

And here are a few maps from the paper!

Geocoding with census data and the Census API

For my online GIS class I have a tutorial on creating an address locator using street centerline data in ArcGIS. Eventually I would like to put all of my class online, but for now I am just sharing that one, as I’ve forwarded it alot recently.

That tutorial used local street centerline data in Dallas that you can download from Dallas’s open data site. It also gives directions on how to use an online ESRI geocoding service — which Dallas has. But what if those are not an option? A student recently wanted to geocode data from San Antonio, and the only street data file they publicly provide lacks the beginning and ending street number.

That data is insufficient to create an address locator. It is also the case that the road data you can download from the census’s web interface lacks this data. But you can download street centerline data with beginning and end addresses from the census from the FTP site. For example here is the url that contains the streets with the address features. To use that you just have to figure out what state and county you are interested in downloaded. The census even has ESRI address locators already made for you using 2012 data at the state level. Again you just need to figure out your states number and download it.

Once you download the data with the begin and ending street numbers you can follow along with that tutorial the same as the public data.

Previously I’ve written about using the Google geocoding API. If you just have crime data from one jurisdiction, it is simple to make a geocoder for just that locality. But if you have data for many cities (say if you were geocoding home addresses) this can be more difficult. An alternative online API to google that does not have daily limits is the Census Geocoding API.

Here is a simple example in R of calling the census API and geocoding a list of addresses.


get_CensusAdd <- function(street,city,state,zip,benchmark=4){
    base <- ""
    soup <- GET(url=base,query=list(street=street,city=city,state=state,zip=zip,format='json',benchmark=benchmark))
    dat <- fromJSON(content(soup,as='text'), simplifyVector=TRUE)
    D_dat <- dat$result$addressMatches
    if (length(D_dat) > 1){
    return(c(D_dat['matchedAddress'],D_dat['coordinates'][[1]])) #error will just return null, x[1] is lon, x[2] is lat
    else {return(c('',NA,NA))}

#now create function to loop over data frame and return set of addresses
geo_CensusTIGER <- function(street,city,state,zip,sleep=1,benchmark=4){
  #make empy matrix
  l <- length(street)
  MyDat <- data.frame(matrix(nrow=l,ncol=3))
  names(MyDat) <- c("MatchedAdd","Lon","Lat")
  for (i in 1:l){
    x <- suppressMessages(get_CensusAdd(street=street[i],city=city[i],state=state[i],zip=zip[i],benchmark=benchmark))
    if (length(x) > 0){
        MyDat[i,1] <- x[1]
        MyDat[i,2] <- x[2]
        MyDat[i,3] <- x[3]
  MyDat$street <- street
  MyDat$city <- city
  MyDat$zip <- zip
  MyDat$state <- state

## Arbitrary dataframe for an exercise
AddList <- data.frame(
  IdNum = c(1,2,3,4,5),
  Address = c("450 W Harwood Rd", "2878 Fake St", "2775 N Collin St", "2775 N Collins St", "Lakewood Blvd and W Shore Dr"),
  City = c("Hurst", "Richardson", "Arlington", "Arlington", "Dallas"),
  State = c("TX", "TX", "TX", "TX", "TX")

test <- geo_CensusTIGER(street=AddList$Address,city=AddList$City,state=AddList$State,zip=rep('',5))

If you check out the results, you will see that this API does not appear to do fuzzy matching. 2775 N Collin St failed, whereas 2775 N Collins St was able to return a match. You can also see though it will return an intersection, but in my tests "/" did not work (so in R you can simply use gsub to replace different intersection types with and). I haven’t experimented with it too much, so let me know if you have any other insight into this API.

I will follow up in another post a python function to use the Census geocoding API, as well as using the Nominatim online geocoding API, which you can use for addresses outside of the United States.

Communities and Crime

This was my first semester teaching undergrads at UT Dallas. I taught the Communities and Crime undergrad course. I thought it went very well, and I was impressed with the undergrads here. For the course I had students do a bunch of different prediction assignments based on open data in Dallas, such as predicting what neighborhood has the most crime, or which specific bar has the most assaults. The idea being they would use the theories I discussed in the prior lecture to make the best predictions.

For their final assignment, I had students predict an arbitrary area to capture the most robberies in 2016 (up to that point they had only been predicting crimes in 2015). I used the same metric that NIJ is using in their crime forecasting challenge – the predictive accuracy index. This is simply % crime/% area, so students who give larger areas are more penalized. This ended up producing a pretty neat capstone to the end of the semester.

Below is a screen shot of the map, and here is a link to an interactive version. ( sites only allow specific types of iframe sources, so my dropbox src link to the interactive Leaflet map gets stripped.)

Look forward to teaching this class again (as of now it seems I will regularly offer it every spring).

More news on classes to come soon. I am teaching GIS applications in Criminology online over the summer. For a quick idea about the content, it will be almost the same as the GIS course in criminal justice I previously taught at SUNY.

In short, if you think maps rock then you should take my classes ๐Ÿ˜‰

SPSS Statistics for Data Analysis and Visualization – book chapter on Geospatial Analytics

A book I made contributions to, SPSS Statistics for Data Analysis and Visualization, is currently out. Keith and Jesus are the main authors of the book, but I contributed one chapter and Jon Peck contributed a few.

The book is a guided tour through many of the advanced statistical procedures and data visualizations in SPSS. Jon also contributed a few chapters towards using syntax, python, and using extension commands. It is a very friendly walkthrough, and we have all contributed data files for you to be able to follow along through the chapters.

So there is alot of content, but I wanted to give a more specific details on my chapter, as I think they will be of greater interest to crime analysts and criminologists. I provide two case studies, one of using geospatial association rules to identify areas of high crime plus high 311 disorder complaints in DC (using data from my dissertation). The second I give an example of spatio-temporal forecasting of ShotSpotter data at the weekly level in DC using both prior shootings as well as other prior Part 1 crimes.

Geospatial Association Rules

The geospatial association rules is a technique for high dimensional contingency tables to find particular combinations among categories that are more prevalent. I show examples of finding that thefts from motor vehicles tend to be associated in places nearby graffiti incidents.

And that assaults tend to be around locations with more garbage complaints (and as you can see each has a very different spatial patterning).

I consider this to be a useful exploratory data analysis type technique. It is very similar in application to conjunctive analysis, that has prior very similar crime mapping applications in risk terrain modeling (see Caplan et al., 2017).

Spatio-Temporal Prediction

The second example case study is forecasting weekly shootings in fairly small areas (500 meter grid cells) using ShotSpotter data in DC. I also use the prior weeks reported Part 1 crime types (Assault, Burglary, Robbery, etc.), so it is similar to the leading indicators forecasting model advocated by Wilpen Gorr and colleagues. I show that prior shootings predict future shootings up to 5 lags prior (so over a month), and that the prior crimes do have an effect on future shootings (e.g. robberies in the prior week contribute to more shootings in the subsequent week).

If you have questions about the analyses, or are a crime analyst and want to apply similar techniques to your data always feel free to send me an email.

Identifying near repeat crime strings in R or Python

People in criminology should be familiar with repeats or near-repeats for crimes such as robbery, burglaries, or shootings. An additional neat application of this idea though is to pull out strings of incidents that are within particular distance and time thresholds. See this example analysis by Haberman and Ratcliffe, The Predictive Policing Challenges of Near Repeat Armed Street Robberies. This is particularly useful to an analyst interested in crime linkage — to see if those particular strings of incidents are likely to be committed by the same offender.

Here I will show how to pluck out those near-repeat strings in R or Python. The general idea is to transform the incidents into a network, where two incidents are connected only if they meet the distance and time requirements. Then you can identify the connected components of the graph, and those are your strings of near-repeat events.

To follow along, here is the data and the code used in the analysis. I will be showing this on an example set of thefts from motor vehicles (aka burglaries from motor vehicles) in Dallas in 2015. In the end I take two different approaches to this problem — in R the solution will only work for smaller datasets (say n~5000 or less), but the python code should scale to much larger datasets.

Near-repeat strings in R

The approach I take in R does the steps as follows:

  1. compute the distance matrix for the spatial coordinates
  2. convert this matrix to a set of 0’s and 1’s, 1’s correspond to if the distance is below the user specified distance threshold (call it S)
  3. compute the distance matrix for the times
  4. convert this matrix to a set of 0’1 and 1’s, 1’s correspond to if the distance is below the user specified time threshold (call it T)
  5. use element-wise multiplication on the S and T matrices, call the result A, then set the diagonal of A to zero
  6. A is now an adjacency matrix, which can be converted into a network
  7. extract the connected components of that network

So here is an example of reading in the thefts from motor vehicle data, and defining my function, NearStrings, to grab the strings of incidents. Note you need to have the igraph R library installed for this code to work.


MyDir <- "C:\\Users\\axw161530\\Dropbox\\Documents\\BLOG\\SourceNearRepeats"

BMV <- read.csv(file="TheftFromMV.csv",header=TRUE)

#make a function
NearStrings <- function(data,id,x,y,time,DistThresh,TimeThresh){
    library(igraph) #need igraph to identify connected components
    MyData <- data
    SpatDist <- as.matrix(dist(MyData[,c(x,y)])) < DistThresh  #1's for if under distance
    TimeDist <-  as.matrix(dist(MyData[,time])) < TimeThresh #1's for if under time
    AdjMat <- SpatDist * TimeDist #checking for both under distance and under time
    diag(AdjMat) <- 0 #set the diagonal to zero
    row.names(AdjMat) <- MyData[,id] #these are used as labels in igraph
    colnames(AdjMat) <- MyData[,id] #ditto with row.names
    G <- graph_from_adjacency_matrix(AdjMat, mode="undirected") #mode should not matter
    CompInfo <- components(G) #assigning the connected components

So here is a quick example run on the first ten records. Note I have a field that is named DateInt in the csv, which is just the integer number of days since the first of the year. In R though if the dates are actual date objects you can submit them to the dist function though as well.

#Quick example with the first ten records
BMVSub <- BMV[1:10,]
ExpStrings <- NearStrings(data=BMVSub,id='incidentnu',x='xcoordinat',y='ycoordinat',time='DateInt',DistThresh=30000,TimeThresh=3)

So here we can see this prints out:

> ExpStrings
            CompId CompNum
000036-2015      1       3
000113-2015      2       4
000192-2015      2       4
000251-2015      1       3
000360-2015      2       4
000367-2015      3       1
000373-2015      4       2
000378-2015      4       2
000463-2015      2       4
000488-2015      1       3

The CompId field is a unique Id for every string of events. The CompNum field states how many events are within the string. So we have one string of events that contains 4 records in this subset.

Now this R function comes with a big caveat, it will not work on large datasets. I’d say your pushing it with 10,000 incidents. The issue is holding the distance matrices in memory. But if you can hold the matrices in memory this will still run quite fast. For 5,000 incidents it takes around ~15 seconds on my machine.

#Second example alittle larger, with the first 5000 records
BMVSub2 <- BMV[1:5000,]
BigStrings <- NearStrings(data=BMVSub2,id='incidentnu',x='xcoordinat',y='ycoordinat',time='DateInt',DistThresh=1000,TimeThresh=3)

The elements in the returned matrix will line up with the original dataset, so you can simply add those fields in, and do subsequent analysis (such as exporting back into a mapping program and digging into the strings).

#Add them into the original dataset
BMVSub2$CompId <- BigStrings$CompId
BMVSub2$CompNum <- BigStrings$CompNum   

You can check out the number of chains of different sizes by using aggregate and table.

#Number of chains
table(aggregate(CompNum ~ CompId, data=BigStrings, FUN=max)$CompNum)

This prints out:

   1    2    3    4    5    6    7    9 
3814  405   77   27    3    1    1    1

So out of our first 1,000 incidents, using the distance threshold of 1,000 feet and the time threshold of 3 days, we have 3,814 isolates. Thefts from vehicles with no other incidents nearby. We have 405 chains of 2 incidents, 77 chains of 3 incidents, etc. You can pull out the 9 incident like this since there is only one chain that long:

#Look up the 9 incident
BMVSub2[BMVSub2$CompNum == 9,]  

Which prints out here:

> BMVSub2[BMVSub2$CompNum == 9,]
      incidentnu xcoordinat ycoordinat StartDate DateInt CompId CompNum
2094 043983-2015    2460500    7001459 2/25/2015      56   1842       9
2131 044632-2015    2460648    7000542 2/26/2015      57   1842       9
2156 045220-2015    2461162    7000079 2/27/2015      58   1842       9
2158 045382-2015    2460154    7000995 2/27/2015      58   1842       9
2210 046560-2015    2460985    7000089  3/1/2015      60   1842       9
2211 046566-2015    2460452    7001457  3/1/2015      60   1842       9
2260 047544-2015    2460154    7000995  3/2/2015      61   1842       9
2296 047904-2015    2460452    7001457  3/3/2015      62   1842       9
2337 048691-2015    2460794    7000298  3/4/2015      63   1842       9

Or you can look up a particular chain by its uniqueid. Here is an example of a 4-chain set.

> #Looking up a particular incident chains
> BMVSub2[BMVSub2$CompId == 4321,]
      incidentnu xcoordinat ycoordinat StartDate DateInt CompId CompNum
4987 108182-2015    2510037    6969603 5/14/2015     134   4321       4
4988 108183-2015    2510037    6969603 5/14/2015     134   4321       4
4989 108184-2015    2510037    6969603 5/14/2015     134   4321       4
4993 108249-2015    2510037    6969603 5/14/2015     134   4321       4

Again, only use this function on smaller crime datasets.

Near-repeat strings in Python

Here I show how to go about a similar process in Python, but the algorithm does not calculate the whole distance matrix at once, so can handle much larger datasets. An additional note is that I exploit the fact that this list is sorted by dates. This makes it so I do not have to calculate all pair-wise distances – I will basically only compare distances within a moving window under the time threshold – this makes it easily scale to much larger datasets.

So first I use the csv python library to read in the data and assign it to a list with a set of nested tuples. Also you will need the networkx library to extract the connected components later on.

import networkx as nx
import csv
import math

dir = r'C:\Users\axw161530\Dropbox\Documents\BLOG\SourceNearRepeats'

BMV_tup = []
with open(dir + r'\TheftFromMV.csv') as f:
    z = csv.reader(f)
    for row in z:

The BMV_tup list has the column headers, so I extract that row and then figure out where all the elements I need, such as the XY coordinates, the unique Id’s, and the time column are located in the nested tuples.

colnames = BMV_tup.pop(0)
print colnames
print BMV_tup[0:10]

xInd = colnames.index('xcoordinat')
yInd = colnames.index('ycoordinat')
dInd = colnames.index('DateInt')
IdInd = colnames.index('incidentnu')

Now the magic — here is my function to extract those near-repeat strings. Again, the list needs to be sorted by dates for this to work.

def NearStrings(CrimeData,idCol,xCol,yCol,tCol,DistThresh,TimeThresh):
    G = nx.Graph()
    n = len(CrimeData)
    for i in range(n):
        for j in range(i+1,n):
            if (float(CrimeData[j][tCol]) - float(CrimeData[i][tCol])) > TimeThresh:
                xD = math.pow(float(CrimeData[j][xCol]) - float(CrimeData[i][xCol]),2)
                yD = math.pow(float(CrimeData[j][yCol]) - float(CrimeData[i][yCol]),2)
                d = math.sqrt(xD + yD)
                if d < DistThresh:
    comp = nx.connected_components(G)
    finList = []
    compId = 0
    for i in comp:
        compId += 1
        for j in i:
    return finList

We can then do the same test on the first ten records that we did in R.

print NearStrings(CrimeData=BMV_tup[0:10],idCol=IdInd,xCol=xInd,yCol=yInd,tCol=dInd,DistThresh=30000,TimeThresh=3)

And this subsequently prints out:

[('000378-2015', 1), ('000373-2015', 1), ('000113-2015', 2), ('000463-2015', 2), ('000192-2015', 2), ('000360-2015', 2), 
('000251-2015', 3), ('000488-2015', 3), ('000036-2015', 3)]

The component Id’s wont be in the same order as in R, but you can see we have the same results. E.g. the string with three incidents contains the Id’s 000251, 000488, and 000036. Note that this approach does not return isolates — incidents which have no nearby space-time examples.

Running this on the full dataset of over 14,000 incidents takes around 20 seconds on my machine.

BigResults = NearStrings(CrimeData=BMV_tup,idCol=IdInd,xCol=xInd,yCol=yInd,tCol=dInd,DistThresh=1000,TimeThresh=3)

And that should scale pretty well for really big cities and really big datasets. I will let someone who knows R better than me figure out workarounds to scale to bigger datasets in that language.

Scraping Meth Labs with Python

For awhile in my GIS courses I have pointed to the DEA’s website that has a list of busted meth labs across the county, named the National Clandestine Laboratory Register. Finally a student has shown some interest in this, and so I spent alittle time writing a scraper in Python to grab the data. For those who would just like the data, here I have a csv file of the scraped labs that are geocoded to the city level. And here is the entire SPSS and Python script to go from the original PDF data to the finished product.

So first off, if you visit the DEA website, you will see that each state has its own PDF file (for example here is Texas) that lists all of the registered labs, with the county, city, street address, and date. To turn this into usable data, I am going to do three steps in Python:

  1. download the PDF file to my local machine using urllib python library
  2. convert that PDF to an xml file using the pdftohtml command line utility
  3. use Beautifulsoup to parse the xml file

I will illustrate each in turn and then provide the entire Python script at the end of the post.

So first, lets import the libraries we need, and also note I downloaded the pdftohtml utility and placed that location as a system path on my Windows machine. Then we need to set a folder where we will download the files to on our local machine. Finally I create the base url for our meth labs.

from bs4 import BeautifulSoup
import urllib, os

myfolder = r'C:\Users\axw161530\Dropbox\Documents\BLOG\Scrape_Methlabs\PDFs' #local folder to download stuff
base_url = r'' #online site with PDFs for meth lab seizures

Now to just download the Texas pdf file to our local machine we would simply do:

a = 'tx'
url = base_url + r'/' + a + '.pdf'
file_loc = os.path.join(myfolder,a)
urllib.urlretrieve(url,file_loc + '.pdf')

If you are following along and replaced the path in myfolder with a folder on your personal machine, you should now see the Texas PDF downloaded in that folder. Now I am going to use the command line to turn this PDF into an xml document using the os.system() function.

#Turn to xml with pdftohtml, does not need xml on end
cmd = 'pdftohtml -xml ' + file_loc + ".pdf " + file_loc

You should now see that there is an xml document to go along with the Texas file. You can check out its format using a text editor (wordpress does not seem to like me showing it here).

So basically we can use the top and the left attributes within the xml to identify what row and what column the items are in. But first, we need to read in this xml and turn it into a BeautifulSoup object.

MyFeed = open(file_loc + '.xml')
textFeed =
FeedParse = BeautifulSoup(textFeed,'xml')

Now the FeedParse item is a BeautifulSoup object that you can query. In a nutshell, we have a top level page tag, and then within that you have a bunch of text tags. Here is the function I wrote to extract that data and dump it into tuples.

#Function to parse the xml and return the line by line data I want
def ParseXML(soup_xml,state):
    data_parse = []
    page_count = 1
    pgs = soup_xml.find_all('page')
    for i in pgs:
        txt = i.find_all('text')
        order = 1
        for j in txt:
            value = j.get_text() #text
            top = j['top']
            left = j['left']
            dat_tup = (state,page_count,order,top,left,value)
            order += 1
        page_count += 1
    return data_parse

So with our Texas data, we could call ParseXML(soup_xml=FeedParse,state=a) and it will return all of the data nested in those text tags. We can just put these all together and loop over all of the states to get all of the data. Since the PDFs are not that large it works quite fast, under 3 minutes on my last run.

from bs4 import BeautifulSoup
import urllib, os

myfolder = r'C:\Users\axw161530\Dropbox\Documents\BLOG\Scrape_Methlabs\PDFs' #local folder to download stuff
base_url = r'' #online site with PDFs for meth lab seizures
state_ab = ['al','ak','az','ar','ca','co','ct','de','fl','ga','guam','hi','id','il','in','ia','ks',
state_name = ['Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','Florida','Georgia','Guam','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas',
              'Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey',
              'New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas',
              'Utah','Vermont','Virginia','Washington','West Virginia','Wisconsin','Wyoming','Washington DC']

all_data = [] #this is the list that the tuple data will be stashed in

#Function to parse the xml and return the line by line data I want
def ParseXML(soup_xml,state):
    data_parse = []
    page_count = 1
    pgs = soup_xml.find_all('page')
    for i in pgs:
        txt = i.find_all('text')
        order = 1
        for j in txt:
            value = j.get_text() #text
            top = j['top']
            left = j['left']
            dat_tup = (state,page_count,order,top,left,value)
            order += 1
        page_count += 1
    return data_parse

#This loops over the pdfs, downloads them, turns them to xml via pdftohtml command line tool
#Then extracts the data

for a,b in zip(state_ab,state_name):
    #Download pdf
    url = base_url + r'/' + a + '.pdf'
    file_loc = os.path.join(myfolder,a)
    urllib.urlretrieve(url,file_loc + '.pdf')
    #Turn to xml with pdftohtml, does not need xml on end
    cmd = 'pdftohtml -xml ' + file_loc + ".pdf " + file_loc
    #parse with BeautifulSoup
    MyFeed = open(file_loc + '.xml')
    textFeed =
    FeedParse = BeautifulSoup(textFeed,'xml')
    #Extract the data elements
    state_data = ParseXML(soup_xml=FeedParse,state=b)
    all_data = all_data + state_data

Now to go from those sets of tuples to actually formatted data takes a bit of more work, and I used SPSS for that. See here for the full set of scripts used to download, parse and clean up the data. Basically it is alittle more complicated than just going from long to wide using the top marker for the data as some rows are off slightly. Also there is complications for long addresses being split across two lines. And finally there are just some data errors and fields being merged together. So that SPSS code solves a bunch of that. Also that includes scripts to geocode the to the city level using the Google geocoding API.

Let me know if you do any analysis of this data! I quickly made a time series map of these events via CartoDB. You can definately see some interesting patterns of DEA concentration over time, although I can’t say if that is due to them focusing on particular areas or if they are really the areas with the most prevalent Meth lab problems.

New undergrad course – Communities and Crime

This semester I am teaching a new undergrad course, communities and crime. Still a few seats left if you are a UT Dallas student and still interested. (You can also audit the course as well even if you are not a UT Dallas student.)

You can see the syllabus from the linked page, but compared to other syllabi I’ve found floating around, (see Dan O’Brien or Elizabeth Groff for two undergrad examples) I focus more on micro places than others. Some syllabi I’ve found spend basically the whole semester on social disorganization, which I think is excessive.

One experiment I am going to try for this course is to use Dallas Open crime data, and then have the students make predictions. For example, for their first assignment they are supposed to make their prediction based on social disorganization theory what neighborhood has the most crime in Dallas from this neighborhood map in Dallas. (Fusion table embedding not working in my WordPress post at the moment for some reason!)

These neighborhoods were obtained from Jane Massey, a researcher for the Dallas area Habitat for Humanity. Hence why the flood plain is its own neighborhood. It is the most reasonable source I’ve seen so far. Most generally agree (see Dallas Magazine for one example), but that data is not very tidy. See this web app to draw your own neighborhood in Dallas as well. And of course for students interested part of the discussion will be about how you define a neighborhood.

Preprint – A Quasi-Experimental Evaluation Using Roadblocks and Automatic License Plate Readers to Reduce Crime in Buffalo, NY

I have a new preprint article posted on SSRN – A Quasi-Experimental Evaluation Using Roadblocks and Automatic License Plate Readers to Reduce Crime in Buffalo, NY. This is some work I have been conducting with Scott Phillips out at SUNY Buffalo (as well as Dae-Young Kim, although he is not on this paper).

Here is the abstract:

Purpose: To evaluate the effectiveness of a hot spots policing strategy: using automated license plate readers at roadblocks.

Design: Different roadblock locations were chosen by the Buffalo Police Department every day over a two month period. We use propensity score matching to identify a set of control locations based on prior counts of crime and demographic factors before the intervention took place. We then evaluate the reductions in Part 1 crimes, calls for service, and traffic accidents at roadblock locations compared to control locations.

Findings: We find modest reductions in Part 1 violent crimes (10 over all roadblock locations and over the two months) using t-tests of mean differences. We find a 20% reduction in traffic accidents using fixed effects negative binomial regression models. Both results are sensitive to the model used though, and the fixed effects models predict increases in crimes due to the intervention.

Research Limitations: The main limitations are the quasi-experimental nature of the intervention, the short length of the intervention, and that many micro places have low baseline counts of crime.

Originality/Value: This adds to literature on hot spots policing – in particular on the use of automated license plate readers and traffic enforcement at hot spots of crime. While the results are mixed, it provides some evidence that the intervention has potential to reduce crime.

And here is one figure from the paper, showing how street units are defined, and given the intersection the road block was stationed on how we determined the treated street units:

Feedback is always welcome!

Presentation at ACJS 2016

I will be presenting at the ACJS (Academy of Criminal Justice Sciences) conference in Denver in a few days. My talk will be on some of the work I have been conducting with the Albany Police Department via the Finn Institute (Rob Worden and Sarah McLean are co-authors on the presentation). The title is Making stops smart: Predicting arrest rates from discretionary police stops at micro places in Albany, NY and here is the abstract:

Police stops are one of the most invasive, but regularly used crime control tactics by police. Similar to how focusing police resources at hot spots of crime can improve police efficiency, here we examine the spatial variation in arrest rates at micro places (street segments and intersections) in Albany, NY. Using data from over 240,000 discretionary police stops, we fit random effects logistic regression models to predict the probability of an arrest at different micro places. We show that like hot spots, there are examples of high arrest rate locations next to low arrest rate locations. Using a simulation, we show that if one displaced stops from low arrest locations to high arrest locations, one could make half as many stops but still have the same number of total arrests.

Here is a funnel chart of the arrest hit rates at micro-places across the city. You can see quite a bit of extra variation in arrest rates to attempt to explain.

I am giving this at 8 am on Thursday (see Event #185 in the program)

There will be two other presentations at the moment (Ling Wu is not going to make it), and they are:

  • Results from a victim generated crime mapping software, Zavin Nazaretian et al. – Indiana University of PA
  • Spatial analysis of aggravated assault and homicide crime scene, arrest and offender residence locations in Houston, TX, Elishewah Weisz – Sam Houston

So if you are interested in crime mapping stuff it should be a good session.

Feel free to bug me if you see me around at ACJS.


Also before I forget, my co-workers are presenting a poster on analysis of Syracuse Truce – a focused deterrence gang intervention. The posters are on Friday, so I won’t be around unfortunately. The title isย Gangs, groups, networks, and deterrence: An evaluation of Syracuse Truce.ย (See poster #45 in the same program I linked to earlier.) Rob and Kelly will at least be manning the poster though – so you can go and bug them about the details!

Here is a picture of the reach of call-ins for one particular gang. The idea is for those who attended call ins to spread the message to other members. So this graph evaluates how well the call-ins would be expected to reach all of the members of the gang.

If you are wondering what I do for my job – yes I pretty much just make maps and graphs all day long ๐Ÿ˜‰




Adding a command button to a toolbar in ArcGIS

I’m currently teaching a graduate level class in Crime Mapping using ArcGIS. I make my own tutorials from week to week, and basically sneak in generic pro-tips for using the software while students are doing other regular types of analyses. I can only subject my students to so much though – but here is one I have found useful, adding a regularly used button to a toolbar.

I use CrimeStat to generate kernel densities from point data, so as of V10 whenever I want to make a classified raster map I get this error:

V9 it used to just do this for you automatically :(.

I typically make classified raster maps simply because I think they look nicer than continous ones. My continuous ones always look fuzzy, whereas having discrete cuts you can focus attention on particular hot spot areas. It is arbitrary for sure – but that is something we need to learn to live with when making maps.

So in class I had students open ArcToolBox, navigate down the tree, and find the Calculate Statistics tool for rasters. In my personal set up though I do this enough that I added the button to my toolbar. So first, go to the file menu and in customize -> toolbars make sure you have the spatial analyst toolbar selected. (Here is a kernel density grd file to follow along with if you want.)

Now in the right hand most edge of the new spatial analyst toolbar, left click on the little downward pointing arrow and select Customize. (Sorry, my toolbar is a bit crowded!)

In the customize window that pops up, select the Commands tab. Now in this window you can select any particular command and then drag it onto any toolbar. Here I go to Data Management in the left hand categories area, and then scroll down till I find the Calculate Statistics button.

Then I left click on the Calculate Statistics row, hold down the mousebutton, and drag it to my toolbar.

Now you are done, and ArcGIS saves this button on the toolbar when making future maps. You can change the icon if you want, but there are tooltips when hovering over the icon (so even if you have multiple hammers on your toolbars it only takes a second to browse between them).