New paper: A simple weighted displacement difference test to evaluate place based crime interventions

At the ECCA conference this past spring Jerry Ratcliffe asked if I could apply some of my prior work on evaluating changes in crime patterns over time to make a set of confidence intervals for the weighted displacement quotient statistic (WDQ). The answer to that is no, you can’t, but in its stead I created another statistic in which you can do that, the weighted displacement difference (WDD). The work is published in the open access journal Crime Science.

The main idea is we wanted a simple statistic folks can use to evaluate place based interventions to reduce crime. All you need is pre and post crime counts for you treated and control areas of interest. Here is an excel spreadsheet to calculate the statistic, and below is a screen shot. You just need to fill in the pre and post counts for the treated and control locations and the spreadsheet will spit out the statistic, along with a p-value and a 95% confidence interval of the number of crimes reduced.

What is different compared to the WDQ statistic is that you need a control area for the displacement area too in this statistic. But if you are not worry about displacement, you can actually just put in zero’s for the displacement area and still do the statistic for the local (and its control area). In this way you can actually do two estimates, one for the local effects and one for the displacement. Just put in zero’s for the other values.

While you don’t really need to read the paper to be able to use the statistic, we do have some discussion on choosing control areas. In general the control areas should have similar counts of crime, you shouldn’t have a treatment area that has 100 crimes and a control area that only has 10 crimes. We also have this graph, which is basically a way to conduct a simple power analysis — the idea that “could you reasonably detect whether the intervention reduced crime” before you actually conduct the analysis.

So the way to read this graph is if you have a set of treated and control areas that have an average of 100 crimes in each period (so the cumulative total crimes is around 800), the number of crimes you need to reduce due to the intervention to even have weak evidence of a crime reduction (a one-tailed p-value of less than 0.1), the intervention needs to have prevented around 30 crimes. Many interventions just aren’t set up to have strong evidence of crime reductions. For example if you have a baseline of 20 crimes, you need to prevent 15 of them to find weak evidence of effectiveness. Interventions in areas with fewer baseline crimes basically cannot be verified they are effective using this simple of a design.

For those more mathy, I created a test statistic based on the differences in the changes of the counts over time by making an assumption that the counts are Poisson distributed. This is then basically just a combination of two difference-in-difference estimates (for the local and the displacement areas) using counts instead of means. For researchers with the technical capabilities, it probably makes more sense to use a data based approach to identify control areas (such as the synthetic control method or propensity score matching). This is of course assuming an actual randomized experiment is not feasible. But this is too much a burden for many crime analysts, so if you can construct a reasonable control area by hand you can use this statistic.

Advertisements

American Community Survey Variables of Interest to Criminologists

I’ve written prior blog posts about downloading Five Year American Community Survey data estimates (ACS for short) for small area geographies, but one of the main hiccups is figuring out what variables you want to use. The census has so many variables that are just small iterations of one another (e.g. Males under 5, males 5 to 9, males 10 to 14, etc.) that it is quite a chore to specify the ones you want. Often you want combinations of variables or to calculate percentages as well, so you need to take two or more variables and turn them into your constructed variable.

I have posted some notes on the variables I have used for past projects in an excel spreadsheet. This includes the original variables, as well as some notes for creating percentage variables. Some are tricky — such as figuring out the proportion of black residents for block groups you need to add non-Hispanic black and Hispanic black estimates (and then divide by the total population). For spatially oriented criminologists these are basically indicators commonly used for social disorganization. It also includes notes on what is available at the smaller block group level, as not all of the variables are. So you are more limited in your choices if you want that small of area.

Let me know if you have been using other variables for your work. I’m not an expert on these variables by any stretch, so don’t take my list as authoritative in any way. For example I have no idea whether it is valid to use the imputed data for moving in the prior year at the block group level. (In general I have not incorporated the estimates of uncertainty for any of the variables into my analyses, not sure of the additional implications for the imputed data tables.) Also I have not incorporated variables that could be used for income-inequality or for ethnic heterogeneity (besides using white/black/Hispanic to calculate the index). I’m sure there are other social disorganization relevant variables at the block group level folks may be interested in as well. So let me know in the comments or shoot me an email if you have suggestions to update my list.

I would prefer if as a field we could create a set of standardized indices so we are not all using different variables (see for example this Jeremy Miles paper). It is a bit hodge-podge though what variables folks use from study-to-study, and most folks don’t report the original variables so it is hard to replicate their work exactly. British folks have their index of deprivation, and it would be nice to have a similarly standardized measure to use in social science research for the states.


The ACS data has consistent variable names over the years, such as B03001_001 is the total population, B03002_003 is the Non-Hispanic white population, etc. Unfortunately those variables are not necessarily in the same tables from year to year, so concatenating ACS results over multiple years is a bit of a pain. Below I post a python script that given a directory of the excel template files will produce a nice set of dictionaries to help find what table particular variables are in.

#This python code grabs ACS meta-data templates
#To easier search for tables that have particular variables
import xlrd, os

mydir = r'!!!Insert your path to the excel files here!!!!!'

def acs_vars(directory):
    #get the excel files in the directory
    excel_files = []
    for file in os.listdir(directory):
        if file.endswith(".xls"):
            excel_files.append( os.path.join(directory, file) )
    #getting the variables in a nice dictionaries
    lab_dict = {}
    loc_dict = {}
    for file in excel_files:
        book = xlrd.open_workbook(file) #first open the xls workbook
        sh = book.sheet_by_index(0)
        vars = [i.value for i in sh.row(0)] #names on the first row
        labs = [i.value for i in sh.row(1)] #labels on the second
        #now add to the overall dictionary
        for v,l in zip(vars,labs):
            lab_dict[v] = l
            loc_dict[v] = file
    #returning the two dictionaries
    return lab_dict,loc_dict
    
labels,tables = acs_vars(mydir)

#now if you have a list of variables you want, you can figure out the table
interest = ['B03001_001','B02001_005','B07001_017','B99072_001','B99072_007',
            'B11003_016','B14006_002','B01001_003','B23025_005','B22010_002',
            'B16002_004']
            
for i in interest:
    head, tail = os.path.split(tables[i])
    print (i,labels[i],tail)

The length it takes from submission to publication

The other day I received a positive comment about my housing demolition paper. It made me laugh abit inside — it felt like I finished that work so long ago it was talking about history. That paper was not so ancient though, I submitted it 8/4/17, went through one round of revision, and I got the email from Jean McGloin for conditional acceptance on 1/16/18. It then came online first a few months later (3/15/18), and is in the current print issue of JRCD, which came out in May 2018.

This ignores the time it takes from conception to finishing a project (we started the project sometime in 2015), but focusing just on the publishing process this is close to the best case scenario for the life-cycle of a paper through peer reviewed journals in criminology & criminal justice. The realist best case scenario typically is:

  • Submission
  • Wait 3 months for peer reviews
  • Get chance to revise-resubmit
  • Wait another 3 months for second round of reviews and editor final decision

So ignoring the time it takes for editors to make decisions and the time for you to turn around edits, you should not bank on a paper being accepted under 6 months. There are exceptions to this, some journals/editors don’t bother with the second three month wait period for reviewers to look at your revisions (which I think is the correct way to do it), and sometimes you will get reviews back faster or slower than three months, but that realist scenario is the norm for most journals in the CJ/Crim field. Things that make this process much slower (multiple rounds of revisions, editors taking time to make decisions, time it takes to make extensive revisions), are much more common than things that can make it go shorter (I’ve only heard myths about a uniform accept on the first round without revisions).

Not having tenure this is something that is on my mind. It is a bit of a rat race trying to publish all the papers expected of you, and due to the length of peer review times you essentially need to have your articles out and under review well before your tenure deadline is up. The six month lag is the best case scenario in which your paper is accepted at the first journal you submit to. The top journals are uber competitive though, so you often have to go through that process multiple times due to rejections.

So to measure that time I took my papers, including those not published, to see what this life-cycle time is. If I only included those that were published it would bias the results to make the time look shorter. Here I measured the time it took from submission of the original article until when I received the email of the paper being accepted or conditionally accepted. So I don’t consider the lag time at the end with copy-editing and publishing online, nor do I consider up front time from conception of the project or writing the paper. Also I include three papers that I am not shopping around anymore, and censored them at the date of the last reject. For articles still under review I censored them at 5/9/18.

So first, for 25 of my papers that have received one editorial decision, here is a graph of the typical number of rejects I get for each paper. A 0 for a paper means it was published at the first journal I submitted to, a 1 means I had one reject and was accepted at the second journal I submitted the paper to, etc. (I use "I" but this includes papers I am co-author on as well.) The Y axis shows the total percentage, and the label for each bar shows the total N.

So the proportion of papers of mine that are accepted on the first round is 28%, and I have a mean of 1.6 rejections per article. This does not take into account censoring (not sure how to for this estimate), and that biases the estimate of rejects per paper downward here, as it includes some articles under review now that will surely be rejected at some point after writing this blog post.

The papers with multiple rejects run the typical gamut of why academic papers are sometimes hard to publish. Null results, a hostile reviewer at multiple places, controversial findings. It also illustrates that peer review is not necessarily a beacon showing the absolute truth of an article. I’m pretty sure everything I’ve published, even papers accepted at the first venue, have had one reviewer with negative comments. You could find reasons to reject the findings of anything I write that has been peer reviewed — same as you can think many of my pre-print articles are correct or useful even though they do not currently have a peer review stamp of approval.

Most of those rejections add about three months to the life-cycle, but some can be fast (these include desk rejections), and some can be slower (rejections on later rounds of revisions). So using those begin times, end times, and taking into account censoring, I can estimate the typical survival time of my papers within the peer-review system when lumping all of those different factors together into the total time. Here is the 1 - survival chart, so can be interpreted as the number of days until publication. This includes 26 papers (one more that has not had a first decision), so this estimate does account for papers that are censored.

The Kaplan-Meier estimate of the median survival times for my papers is 290 days. So if you want a 50% chance of your article being published, you should expect 10 months based on my experience. The data is too sparse to estimate extreme quantiles, but say I want an over 80% probability of an article being published based on this data, how much time do I need? The estimate based on this data is at least 460 days.

Different strategies will produce different outcomes — so my paper survival times may not generalize to yours, but I think that estimate will be pretty reasonable for most folks in Crim/CJ. I try to match papers to journals that I think are the best fit (so I don’t submit everything to Criminology or Justice Quarterly at the first go), so I have a decent percent of papers that land on the first round. If I submitted first round to more mediocre journals overall my survival times would be faster. But even many mid-tiered journals in our field have overall acceptance rates below 10%, nothing I submit I ever think is really a slam dunk sure thing, so I don’t think my overall strategy is the biggest factor. Some of that survival time is my fault and includes time editing the article in between rejects and revise-resubmits, but the vast majority of this is simply waiting on reviewers.

So the sobering truth for those of us without tenure is that based on my estimates you need to have your journal articles out of the door well over a year before you go up for review to really ensure that your work is published. I have a non-trivial chunk of my work (near 20%) that has taken over one and a half years to publish. Folks currently getting their PhD it is the same pressure really, since to land a tenure track job you need to have publications as well. (It is actually one I think reasonable argument to take a longer time writing your dissertation.) And that is just for the publishing part — that does not include actually writing the article or conducting the research. The nature of the system is very much delayed gratification in having your work finally published.

Here is a link to the data on survival times for my papers, as well as the SPSS code to reproduce the analysis.

Work on Shootings in Dallas Published

I have two recent articles that examine racial bias in decisions to shoot using Dallas Police Data:

  • Wheeler, Andrew P., Scott W. Phillips, John L. Worrall, and Stephen A. Bishopp. (2018) What factors influence an officer’s decision to shoot? The promise and limitations of using public data. Justice Research and Policy Online First.
  • Worrall, John L., Stephen A. Bishopp, Scott C. Zinser, Andrew P. Wheeler, and Scott W. Phillips. (2018) Exploring bias in police shooting decisions with real shoot/don’t shoot cases. Crime & Delinquency Online First.

In each the main innovation is using control cases in which officers pulled their firearm and pointed at a suspect, but decided not to shoot. Using this design we find that officers are less likely to shoot African-Americans, which runs counter to most recent claims of racial bias in police shootings. Besides the simulation data of Lois James, this is a recurring finding in the recent literature — see Roland Fryer’s estimates of this as well (although he uses TASER incidents as control cases).

The reason for the two articles is that me and John through casual conversation found out that we were both pursuing very similar projects, so we decided to collaborate. The paper John is first author examined individual officer level outcomes, and in particular retrieved personnel complaint records for individual officers and found they did correlate with officer decisions to shoot. My article I wanted to intentionally stick with the publicly available open data, as a main point of the work was to articulate where the public data falls short and in turn suggest what information would be needed in such a public database to reasonably identify racial bias. (The public data is aggregated to the incident level — one incident can have multiple officers shooting.) From that I suggest instead of a specific officer involved shooting database, it would make more sense to have officer use of force (at all levels) attached to incident based reporting systems (i.e. NIBRS should have use of force fields included). In a nutshell when examining any particular use-of-force outcome, you need a counter-factual that is that use-of-force could happen, but didn’t. The natural way to do that is to have all levels of force recorded.

Both John and I thought prior work that only looked at shootings was fundamentally flawed. In particular analyses where armed/unarmed was the main outcome among only a set of shooting cases confuses cause and effect, and subsequently cannot be used to determine racial bias in officer decision making. Another way to think about it is that when only looking at shootings you are just limiting yourself to examining potentially bad outcomes — officers often use their discretion for good (the shooting rate in the Dallas data is only 3%). So in this regard databases that only include officer involved shooting cases are fundamentally limited in assessing racial bias — you need cases in which officers did not shoot to assess bias in officer decision making.

This approach of course has some limitations as well. In particular it uses another point of discretion for officers – when to draw their firearm. It could be the case that there is no bias in terms of when officers pull the trigger, but they could be more likely to pull their gun against minorities — our studies cannot deny that interpretation. But, it is also the case other explanations could explain why minorities are more likely to have an officer point a gun at them, such as geographic policing or even more basic that minorities call the police more often. In either case, at the specific decision point of pulling the trigger, there is no evidence of racial bias against minorities in the Dallas data.

I did not post pre-prints of this work due to the potentially contentious nature, as well as the fact that colleagues were working on additional projects based on the same data. I have posted the last version before the copy-edits of the journal for the paper in which I am first author here. If you would like a copy of the article John is first author always feel free to email.

Digg reader is shutting down, giving Twitter a try

I’ve used RSS feeds for quite awhile now to keep up with blogs I enjoy. I also use it to follow scholarly journals of interest. Unfortunately, my current feed reader of choice (Digg Reader) is shutting down.

This is the second time my feed reader has shuttered (I used Google Reader before that shut down as well). Another particular problem I never really solved was link rot. Google reader has some metrics where you could see old feeds that have not had any new posts for awhile. Digg had no such service, and I tried my hand at writing python code to do this myself, but that code never quite worked out.

To partially replace this service instead of migrating to another feed reading service I will give Twitter a shot. Twitter is a bit chaotic from what I can tell — I much prefer the spreadsheet like listing of just a title to peruse news and events of interest in the morning. I had been using Google+ and like it (yes, I know I’m one of those nerds), but it is a bit of a ghost-town. So I will migrate entirely over to Twitter and give it a shot.

Paper published: The effect of housing demolitions on crime in Buffalo, New York

I have a new paper published with a few of my colleagues up in Buffalo, Dae-Young Kim and Scott Phillips. This work looks at the crime reduction effects of widespread demolitions in Buffalo, is titled The Effect of Housing Demolitions on Crime in Buffalo, New York, and was published at the Journal of Research in Crime & Delinquency. In short, at the micro level there is very strong evidence that demolitions reduce crime — the neighborhood level the evidence is not as strong. This is likely partly due to the neighborhood level analysis being underpowered, as several of the estimates between the two are very similar overall.

If you cannot get access to that published article, you can always send me an email for a copy, or you can download a pre-print version from SSRN.

Below is one of the images from the paper, a set of small-multiple maps showing demographic characteristics of Buffalo census tracts:

Someone could surely replicate this micro level result in other cities that have experienced widespread demolitions (like Detroit). But for long term city planners I would consider more rigorous designs that incorporate not only selective demolition, but other neighborhood investment strategies to improve neighborhoods over long term. That is, this research is good evidence of the near-term crime reduction effects of demolitions, but for the long haul leaving empty lots is not going to greatly improve neighborhoods.

New working paper: Modeling the Spatial Patterns of Intra-Day Crime Trends

I have a new working paper out with Cory Haberman, Modeling the Spatial Patterns of Intra-Day Crime Trends. Below is the abstract:

Several prior studies have found that despite theoretical expectations otherwise, facilities (such as on-premise alcohol outlets) have consistent effects on crime regardless of time of the day (Bernasco et al., 2017; Haberman & Ratcliffe, 2015). We explain these results by failure to account for the regular background wave of crime, which results from ubiquitous patterns of human routine activities. Using eight years of data on assaults and robberies in Seattle (WA), we demonstrate the regularity of the within-day crime wave for all areas of the city. Then using models to predict when a crime will most likely occur, we demonstrate how schools and on-premise alcohol outlets cause bumps in the background wave at particular times of the day, such as when school dismisses. But those bumps dissipate quite rapidly in space, and are relatively small compared to the amplitude of the regular background wave of crime. Although facilities have theoretical times in which they should have a greater influence on crime patterns, they are situated within a community of other human activity uses, making it difficult to uniquely identify their effects separately from other aspects of the built environment.

And here is a joyplot showing the changes in the hour of day wave depending on how close robberies are to a public high school or middle school:

You can see bumps very nearby schools at 7 am, then around noon and throughout the later afternoon, but are smoothed out when you get to around 2,000 feet away from schools.

The idea behind this paper is that several recent articles have not found much of a conditional relationship between crime generators and time of day. For example you would think bars only effect crime at nighttime when most people are at the bar, but several recent articles found the time of day does not make much of a difference (Bernasco et al., 2017; Haberman & Ratcliffe, 2015). We hypothesize this is because of the background wave of crime per hour of the day is much larger in magnitude than any local factor. An intuitive reason for this is that a place never has just a bar in isolation, there are other local land uses nearby that influence criminal patterns. You can see places nearby crime generators cause slight bumps in the background wave, but they are tiny compared to the overall amplitude of the general within day crime wave.

The article has a link to data and code to reproduce the findings. As always if you have feedback I am all ears.

Paper published: Evaluating Community Prosecution Code Enforcement in Dallas, Texas

Some work John Worrall and I collaborated on was just published in Justice Quarterly, Evaluating Community Prosecution Code Enforcement in Dallas, Texas. I have two links to share:

If you need access to the article always feel free to email.

Below is the abstract:

We evaluated a community prosecution program in Dallas, Texas. City attorneys, who in Dallas are the chief prosecutors for specified misdemeanors, were paired with code enforcement officers to improve property conditions in a number of proactive focus areas, or PFAs, throughout the city. We conducted a panel data analysis, focusing on the effects of PFA activity on crime in 19 PFAs over a six-year period (monthly observations from 2010 to 2015). Control areas with similar levels of pre-intervention crime were also included. Statistical analyses controlled for pre-existing crime trends, seasonality effects, and other law enforcement activities. With and without dosage data, the total crime rate decreased in PFA areas relative to control areas. City attorney/code enforcement teams, by seeking the voluntary or court-ordered abatement of code violations and criminal activity at residential and commercial properties, apparently improved public safety in targeted areas.

This was a neat program, as PFAs are near equivalents of hot spots that police focus on. So for the evaluation we drew control areas from Dallas PD’s Target Area Action Grid (TAAG) Areas:

New preprint: The accuracy of the violent offender identification directive (VOID) tool to predict future gun violence

I have a new preprint out, The accuracy of the violent offender identification directive (VOID) tool to predict future gun violence. This is work with Rob Worden and Jasmine Silver from our time at the Finn Institute. Below is the abstract:

We evaluate the Violent Offender Identification Directive (VOID) tool, a risk assessment instrument implemented within a police department to prospectively identify offenders likely to be involved with future gun violence. The tool uses a variety of static measures of prior criminal history that are readily available in police records management systems. The VOID tool is assessed for predictive accuracy by taking a historical sample and calculating scores for over 200,000 individuals known to the police at the end of 2012, and predicting 103 individuals involved with gun violence (either as a shooter or a victim) during 2013. Despite weights for the instrument being determined in an ad-hoc manner by crime analysts, the VOID tool does very well in predicting involvement with gun violence compared to an optimized logistic regression and generalized boosted models. We discuss theoretical reasons why such ad-hoc instruments are likely to perform well in identifying chronic offenders for all police departments.

There were just slightly over 100 violent gun offenders we were trying to pick out of over 200,000. The VOID tool did really well! Here is a graph comparing how many of those offenders VOID captured compared to a generalized boosted model (GBM), and two different logistic regression equations.

I have some of my thoughts in this article as to why a simple tool does just as well as more complicated regression and machine learning techniques, which is a common finding in recidivism studies as well. My elevator pitch for why that is is because most offenders are generalists, and for example you can basically swap prior arrests for robbery with prior arrests for motor vehicle theft — they both provide essentially the same signal for future potential criminality. See also discussion of this on Dan Simpson’s post on the Stat Modeling, Causal Inference and Social Science blog, which in turn makes me think the idea behind simple models can be readily applied to many decision points in the criminal justice field.

The simple takeaway from this for crime analysts making chronic offender lists is that don’t let the perfect be the enemy of the good. Analysts can likely create an ad-hoc weighting to prioritize chronic offenders and it will do quite well compared to fancier models.

I will be presenting this work at the ACJS conference in New Orleans on Saturday 2/17/18. It is a great session, with YongJei Lee, Jerry Ratcliffe, Bryanna Fox, and Stacy Sechrist (see session 384 in the ACJS program), so stop on by. If you want to catch up with me in New Orleans just send me an email. And as always if you have feedback on the draft I am all ears.

Come work with me! And advice on applying to grad school

Applications to apply for the PhD program in Criminology & Criminal Justice at UT Dallas are due soon, January 15th is the deadline. Are you interested in crime mapping, crime analysis, evaluating policing interventions intended to reduce crime? Well then apply for a PhD at UTD and come work with me!

For general advice, the PhD program is more like a mentor/mentee relationship than your prior schooling. The expectation is that you will become an independent scholar, and the best way to do that is to actually be involved with research projects of your advisor and subsequently conduct independent research for your dissertation.

So wherever you apply, you should have a good idea about what topics you want to pursue and who on the faculty you want to work with. And that focus should be reflected in your cover letter. That working relationship with a professor should be a stronger deciding factor than the rank of the program — if no one on the faculty conducts research in the area you want to pursue you will have a hard slog at pursuing that topical area for your dissertation.

If you are interested in working with a specific professor, feel free to send them an email about your interests and if they have time to advise another student. Not everyone does, and if you do not get response by email I would take that as a red flag that professor may not have enough time to effectively mentor you. See this scatterplot blog post for advice when emailing potential advisers.