Paper published: Evaluating Community Prosecution Code Enforcement in Dallas, Texas

Some work John Worrall and I collaborated on was just published in Justice Quarterly, Evaluating Community Prosecution Code Enforcement in Dallas, Texas. If you need access to the article, always feel free to email me.

Below is the abstract:

We evaluated a community prosecution program in Dallas, Texas. City attorneys, who in Dallas are the chief prosecutors for specified misdemeanors, were paired with code enforcement officers to improve property conditions in a number of proactive focus areas, or PFAs, throughout the city. We conducted a panel data analysis, focusing on the effects of PFA activity on crime in 19 PFAs over a six-year period (monthly observations from 2010 to 2015). Control areas with similar levels of pre-intervention crime were also included. Statistical analyses controlled for pre-existing crime trends, seasonality effects, and other law enforcement activities. With and without dosage data, the total crime rate decreased in PFA areas relative to control areas. City attorney/code enforcement teams, by seeking the voluntary or court-ordered abatement of code violations and criminal activity at residential and commercial properties, apparently improved public safety in targeted areas.
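
To give a flavor of that design (this is not the exact specification from the paper), the panel setup boils down to comparing treated and control areas with area and time fixed effects. A minimal sketch in Python, with hypothetical file and column names:

```python
# A minimal sketch of the kind of panel model described above (not the paper's
# exact specification; the file and column names are hypothetical). Area fixed
# effects absorb level differences between places, year-month fixed effects
# absorb common trends and seasonality, and the interaction is the treatment effect.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("pfa_panel.csv")  # one row per area per month
mod = smf.ols("crime_rate ~ treated:post + C(area) + C(yearmonth)", data=panel)
res = mod.fit(cov_type="cluster", cov_kwds={"groups": panel["area"]})
print(res.params["treated:post"])  # the difference-in-differences estimate
```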

This was a neat program, as PFAs are near equivalents of hot spots that police focus on. So for the evaluation we drew control areas from Dallas PD’s Target Area Action Grid (TAAG) Areas:


New course in the spring – Crime Science

This spring I will be teaching a new graduate level course, Crime Science. A better name for the course would be evidence-based policing tactics to reduce crime — but that name is too long!

Here you can see the current syllabus. I also have a page for the course, which I will update with more material over the winter break.

Given my background it has a heavy focus on hot spots policing (different tactics at hot spots, time spent at hot spots, crackdowns vs long term). But the class covers other policing strategies as well, such as focusing on chronic offenders, the focused deterrence gang model, and CPTED. We also discuss the use of technology in policing (e.g. CCTV, license plate readers, body-worn cameras).

I will weave in ethical discussions throughout the course, but I reserved the last class to specifically talk about predictive policing strategies. In particular the two main concerns are increasing disproportionate minority contact through prediction, and privacy concerns with police collecting various pieces of information.

So take my course!

Monitoring homicide trends paper published

My paper, Monitoring Volatile Homicide Trends Across U.S. Cities (with coauthor Tom Kovandzic), has just been published online in Homicide Studies. Unfortunately, Homicide Studies does not give me a link to share a free PDF like other publishers do, but you can either grab the pre-print on SSRN or just email me for a copy of the paper.

They made me convert all of the charts to grey scale :(. Here is an example of the funnel chart for homicide rates in 2015.

And here are example fan charts I generated for a few different cities.

As always, if you have feedback or suggestions let me know! I posted all of the code to replicate the analysis at this link. The prediction intervals can definitely be improved, both in coverage and in narrowing their width, so I hope to see other researchers tackling this as well.

Notes on using UCR data for class projects

Students in my classes often want to use UCR reported data for projects. One thing many don’t realize though is that the UCR data reported to the FBI consists only of aggregate statistics, at regular intervals, for the entire jurisdiction. So, for example, one can’t look at hot spots using reported UCR data.

If you do have a hypothesis that can be reasonably examined using monthly or yearly data at the jurisdiction level, here are a few notes on using UCR data. First, you can get the most detailed downloads of the data from ICPSR. That link has data series going back to 1960, and ends up being about two years behind (e.g. it is close to the end of 2017, and only 2015 data is available).

The datasets on ICPSR have monthly data for Part 1 crime types, as well as some information on arrests and clearances. Also they have all of the individual agencies, along with their ORI code. The ORI code allows you to link agencies over time.

While the FBI does have a page for more up to date UCR data (they just released the 2016 stats, so they are about a year behind), they are much more limited in the types of tables they disseminate. There typically is one table for Part 1 crime rates for individual large cities for each year, but otherwise it is aggregated to different city sizes. So most data analyses need to use the ICPSR data — the data directly from the FBI is not detailed enough.

For those wishing to map the data, it ends up being a bit tricky. Most people in the US are probably under the jurisdiction of at least two police departments — the local PD and the state police. Many people are also under the jurisdiction of a local sheriff. So many of these police agencies have overlapping boundaries. There is no easy source of the geographic boundaries for the police departments, but the ICPSR data does contain the zipcode for the headquarters for the police department. This won’t be accurate for state police — but should be suitable for mapping purposes for local agencies and sheriffs (sheriffs are sometimes organized at the county level). If you want polygon data for jurisdictional boundaries you will need to search for individual agencies and political boundaries — there is no easy source to download them all at once. Many rural areas will have police departments cover multiple towns, but if you stick to more urban areas you might be able to use city boundaries.

The ICPSR data has crime reports aggregated to the county level, so if that level of aggregation is not problematic you may use that data directly. You should be aware of many of the complaints about UCR data quality though. Mike Maltz has written a bit about it, but there are quite a few other folks who have noticed problems with reporting in the UCR data. The main problem to watch out for is missing data being accidentally reported as zero crimes occurring.

Stacking datasets from different years from ICPSR is not too difficult if you are not going too far back in time. But for the older data ICPSR changed the variable order. The variables are simply labeled V1 through V100 and so on, so for example V15 in 1979 is not the same variable as V15 in 2005. My notes say they used the same variable order from 1998-2015, but you will want to check that yourself. (I downloaded the SPSS files; it would not surprise me if the datasets differed for some of the years.)
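
If you are working in Python, the stacking itself is only a few lines of pandas. This is just a sketch under the assumption that you have exported each year’s file to CSV (the file names here are hypothetical):

```python
import pandas as pd

# Hypothetical CSV exports of each ICPSR year (one file per year).
years = range(1998, 2016)
frames = []
for year in years:
    df = pd.read_csv(f"ucr_{year}.csv")
    df["YEAR"] = year
    frames.append(df)

# Guard against the variable-order problem: require identical columns. Note
# this won't catch V15 meaning different things in different years, so also
# spot-check a known variable (e.g. murder counts for a big agency).
base_cols = list(frames[0].columns)
for frame in frames[1:]:
    assert list(frame.columns) == base_cols, "Columns differ across years!"

stacked = pd.concat(frames, ignore_index=True)
```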

Some additional resources students may want to familiarize themselves with to gather UCR data more quickly are the FBI UCR data tool and Mike Maltz’s cleaned up dataset and notes on how he made it. You should probably just use Mike Maltz’s dataset if you are using data over time.

If you are just interested in yearly homicides, I have provided a dataset of cleaned up homicides that goes back to 1960; see the paper that goes along with that dataset on graphing temporal homicide trends (mapping those trends could be an interesting project as well!).

Graphs and interrupted time series analysis – trends in major crimes in Baltimore

Pete Moskos’s blog is one I regularly read, and in a recent post he pointed out how major crimes (aggravated assaults, robberies, homicides, and shootings) have been increasing in Baltimore post the riot on 4/27/15. He provides a series of different graphs using moving averages to illustrate the rise; see below for his initial attempt:

He also has an interrupted moving average plot that shows the break more clearly – but honestly I don’t understand his description, so I’m not sure how he created it.

I recreated his initial line plot using SPSS, and I think a line plot with a guideline shows the bump post riot pretty clearly.

The bars in Pete’s graph are not the easiest way to visualize the trend. Here making the line thin and lighter grey also helps.

The way to analyze this data is using an interrupted time series analysis. I am not going to go through all of those details, but for those interested I would suggest picking up David McDowall’s little green book, Interrupted Time Series Analysis, for a walkthrough. One of the first steps though is to figure out the ARIMA structure, which you do by examining the autocorrelation function. Here is that ACF for this crime data.
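
For those working in Python rather than SPSS, the equivalent plots are a couple of lines with statsmodels. A sketch, assuming you have the daily counts in a pandas Series (the file and column names are hypothetical):

```python
# ACF/PACF plots for a daily counts series (I used SPSS for the post; this is
# just an equivalent sketch in Python).
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

counts = pd.read_csv("baltimore_daily.csv")["major_crimes"]  # hypothetical file
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(counts, lags=30, ax=ax1)
plot_pacf(counts, lags=30, ax=ax2)
plt.show()
```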

You can see that it is positive and stays quite consistent. This is indicative of a moving average model. It does not show the geometric decay of an auto-regressive process, nor is the autocorrelation anywhere near 1, which you would expect for an integrated process. Also the partial autocorrelation plot shows geometric decay, which is again consistent with a moving average model. (See my note at the bottom on how this interpretation was wrong – David Greenberg sent me a note.)

Although it is typical to analyze crime counts with a Poisson model, I often like to use linear models, as the coefficients are much easier to interpret. Here the counts are high enough that I am ok using a linear interrupted ARIMA model.

So I estimated an interrupted time series model. I include a dummy variable that equals 1 on 4/27/15 and after, and equals 0 before; that variable is labeled PostRiot. I then have dummy variables for each month of the year (M1, M2, …, M11) and day of the week (D1, D2, …, D6). The ARIMA model I estimate is then (0,0,7), with a constant. Here is that estimate.
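
For reference, here is a rough Python equivalent of that model using statsmodels. This is a sketch of the same general setup, not a line-for-line translation of my SPSS code, and the file and column names are hypothetical:

```python
# Interrupted ARIMA(0,0,7) with a constant, the PostRiot step dummy, and
# month/day-of-week dummies as exogenous regressors.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("baltimore_daily.csv", parse_dates=["date"])  # hypothetical file
df["PostRiot"] = (df["date"] >= "2015-04-27").astype(int)

# Month and day-of-week dummies (dropping one category of each)
month = pd.get_dummies(df["date"].dt.month, prefix="M", drop_first=True)
dow = pd.get_dummies(df["date"].dt.dayofweek, prefix="D", drop_first=True)
exog = pd.concat([df["PostRiot"], month, dow], axis=1).astype(float)

mod = sm.tsa.SARIMAX(df["major_crimes"], exog=exog, order=(0, 0, 7), trend="c")
res = mod.fit()
print(res.summary())  # the PostRiot coefficient is the estimated daily increase
```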

So we get an estimate that post riot, major crimes have increased by around 7.5 per day. This is pretty similar to what you get when you just look at the daily mean pre-post riot, so it isn’t really any weird artifact of my modeling strategy. Pre-riot it is under 25 per day, and post it is over 32 per day.

This result is pretty robust across different model specifications. Dropping the constant term results in a larger post riot estimate (over 10). Inclusion of fewer or more MA terms (as well as seasonal MA terms for 7 days) does not change the estimate. Inclusion of the monthly or day of week dummy variables does not make a difference in the estimate. Changing the outlier value on 4/27/15 to a lower value (here I used the pre-mean, 24) does reduce the estimate slightly, but only to 7.2.

There is a bit of residual autocorrelation I was never able to get rid of, but it is fairly small, with the highest autocorrelation of only about 0.06.

Here is the SPSS code to reproduce the Baltimore graphs and ARIMA analysis.

As a note, while Pete believes this is a result of depolicing (i.e. Baltimore officers being less proactive) the evidence for that hypothesis is not necessarily confirmed by this analysis. See Stephen Morgan’s analysis on crime and arrests, although I think proactive street stops should likely also be included in such an analysis.


This Baltimore data just shows a bump up in the series, but investigating homicides in Chicago (here at the monthly level), it looks to me like an upward trend post the McDonald shooting.

I have some other work on Chicago homicide geographic patterns going back quite a long time I can hopefully share soon!

I will need to update the Baltimore analysis to look at just homicides as well. Pete shows a similar bump in his charts when just examining homicides.

For additional resources for folks interested in examining crime over time, I would suggest checking out my article, Monitoring volatile homicide trends across U.S. cities, as well as Tables and Graphs for Monitoring Crime Patterns. I’m doing a workshop at the upcoming International Association of Crime Analysts conference on how to recreate such graphs in Excel.


David Greenberg sent me an email to note that my interpretation of the ACF plots was wrong – a moving average process should only have a spike, and not show the slow decay. He is right, and so I updated the interrupted ARIMA models to include higher order AR terms instead of MA terms. The final model I settled on was (5,0,0) — I kept adding higher order AR terms until the additional AR coefficients were no longer statistically significant. For these models I still included a constant.

For the model that includes the outlier riot count, it results in an estimate that the riot increased these crimes by 7.5 per day, with a standard error of 0.5.

This model has no residual auto-correlation until you get up to very high lags. Here is a table of the Box-Ljung stats for up to 60 lags.
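
Continuing the Python sketch from earlier, the refit and the Box-Ljung diagnostics look like this (again assuming the df and exog objects defined above):

```python
# Refit with AR terms instead of MA terms and check the Box-Ljung statistics
# on the residuals (df and exog as constructed in the earlier sketch).
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

mod_ar = sm.tsa.SARIMAX(df["major_crimes"], exog=exog, order=(5, 0, 0), trend="c")
res_ar = mod_ar.fit()
print(acorr_ljungbox(res_ar.resid, lags=60))  # one row per lag, up to 60
```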

Estimating the same ARIMA model with the outlier value changed to 24, the post riot estimate is still over 7.

In the end, the post-riot increase estimate is pretty robust across these different ARIMA model settings. The lowest estimate I was able to get was a post-riot mean increase of 5, when not including an intercept and not including the outlier crime counts on the riot date. So I think this result holds up pretty well to a bit of scrutiny.

New working paper: Choosing Representatives to Deliver the Message in a Group Violence Intervention

I have a new preprint up on SSRN, Choosing Representatives to Deliver the Message in a Group Violence Intervention. This is what I will be presenting at ACJS next Friday the 24th. Here is the abstract:

Objectives: The group based violence intervention model is predicated on the assumption that individuals who are delivered the deterrence message spread the message to the remaining group members. We focus on the problem of who should be given the initial message to maximize the reach of the message within the group.

Methods: We use social network analysis to create an algorithm to prioritize individuals to deliver the message. Using a sample of twelve gangs in four different cities, we identify the number of members in the dominant set. The edges in the gang networks are defined by being arrested or stopped together in the prior three years. In eight of the gangs we calculate the reach of observed call-ins, and compare these with the sets defined by our algorithm. In four of the gangs we calculate the reach for a strategy that only calls-in members under supervision.

Results: The message only needs to be delivered to around 1/3 of the members to reach 100% of the group. Using simulations we show our algorithm identifies the minimal dominant set in the majority of networks. The observed call-ins were often inefficient, and those under supervision could be prioritized more effectively.

Conclusions: Group based strategies should monitor their potential reach based on who has been given the message. While only calling-in those under supervision can reach a large proportion of the gang, delivering the message to those not under supervision will likely be needed to reach 100% of the group.

And here is an image of the observed reach for one of the gang networks using both call-ins and custom notifications.

The paper has the gang networks available at this link, and uses Python to do the network analysis and SPSS to draw the graphs.
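
To give a flavor of the network part, here is a minimal greedy sketch of finding a small dominating set with networkx. This illustrates the general idea of the dominant set, not the exact algorithm from the paper:

```python
# Greedy sketch of a small dominating set: repeatedly pick the member who
# reaches the most not-yet-covered members (themselves plus their neighbors).
import networkx as nx

def greedy_dominating_set(G):
    uncovered = set(G.nodes())
    chosen = []
    while uncovered:
        # Reach of a node = itself plus its neighbors, restricted to uncovered
        best = max(G.nodes(), key=lambda n: len(({n} | set(G[n])) & uncovered))
        chosen.append(best)
        uncovered -= {best} | set(G[best])
    return chosen

# Toy example: a small 'gang' network where edges are co-arrests/co-stops
G = nx.Graph([(1, 2), (1, 3), (2, 4), (5, 6), (5, 7), (3, 5)])
print(greedy_dominating_set(G))
```

The greedy heuristic is not guaranteed to find the minimal dominating set, but it tends to come close on small networks like these.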

If you are interested in applying this to your work let me know! Not only do I think this is a good idea for focused deterrence initiatives for criminal justice agencies, but I think the idea can be more widely applied to other fields in social sciences, such as public health (needle clean/dirty exchange programs) or organizational studies (finding good leaders in an organization to spread a message).

Paper on Roadblocks in Buffalo published

My paper with Scott Phillips, A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY, has just been published online first in the Security Journal. Springer gifted me a special link with which you can read the paper. Previously when I have been given links like that from a publisher they have had a time limit, but the email for this one said nothing. But even if that link goes bad you can always read my pre-print of the article posted on SSRN.


Title: A quasi-experimental evaluation using roadblocks and automatic license plate readers to reduce crime in Buffalo, NY

Abstract:

This article evaluates the effectiveness of a hot spots policing strategy: using automated license plate readers at roadblocks in Buffalo, NY. Different roadblock locations were chosen by the Buffalo Police Department every day over a two-month period. We use propensity score matching to identify a set of control locations based on prior counts of crime and demographic factors. We find modest reductions in Part 1 violent crimes (10 over all roadblock locations and over the two months) using t tests of mean differences. We find a 20% reduction in traffic accidents using fixed effects negative binomial regression models. Both results are sensitive to the model used though, and the fixed effects models predict increases in crimes due to the intervention. We suggest that the limited intervention at one time may be less effective than focusing on a single location multiple times over an extended period.

And here is Figure 2 from the paper, showing the units of analysis (street midpoints and intersections) and how the treatment locations were assigned.
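
For those curious about the matching step described in the abstract, here is a minimal sketch of propensity score matching in Python. It is not the exact procedure from the paper, and the file and column names are hypothetical:

```python
# Score each street unit with a logistic regression on prior crime and
# demographics, then take the nearest-scoring untreated unit as a control.
import pandas as pd
from sklearn.linear_model import LogisticRegression

units = pd.read_csv("street_units.csv")  # street midpoints and intersections
X = units[["prior_crime", "pct_poverty", "pop"]]
units["pscore"] = LogisticRegression(max_iter=1000).fit(
    X, units["roadblock"]).predict_proba(X)[:, 1]

treated = units[units["roadblock"] == 1]
controls = units[units["roadblock"] == 0]
# Nearest-neighbor match on the propensity score (with replacement)
matches = [(controls["pscore"] - p).abs().idxmin() for p in treated["pscore"]]
control_sample = controls.loc[matches]
```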

Much ado about nothing: Overinterpreting volatility in homicide rates

I’m not much of a macro criminologist, but being asked questions by my dad (about Richard Rosenfeld and the Ferguson effect) and the dentist yesterday (asking about some of Trump’s comments about rising crime trends) has prompted me to jump into it and give my opinion. Long story short — many sources I believe are overinterpreting short term fluctuations as more meaningful than they are.

First I will tackle national crime rates. So if you have happened to walk by a TV playing CNN the past few days, you may have heard Donald Trump being criticized for his statements on crime rates. This is partially a conflation of overall levels of crime with changes in crime over time. Basically crime is currently low compared to historical patterns, but homicide rates have been rising in the past two years. This is easier to show in a chart than to explain in words. So here is the national estimated homicide rate per 100,000 individuals since 1960.[1]

2016 is not official and is still an estimate, but basically the pattern is this – crime has been falling generally across the country since the early 1990s. Crime rates in just the past few years have finally dropped below levels in the 1960s, but for the past two years homicides have been increasing. So some have pointed to the increase in the past two years and claimed the sky is falling. To back this up, they say the rate of change is the largest in the past 40 years. There are better charts to show rates of change (a semi-log chart), but the overall look is basically the same.

You have to really squint to see that the change from 2014 to 2015 is a larger jump than any of the changes over the entire period, so arguments based on the size of recent changes in the homicide rate are hyperbole (either on a linear scale or a logarithmic scale). And even if you take the recent increases over the past two years as evidence of a more general rising trend, in the longer term we still have homicide rates close to a low point of the past 50 years.
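
The semi-log chart mentioned above is a one-line change in most plotting software; on a log scale, equal vertical distances correspond to equal percent changes. A matplotlib sketch, assuming a hypothetical year/rate file:

```python
# National homicide rate on a log scale, so rates of change are comparable
# across the whole period (file name is hypothetical).
import pandas as pd
import matplotlib.pyplot as plt

ucr = pd.read_csv("national_homicide_rates.csv")  # columns: year, rate
fig, ax = plt.subplots()
ax.plot(ucr["year"], ucr["rate"], color="grey")
ax.set_yscale("log")  # log scale shows percent changes, not levels
ax.set_ylabel("Homicide rate per 100,000 (log scale)")
plt.show()
```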

For a bit of general advice – for any source that gives you a percent change, you always want to see the base numbers and the longer term historical trend. Any media source that cites recent increases in homicides without providing this graph of long term historical crime trends is simply misleading. I’ve seen this done in many places; see this example from the New York Times or this recent note from the Economist. So this isn’t something specific to the President.

Now, macro criminologists don’t really have any better track record explaining these patterns than macro economists have in explaining economic trends. Basically we have a bunch of patchwork theories that make sense for parts of the trend, but not the entire time frame: changes in routine activities in the 1960s, increases in incarceration, the decline of crack use, the ease of calling 911 with cell phones, lead exposure, and abortion (just to name a few). And academics come up with new theories all the time, the most recent being the Ferguson effect — which is simply another term for de-policing.

Now a bit on trends for specific cities. How this ties in with the national trend is that some articles have been pointing out that some cities have seen increases and some have not. That is fine to point out (albeit trivial), but then the articles frequently go on to generate stories about why crime is rising in those specific places. Those on the left cite civil unrest and police brutality as possible reasons (Milwaukee, St. Louis, Chicago, Baltimore), while those on the right cite the deleterious effects of police departments not being as proactive (stops in Chicago, arrests in Baltimore).

While any of these explanations may turn out to be reasonable in the end, I’m pretty sure most of these articles severely underappreciate the volatility in homicide rates. Take an example with St. Louis, with a city population of just over 300,000. A homicide rate of 50 individuals per 100,000 means a total of 150 murders. A homicide rate of 40 per 100,000 means 120 murders. So we are only talking about a change of 30 murders overall. Fluctuations of around 10 in the murder rate would not be unexpected for a city with a population of 300,000 individuals. The confidence interval for a rate of 150 murders per 300,000 individuals is 126 to 176 murders.[2]
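
Footnote 2 gives the R one-liner for that interval; the same exact Clopper-Pearson interval in Python can be had via statsmodels:

```python
# Exact Clopper-Pearson interval for 150 murders out of 300,000 people
# (method="beta" is the exact interval in statsmodels).
from statsmodels.stats.proportion import proportion_confint

lo, hi = proportion_confint(150, 300000, alpha=0.05, method="beta")
print(lo * 300000, hi * 300000)  # roughly 126 to 176 murders
```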

Even that though understates the typical volatility in homicide rates, as it basically assumes the underlying proportion does not change over time. In reality crime statistics are more bursty, and show wilder fluctuations in different places.[3] To show this for many cities, I use the data from the Economist article mentioned earlier, and create a motion chart of the changes in homicide rates over time. The idea behind this chart is a funnel chart. Cities with lower populations will show higher variance, and subsequently those dots on the left hand side of the chart will jump around a lot more. The population figures are current and not time varying, so the dots just move up and down on the Y axis.

For best viewing, put the X axis on a log scale, and size the points according to the population of the city. If you are at a desktop computer, you can open up a bigger version of the chart here.

Selecting individual points and then letting the animation run though illustrates the typical variability of crime over time. Here is the trace of St. Louis over the 36 year period.

New Orleans is another good example: we have fluctuations from under 30 to over 90 in the time period.

And here is Chicago, which shows less fluctuation than the smaller cities (as expected) but still has a range of homicide rates around 20 over the time period.

Howard Wainer has previously pointed this relationship out, calling it The Most Dangerous Equation. Basically, if you look you will be able to find some upward crime trends, especially in smaller cities. You need to look at the long term though, and understand typical fluctuations, to make a reasonable decision as to whether crime is increasing or whether it is just typical year to year variation. The majority of news articles on the topic are just chock full of post hoc ergo propter hoc for particular cherry-picked cities, and they often don’t make sense in explaining crime patterns over the past decade in those particular cities, let alone for different cities experiencing similar conditions but not having rising homicide rates.
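
You can draw that funnel logic directly: fix a common homicide rate, and plot how wide the exact binomial interval is at each population size. A sketch (the common rate here is just an arbitrary illustration):

```python
# Funnel bounds: for a fixed common rate, the expected spread of observed
# city rates shrinks as population grows.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.proportion import proportion_confint

rate = 10 / 100000  # hypothetical common homicide rate per capita
pops = np.logspace(4.5, 7, 50)  # populations from ~30k to 10 million
expected = np.round(rate * pops)
lo, hi = proportion_confint(expected, pops, method="beta")

plt.plot(pops, lo * 100000, "k--", label="95% funnel bounds")
plt.plot(pops, hi * 100000, "k--")
plt.xscale("log")
plt.xlabel("City population (log scale)")
plt.ylabel("Homicide rate per 100,000")
plt.legend()
plt.show()
```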



  1. For my notes about data sources, generally the data have come from the FBI UCR data tool (for the 1960 through 2014 data). 2015 data have come from the FBI web page for the 2015 UCR report. The 2016 projections come from this Economist article as well as the 50 cities data for the google motion chart.
  2. Calculated in R via (binom.test(150,300000)$conf.int[1:2])*300000. This is the exact Clopper-Pearson confidence interval.
  3. So even though this 538 article does a better job of acknowledging volatility, whatever test they use to determine statistically significant increases is likely to have too many false positives.

Keeping it simple: Viz. mass shooting definitions

My wife asked me the other day about some mass shooting statistics, in particular some claims of an average of one a day in the US. Without knowing the source, I told her outright it is probably because that person widened the net to events beyond what most people stereotypically consider a mass shooting.

Now, I have no personal opinion on how it should be defined, and being a researcher in criminal justice I appreciate people digging into the details. I was prompted to write this post by an interactive application showing how the numbers change by Kevin Schaul of the Washington Post (referred via Flowing Data). I was pretty frustrated by Kevin’s interactive application though – there are much simpler ways to show how the definition changes the numbers than making me change the definition and look at which individual events pop up. Here is an example screen shot of inputting a definition and then how Kevin’s data pop out.

So, downloading the same Reddit data for 2015 so far (as of 12/7/15) I created what I consider to be simple summaries. Caveat – these crowdsourced datasets are likely to have substantial missing data, especially towards the events with fewer injured. First I made a frequency histogram of the total number of dead per incident.

So you can see that if you only want to count the dead in your personal definition, the one per day statistic is a dramatic over-representation. If you draw the line at 5 or more dead you will have around 9 more events than if you draw the line at 6 or more. If you make the line 10 or more there are only two incidents, but there are another 4 if you include incidents with 8 or 9 dead.

Another simple overview is a table. Here are tables of dead, injured, and the combined counts per incident, sorted in descending value of the count. So the way to read this is that there were 147 separate incidents in the reddit database that had 0 deaths, 104 that had only one death, etc. The tables also have percents and cumulative percents, so you can see how where you define the cut-point changes how much of the data you chop off. Cumulative counts would be just as useful.
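
Those tables are a few lines of pandas, for anyone who wants to replicate them. A sketch, with hypothetical column names for the Reddit tracker data:

```python
# Frequency table of deaths per incident with percents and cumulative percents,
# sorted from the deadliest incidents down.
import pandas as pd

shootings = pd.read_csv("reddit_mass_shootings_2015.csv")  # hypothetical file
freq = shootings["dead"].value_counts().sort_index(ascending=False)
table = pd.DataFrame({
    "count": freq,
    "percent": 100 * freq / freq.sum(),
})
table["cum_percent"] = table["percent"].cumsum()
print(table)
```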

I have no personal problem using injured as well in a mass shooting definition. Basically the difference between being shot and being killed is seemingly due to random happenstance, so a shooting with 10 injured and no one killed can easily be argued to be a mass shooting in my opinion. Kevin’s interactive makes you choose an and condition though between injured and killed, whereas one could place the cut point at an or condition or simply the combined total. Here is a cross tabulation of the frequencies of injured by dead.

You can clearly see the reddit definition, a combined total of injured plus dead of at least 4, via the line on the upper left of the table. Kevin’s and condition forces you to make a cut-point along each axis, basically choosing a rectangle in the lower right of the above crosstab table. If you want a combined total though, it will be along a diagonal somewhere in the table.
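
And continuing that sketch, the crosstab and a combined-total definition are one line each:

```python
# Crosstab of injured by dead, plus an 'or'/combined-total definition, which
# corresponds to a diagonal cut through that table.
crosstab = pd.crosstab(shootings["injured"], shootings["dead"])
print(crosstab)

combined = shootings["injured"] + shootings["dead"]
print((combined >= 4).sum(), "incidents with 4+ total shot")
```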

I appreciate that these interactive visualizations allow a viewer to dig deeper into specific events in the data, but that does not mean some simple summaries could not also accompany the piece.

License plate readers and the trade off in privacy

As a researcher in criminal justice, tackling ethical questions is a difficult task. There are no hypotheses to test, nor models to fit, just opinions bantering around. I figured I would take my best shot at writing some coherent thoughts on the topic of the data police collect and its impacts on personal privacy – and my blog is really the best outlet.

What prompted this is a recent Nick Selby post which suggested the use of license plate readers (LPRs) to target Johns in LA is one of the worst ideas ever and a good example of personal privacy invasion by law enforcement. (Also see this Washington Post opinion article.)

I have a bit of a different and more neutral take on the program, and will try to articulate some broader themes in personal privacy invasion and the collection/use of data by police. I think it is an important topic, and it will continue to be with the continual expansion of public sensor data being collected by the police (body worn cameras, stationary cameras, cell phone data, and GPS traces being some examples). Basically, much of the negative sentiment I’ve seen so far about this hypothetical intervention is for reasons that don’t have to do with privacy. I’ll articulate these points by presenting alternative, currently in use police programs that use similar means, but have different ends.

To describe the LA program in a nutshell, the police use what are called license plate readers to identify particular vehicles being driven in known prostitution areas. LPRs are just cameras that take a snapshot of a license plate, automatically code the alpha-numeric plate, and then place that [date-time-location-plate-car image] record in a database. By linking up this data with vehicle registrations, the idea in LA is to send the owner of the vehicle a letter in the mail. The letter itself won’t have any legal consequences, just a note that says the police know you have been spotted. The idea in theory is that you will think you are more likely to be caught in the future, and may face some public shaming if your family happens to see the letter, so you will be less likely to solicit a prostitute in the future.

To start with, some of the critiques of the program focus on the possibility of false positives. Probably no reasonable person would think this is a worthwhile idea if the false positive rate is anything but small – people will be angry at being falsely accused, there are negative externalities in terms of family relationships, and any potential crime reduction upside would be so small that it is not worthwhile. But I don’t think that itself is damning to the idea – I think you could build a reasonable algorithm to limit false positives. Say the car is spotted multiple times at a very specific location, at specific times, and the owner’s home address is not near the location. It would be harder to limit false positives in areas where people conduct other legitimate business, but I think it has potential with just LPR data, and would likely improve by adding in other information from police records.
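
To make that concrete, here is a toy version of such a filter. Everything in it (the thresholds, the field names, the one-mile cutoff) is hypothetical, just to illustrate that rules like these are simple to encode:

```python
# Toy filter: only flag a plate if it is seen repeatedly, at night, in the
# target area, and the vehicle is registered far away. All thresholds and
# field names are hypothetical.
import pandas as pd

reads = pd.read_csv("lpr_reads.csv", parse_dates=["timestamp"])  # hypothetical
reads["night"] = reads["timestamp"].dt.hour.isin([22, 23]) | (reads["timestamp"].dt.hour < 4)

hits = reads[reads["in_target_area"] & reads["night"]]
counts = hits.groupby("plate").size()
repeat_plates = counts[counts >= 3].index  # seen on 3+ separate occasions

# Drop plates registered within, say, 1 mile of the target area
flagged = hits[hits["plate"].isin(repeat_plates) & (hits["reg_dist_miles"] > 1.0)]
print(flagged["plate"].unique())
```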

If you have other video footage, like from a stationary camera, I think limiting false positives can definitely be done by incorporating things like loitering behavior and seeing the driver interact with an individual on the street. Eric Piza has done similar work on human coding/monitoring video footage in Newark to identify drug transactions, and I have had conversations with an IBM Smart City rep. and computer scientists about automatically coding audio and video to identify particular behaviors that are just as complicated. False negatives may still be high, but I would be pretty confident you could create a pretty low false positive rate for identifying Johns.

As researchers, we often limit our inquiries to just evaluating 1) whether the program works (e.g. reduces crime) and 2) if it works, whether it is cost-effective. LPRs and custom notifications are an interesting case compared to say video cameras because they are so cheap. Cameras and the necessary data storage infrastructure are so expensive that, to be frank, they are unlikely to be a cost-effective return on investment in any short term time frame even given best case scenario crime reductions (ditto for police body worn cameras). LPRs and mailed letters on the other hand are cheap (both in terms of physical capital and human labor), so even small benefits could be cost-effective.

So in short, I don’t think the idea should be dismissed outright because of false positives, and the idea of using public video/sensor footage to proactively identify criminal behavior could be expanded to other areas. I’m not saying this particular intervention would work, but I think it has better potential than some programs police departments are currently spending way more money on.

Assuming you could limit the false positives, the next question is whether it is ok for the police to intrude on the privacy of individuals who have not committed any particular crime. The answer to this I don’t know, but there are other examples of police sending letters that are similar in nature but haven’t generated much critique. One is the use of letters to trick offenders with active warrants into turning themselves in. A more similar example though is custom notifications. These are very similar in that the individuals often aren’t identified because of specific criminal charges, but are identified using data analytics and human intelligence that place them as high risk and gang involved offenders. The intrusion into privacy is way higher for these custom notifications than for the suggested Dear John letters, but those individuals did much more to precipitate the police action as well.

When the police stop you in your car or on the street, they are using discretion to intrude on your privacy in circumstances where you have not necessarily committed a crime. Is there any reason a cop has to take that action in person versus seeing it on a video? Automatic citations at red light cameras are similar in mechanics to what this program is suggesting.

The note about negative externalities to legitimate businesses in the areas and the cost of letters I consider hyperbole. Letters are cheap, and actual crime data is frequently available that could already be used to redline neighborhoods. But Nick’s critique of the information being collated by outside agencies and used in other actuarial aspects, such as loans and employment decisions, I think is legitimate. I have no good answers to this problem – I have mixed feelings as I think open data is important (which ironically I can’t quantify in any meaningful way), and I think perpetual online criminal histories are a problem as well. Should we not have public crime maps though because businesses are less likely to invest in high crime neighborhoods? I think doing a criminal background check for many businesses is a legitimate query as well.

I have mixed feelings about familial shaming being an explicit goal of the letters, but compared to an arrest the letter is mundane. It is even less severe than a citation (and given some state laws, you could be given a citation for loitering in a high prostitution area). Is a program that intentionally tries to shame a person – which I agree could have incredible family repercussions – a legitimate goal of the criminal justice system? Fair question, but in terms of privacy issues I think it is a red herring – you can swap out different letters that would not have those repercussions but still use the same means.

What if instead of the "my eyes are on you" letter the police simply sent a PSA-like postcard that talked about the blight of sex workers? Can police never send out letters? How about if police send letters to people with previous victimizations about ways to prevent future victimization? I have a feeling many of the initial negative reactions to the Dear John program are because of the false positive aspect and the "victimless" nature of the crime. The ethical collection and use of data is a bit more subtle though.

LPR data was initially intended to passively identify stolen cars, but it is pretty ripe for mission creep. One example is that the police could use LPR data to actively track a car’s location without a warrant. It is easy to think of both good and bad examples of its use. For good examples, retrospectively identifying a car at the scene of a crime seems reasonable, as does notifying the police of a vehicle associated with a kidnapping.

For another example use of LPR data, what if the police did not send custom notifications, but used the LPR data to create a John list of vehicles, and then used that information to profile the cars? If we think using LPR data to identify stolen cars is a legitimate use, should we ignore the data we have for other uses? Or does the potential abuse of the data outweigh the benefits, so that LPR collection shouldn’t be allowed at all?

For equivalent practices, most police departments have chronic offender or gang lists that use criminal history, victimizations, where you have been stopped, and who you have been stopped with to create similar databases. This is all from data the police routinely collect. One can reasonably question whether LPR data should be available for such analytic uses – though police RMS data is often available in large swaths to the general public.

Although you can question whether police should be allowed to collect LPR data, I am going to assume LPR data is not going to go away, and cameras definitely are not. So how do you regulate the use of such data within police departments? In New York, when you conduct an online criminal history check you have to submit a reason for doing the check. That is, a police officer or a crime analyst can’t run a check on their next door neighbor just because they are curious – they are supposed to have a relevant reason related to some criminal investigation. You could have a similar set up with LPR data that prevents actively monitoring a car except in particular circumstances, and that purges the data after a particular time frame. It would be up to the state though to enact legislation and monitor its use. There is currently some regulation of gang databases, such as sending notifications to individuals when they are put on the list and rules for when to take people off the list.
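
As a toy illustration of what that kind of regulation might look like in code (this is hypothetical, not any real department’s system), queries could require a logged reason and reads older than a retention window could be purged automatically:

```python
# Hypothetical sketch of a regulated LPR store: every query needs a logged
# case-related reason, and old reads are purged past a retention window.
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # hypothetical retention window
audit_log = []

def query_plate(reads, plate, officer, reason):
    if not reason:
        raise ValueError("A case-related reason is required for each query.")
    audit_log.append((datetime.now(), officer, plate, reason))
    return [r for r in reads if r["plate"] == plate]

def purge_old_reads(reads):
    cutoff = datetime.now() - RETENTION
    return [r for r in reads if r["timestamp"] >= cutoff]
```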

Similar questions can be extended beyond public cameras to other domains, such as DNA collection and cell phone data. Cell phone data is regularly collected with warrants currently. DNA searching is going beyond the individual to familial searches (imagine getting a DUI, and then the police using your DNA to determine that a close family member committed a rape).

Going forward, to frame the discussion of police behavior in terms of privacy issues, I would ask two specific questions:

  • Should the police be allowed to collect this data?
  • Assuming the police have said data, what are reasonable uses of that data?

I think the first question, whether the police should be allowed to collect this data, should be intertwined with how well the program works and how cost-effective it is (or how cost-effective it could potentially be, if the program has not been implemented yet). There are no bright lines, but there will always be a trade off between personal privacy and public intrusion. Higher personal intrusion demands a higher level of potential benefits in terms of safety. Given that LPRs passively collect data, I consider it an open question whether they meet the threshold of it being reasonable for the police to collect such data.

Some data police now collect, such as public video and DNA, I don’t see going away whether or not they meet a reasonable trade-off. In those cases I think it is better to ask what are reasonable uses of that data and how to prevent abuses of it. Basically any police technology can be given extreme examples where it saved a life or where a rogue agent used it in a nefarious way. Neither extreme case should be the only information individuals use to evaluate whether such data collection and use is ethical though.