I’ve recently scraped some geographic data that I may use in my graduate level GIS course. I figured I would share with everyone, and take some time to describe for others how I scraped the data.
So to start, if you read an online article and it has a webmap with some GIS data in it – the data exists somewhere. It won’t always be the case that you can actually download the data, but for the most current and popular interactive mapping tools, the data is often available if you know where to look.
For example, I asked on the GIS stackexchange site awhile ago how can you download the point data in this NYC homicide map from the Times. I had emailed the reporters multiple times and they did not respond. A simple solution the answerers suggested was to use website developer tools to see what was being loaded when I refreshed the page. It happened that the map is being populated by a two simple text files (1,2).
It may be an interesting project to see how this compares (compiling via news stories) versus official data, which NYC recently released going back to 2006. Especially since such crowdsourced news datasets are used for other things, like counting mass shootings.
The two example mapping datasets I provide below though are a bit different process to get the underlying data – but just as easy. Many current webmaps use geojson files as the backend. What I did for the two examples below is I just looked at the html source for the website, and look for json data formats – links that specify “js” or “json” extensions. If you click through those external json links you can see if they have the data.
The other popular map type though comes from ESRI. You can typically find an ESRI server populating the map, and if the website has say a parcel data lookup you can often find an ESRI geocoding server (see here for one example of using an ESRI geocoding api). The maps though unfortunately do not always have exposed data. Sometimes what looks like vector data are actually just static PNG tiles. Council Districts in this Dallas map are an example. If you dig deep enough, you can find the PNG tiles for the council districts, but that does not do anyone much good. Pretty much all of those layers are available for download from other sources though. A similar thing happens with websites with crime reports, such as RAIDS Online or CrimeReports.com. They intentionally build the web map so you cannot scrape the data.
So that said, before we go further though – it should go without saying that you should not steal/plagiarize people’s articles or simply rip-off their graphics. Conducting new analysis with the publicly available data though seems fair game to me.
Banksy Taggings in NYC
There was a recent stink in the press about Kim Rossmo and company using geographic offender profiling to identify the likely home location of the popular graffiti artist Banksy. Here is the current citation of the journal article for those interested:
Hauge, M. V., Stevenson, M. D., Rossmo, D. K., and Le Comber, S. C. (2016). Tagging banksy: using geographic profiling to investigate a modern art mystery. Journal of Spatial Science, pages 1-6. doi:10.1080/14498596.2016.1138246
The article uses data from Britain, so I looked up to see if his taggings in other places was available. I came across this article showing a map of locations in New York City. So I searched where the data was coming from, and found the json file that contains the point data here. I just built a quick excel spreadsheet to parse the data, and you can download that spreadsheet here.
This article posts a set of gang territories in NYC. This is pretty unique – I am unfamiliar with any other public data source that identifies gang territories. So I figured it would be a potential fun project for students in my GIS course – for instance in overlaying with the 311 graffiti data.
Again the data at the backend is in json format, and can be found here. To convert this data to a shapefile is a bit challenging, as it has points, lines and polygons all in the same file. What I did was buffer the lines and points by a small amount to be able to stuff them all in one shapefile. A zip file of that shapefile can be downloaded here.
Drop me a note if you use this data, I’d be interested in your analyses! Hence why I am sharing the data for others to play with 🙂