Census Dotmap Methods

Here's a brief, technical, and somewhat incomplete guide to making your own census dotmap. If you want to get started quickly, Peter Richardson cleaned up this mess and put it on GitHub.

Get the data

United States

The finest level of geographic granularity available for the 2010 census is the block. Blocks are aggregated into block groups, which are in turn aggregated into census tracts, and so on upward. The Census publishes block boundaries pre-joined with census counts in ESRI Shapefile format on a state-by-state basis. For some reason their HTTP download channel tends to corrupt the files, but the somewhat slower FTP site is reliable. The file names correspond to each region's FIPS code.


The shapefile is here. The counts are here. Some assembly is required.


Locality points with associated counts are located here. Note that these are points, not shapes. To get shapes, I generated Voronoi polygons in QGIS and then clipped them to the Mexican national border.

Generate the dots

I wrote a Python script to read each block from each shapefile and generate N dots uniformly throughout the shape, where N is the count of people in the block. Here's the code. Note that the dots are dumped into a sqlite3 database along with a "quadkey", which is the address of the dot's map tile.
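For readers who just want the gist without opening the script, here is a minimal sketch of the two ideas in that step. It is not the linked script. It assumes each block is a single ring of (x, y) vertices with no holes, places dots by rejection sampling from the bounding box, and assumes the quadkey follows the Bing-style scheme for Web Mercator tiles (the text above doesn't say which scheme was used). The function names `point_in_polygon`, `random_dots`, and `quadkey` are made up for illustration.

```python
import math
import random


def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test against a ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(ring, count, rng=random):
    """Return `count` points drawn uniformly inside the ring.

    Candidates are drawn from the bounding box and kept only if they fall
    inside the shape. Assumes the ring has non-zero area.
    """
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    dots = []
    while len(dots) < count:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            dots.append((x, y))
    return dots


def quadkey(lon, lat, zoom):
    """Quadkey of the Web Mercator tile containing (lon, lat) at `zoom`.

    Assumed Bing-style scheme: one base-4 digit per zoom level.
    """
    lat = max(min(lat, 85.05112878), -85.05112878)
    n = 1 << zoom
    tx = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    ty = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    tx = min(max(tx, 0), n - 1)
    ty = min(max(ty, 0), n - 1)
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digits.append(str((1 if tx & mask else 0) + (2 if ty & mask else 0)))
    return "".join(digits)
```

With this scheme, the quadkey of a dot at a deep zoom level starts with the quadkey of every shallower tile that contains it, which is what makes the sorting step further down useful.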

After that, I dumped the whole thing to a CSV from the command line with:

$ sqlite3 -csv people.db "select x,y,quadkey from people" > people.csv

The resulting CSV is about 17 GB.

Sort the dots

Sorting the dots by quadkey is a kind of cheapo spatial index: dots that land in the same tile share a quadkey, so sorting puts them next to each other in the file. I tried getting sqlite3 to do it for me, but it choked, so I wrote a script to sort the CSV in place with mergesort. Hacknology alert. This took my laptop about 12 hours.
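If you'd rather not sort in place, one alternative is a chunked external merge sort. To be clear, this is a different technique from the in-place mergesort script mentioned above, and it's only a sketch: it assumes the CSV has the quadkey in the third column (as in the `select x,y,quadkey` dump), that all quadkeys have the same length so string order matches tile order, and that there's enough scratch disk for a second copy of the data. The names `external_sort_csv` and `chunk_rows` are made up for illustration.

```python
import csv
import heapq
import itertools
import os
import tempfile


def _write_sorted_chunk(rows, key_index):
    """Sort one in-memory chunk and spill it to a temporary CSV file."""
    rows.sort(key=lambda row: row[key_index])
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def _read_rows(path):
    """Yield rows lazily from one sorted chunk file."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            yield row


def external_sort_csv(src_path, dst_path, key_index=2, chunk_rows=1_000_000):
    """Sort a CSV that doesn't fit in memory by the column at `key_index`."""
    chunk_paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        while True:
            # Read a bounded number of rows, sort them, and write them out.
            rows = list(itertools.islice(reader, chunk_rows))
            if not rows:
                break
            chunk_paths.append(_write_sorted_chunk(rows, key_index))
    try:
        with open(dst_path, "w", newline="") as dst:
            # heapq.merge streams the sorted chunks into one sorted output.
            merged = heapq.merge(
                *(_read_rows(path) for path in chunk_paths),
                key=lambda row: row[key_index],
            )
            csv.writer(dst).writerows(merged)
    finally:
        for path in chunk_paths:
            os.remove(path)
```

Every chunk file is open at once during the merge, so on a file this size `chunk_rows` needs to be large enough that the number of chunks stays under your system's open-file limit.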

Generate tiles

Once the dots are sorted by quadkey, all you have to do is scan down the list, starting a new tile whenever the part of the quadkey that identifies the tile at the level you're drawing changes. A peculiar side effect of this method is that quads with no dots are simply never generated. Processing sketch here. Note that Processing's anti-aliasing behavior varies from system to system. I generated the tiles on OS X 10.8.2 with Processing 1.5.1, using the P2D renderer.
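The scan itself is short enough to sketch in a few lines. This is not the Processing sketch and it does no drawing; it only shows the grouping logic, written in Python. It assumes the dots arrive as (x, y, quadkey) tuples already sorted by quadkey, and that the first `level` characters of a quadkey name the tile at that level. The name `tiles_at_level` is made up for illustration.

```python
from itertools import groupby


def tiles_at_level(sorted_dots, level):
    """Yield (tile_quadkey, dots) for each non-empty tile at `level`.

    `sorted_dots` must already be sorted by quadkey, so every dot in a
    given tile sits in one contiguous run of the input.
    """
    for tile_key, group in groupby(sorted_dots, key=lambda dot: dot[2][:level]):
        yield tile_key, [(x, y) for x, y, _ in group]


if __name__ == "__main__":
    dots = [(0.1, 0.1, "00"), (0.2, 0.2, "01"), (0.3, 0.3, "03"), (0.9, 0.9, "30")]
    for key, members in tiles_at_level(dots, 1):
        print(key, len(members))
```

In the small example, tiles "1" and "2" never show up in the output because no dot falls in them, which is the side effect described above.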

Viewing the tiles

I'm hosting the tiles on Amazon S3 and showing them with the Google Maps API v3, using the ImageMapType. The Google Maps API is easy to use, but uploading hundreds of thousands of images to S3 is a huge pain. I wrote a little Python script that uses the AWS library "boto" so I could reuse one HTTP connection to upload multiple files.
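A rough sketch of what such an upload loop might look like is below. It is not the script described above. It assumes the original boto package (version 2, not boto3), credentials available from the environment or a boto config file, and a hypothetical local layout where each tile is a .png somewhere under one directory and its relative path doubles as the S3 key. The names `iter_tile_files` and `upload_tiles` are made up for illustration.

```python
import os


def iter_tile_files(tile_dir):
    """Yield (local_path, s3_key) for every .png file under `tile_dir`."""
    for root, _dirs, files in os.walk(tile_dir):
        for name in sorted(files):
            if name.endswith(".png"):
                local_path = os.path.join(root, name)
                rel = os.path.relpath(local_path, tile_dir)
                # S3 keys use forward slashes regardless of the local OS.
                yield local_path, rel.replace(os.sep, "/")


def upload_tiles(tile_dir, bucket_name):
    """Upload every tile through a single S3 connection object."""
    import boto  # imported here so the path helper works without boto installed

    conn = boto.connect_s3()  # assumes credentials are already configured
    bucket = conn.get_bucket(bucket_name)
    for local_path, key_name in iter_tile_files(tile_dir):
        key = bucket.new_key(key_name)
        key.set_contents_from_filename(local_path)
```

Creating the connection and bucket once, outside the loop, is the part that matters here; the per-file work is then just one key and one upload call.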