Pages

Thursday, December 2, 2010

Dot Density Maps with Python and OGR

If you use Python for GIS sooner or later you'll use GDAL for manipulating raster data and its vector cousin OGR for working with vector data. OGR has a Python API for most of the methods in the C++ library and even provides some basic geometry analysis. And most importantly it can read/write and therefore convert data in a variety of vector file and database formats.

OGR provides a fast way to create dot density maps.  A dot density map represents statistical information about an area as mathematically distributed points. Areas with higher values have a higher concentration of points. This is one of my favorite types of maps because it is a great example of GIS - visualizing geographic data in a way that is instantly comprehensible.

I'm using OGR in this example because it can read and write shapefiles. But unlike the Python Shapefile Library it can also perform basic geometry operations needed for this sample. Most GIS programs would display the population information on some type of memory layer instead of actually outputting a shapefile for the density layer as demonstrated here.  But we're going to keep things simple for this example and just create a shapefile.

Assuming you have Python installed, here are some basic gdal/ogr installation instructions.
1. Go to http://trac.osgeo.org/gdal/wiki/DownloadingGdalBinaries and download the gdal binary for your platform
2. Extract the directory to your hard drive
3. Add the "bin" directory within the gdal folder to your system shell path
4. Set the path to the "data" directory in the gdal folder to an environment variable called "GDAL_DATA"
5. Install the appropriate python module for your Python version and platform from here: http://pypi.python.org/pypi/GDAL/1.6.0#downloads

If you want to follow along with the example below you can download the source shapefile:
http://pyshp.googlecode.com/files/GIS_CensusTract.zip

The end result of this demo is pictured above with both the input census block and output dot density shapefiles. 

The following code will read in the source shapefile, calculate the number of points needed to represent the population density evenly, and then create the point shapefile:

from osgeo import ogr
import random
# Open shapefile, get OGR "layer", grab 1st feature
source = ogr.Open("GIS_CensusTract_poly.shp")
county = source.GetLayer("GIS_CensusTract_poly")
feature = county.GetNextFeature()
# Set up the output shapefile and layer
driver = ogr.GetDriverByName('ESRI Shapefile')
output = driver.CreateDataSource("PopDensity.shp")
dots = output.CreateLayer("PopDensity", geom_type=ogr.wkbPoint)
while feature is not None:
  field_index = feature.GetFieldIndex("POPULAT11")
  population = int(feature.GetField(field_index))
  # 1 dot = 100 people
  density = population / 100
  # Track dots created
  count = 0   
  while count < density:
    geometry = feature.GetGeometryRef()
    minx, maxx, miny, maxy = geometry.GetEnvelope()
    x = random.uniform(minx,maxx)
    y = random.uniform(miny,maxy)
    f = ogr.Feature(feature_def=dots.GetLayerDefn())
    wkt = "POINT(%f %f)" % (x,y)
    point = ogr.CreateGeometryFromWkt(wkt)
    # Don't use the random point unless it's inside the polygon.
    # It should be close as it's in the bounding box
    if feature.GetGeometryRef().Contains(point):
        f.SetGeometryDirectly(point)
        dots.CreateFeature(f)
        count += 1
    # Destroy C object.
    f.Destroy()
  feature = county.GetNextFeature()
source.Destroy()
output.Destroy()    

There is no error handling in this sample so if you run it multiple times you need delete the output dot density shapefile.

Note that this type of rendering only works when you have one polygon representing each data value. For example you couldn't do this operation with a world country boundary shapefile because islands like Hawaii associated with a country would force an inaccurate representation. For that type of map you need to use a choropleth map.

Also note that when you use OGR for shapefile editing you must specify a "layer" after opening a file. This extra step is necessary because OGR handles dozens of formats, some of which are layered vector formats such as DWG using the same API. Also because OGR is a wrapped C library you have to adjust to explicitly destroying objects and extreme camel casing on method calls usually not found in Python.

OGR and the raster equivalent GDAL are two very powerful libraries which dominate the open source geospatial world. They are also included in several well-known commercial packages thanks to the commercial-friendly MIT license.

3 comments:

  1. Hi Joel,

    Hopefully you will gimme some tips and/or insights. I have hundreds of shapefiles, all of them contain only polygons. I need to extract the attributes at fixed distances, say every 1000 meters. It is a bit like rasterize: add a layer of points on top of the polygons layer and get the attributes at each point. Unless I am missing something, neither your nice library or the OGR one let you read the attributes at specific locations, I mean is there a way to know the attributes of this or that shapefile at this lat and lon?

    Thanks for your help/comments

    Damian

    ReplyDelete
  2. Damian,

    You are on the right track. Check out my "Point in Polygon" post from Jan. 19th.

    You want to do a spatial intersection between your polygons and the fixed-width point locations.

    If you know the x,y location of your points this will be easy. If you need to calculate the location of each fixed-distant point the difficulty/accuracy will vary based on the projection of your data.

    Here is some pseudo-code. The comment box may screw up the spacing so beware.

    # Your collection of 1,000 meter
    # points
    pts = [[-89,33], [90, 32], [91,29]]

    # Your polygon shapefile
    r = shapefile.Reader("your_polygons")

    # Attributes captured
    attr = []

    for p in pts:
    p[0] = x
    p[y] = y
    for s in r.sf.shapeRecords():
    if point_in_poly(x, y, s.shape.points):
    attr.append(s.record)


    Let me know if that helps.

    ReplyDelete
  3. P.S. - You can do the same thing with OGR using the "contains" method as described in this dot density map post.

    If you need to calculate the distance between two points in decimal degrees try the Haversine formula

    And as I predicted the indentation on my post above is screwed up so try to work around that.

    ReplyDelete