Pages

Wednesday, May 13, 2015

Clipping a Shapefile in Pure Python

In a previous post, I showed how to use pure Python to subset a shapefile by attributes.  That method is fast if the attributes define the data you want to clip.  But sometimes you need to clip data spatially.

In this post, we'll clip a road shapefile using bounding boxes.  This is the simplest type of clip.  In a future post, I'll demonstrate how to use irregular polygons which is more complex.  However the ingredients I'll use for that recipe are already on this blog (this post + PyShp + the ray-casting point-in-polygon algorithm).

Here's our goal for this example. We have a shapefile with the major roads of the United States which you can download a sample dataset here. We want to use a simple bounding box to subset the roads for Puerto Rico. Here's a view of part of that shapefile including Florida and Puerto Rico in the lower right of the image.  It gives you a good idea of the density and number of records in this data.




Each road in the shapefile has its own shape record with the bounding box of the road in the record header.  We'll just do a simple bounding box comparison to isolate the roads within our desired bounding box and then write those results to a new shapefile.  You'll also need the latest version of PyShp which provides the newer iterShapeRecords() method to provide and efficient iterator for both the shapes and the dbf attributes in a single object:

import shapefile

# Create a reader instance for our US Roads shapefile
r = shapefile.Reader("roadtrl020") 

# Create a writer instance copying the reader's shapefile type
w = shapefile.Writer(r.shapeType)

# Copy the database fields to the writer
w.fields = list(r.fields)

# Our selection box that contains Puerto Rico
xmin = -67.5
xmax = -65.0
ymin = 17.8
ymax = 18.6

# Iterate through the shapes and attributes at the same time
for road in r.iterShapeRecords():
    # Shape geometry
    geom = road.shape
    # Database attributes 
    rec = road.record
    # Get the bounding box of the shape (a single road)
    sxmin, symin, sxmax, symax = geom.bbox
    # Compare it to our Puerto Rico bounding box.
    # go to the next road as soon as a coordinate is outside the box
    if sxmin <  xmin: continue
    elif sxmax > xmax: continue
    elif symin < ymin: continue
    elif symax > ymax: continue
    # Road is inside our selection box.
    # Add it to the new shapefile
    w._shapes.append(geom)
    w.records.append(rec)
    
# Save the new shapefile! (.shp, .shx, .dbf)
w.save("Puerto_Rico_Roads")

And here's our result!

Yes, there are dozens of tools and libraries that will do the same thing.  But this is the only solution in the entire (known) universe that is implemented in pure Python.  You don't need any compiled C code.  If you have the ability to run Python 2 or 3, you can run this without any additional system permissions.  It will run where most other libraries won't and when that's what your requirements are, nothing else will do!

No comments:

Post a Comment