Friday, June 27, 2014

Merging Shapefiles with PyShp and Dbfpy

A question on GIS StackExchange prompted me to write a quick script as an update to my previous post on merging shapefiles with pyshp. The SE question was the result of a typo, but I thought this script was still useful based on other questions I've received regarding dbf files and pyshp.

The shp and shx portions of pyshp are complete in that they can read and write all shapefiles allowed by the shapefile spec created by Esri.  The dbf format is included in the Esri shapefile spec by reference.  The dbf format is decades old and has several different versions.  Most people reference the generic XBase version of the spec.
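
As a rough illustration of what the dbf header carries, here is a minimal sketch that decodes the first 12 bytes of a dbf file using only the standard library. It assumes the common XBase layout (version byte, last-update date, record count, header length, record length, all little-endian); the function name `read_dbf_header` is made up for this example and is not part of pyshp or dbfpy.

```python
import struct

def read_dbf_header(data):
    """Decode the fixed 12-byte start of a dbf header (XBase layout assumed)."""
    # B: version byte, BBB: last update as (years since 1900, month, day),
    # I: number of records, H: header length, H: record length
    version, yy, mm, dd, num_records, header_len, record_len = struct.unpack(
        "<BBBBIHH", data[:12])
    return {
        "version": version,
        "last_update": (1900 + yy, mm, dd),
        "num_records": num_records,
        "header_length": header_len,
        "record_length": record_len,
    }
```

Reading the first 12 bytes of a file opened in "rb" mode and passing them to this function shows which dbf flavor the file claims to be. A version byte of 3 (dBASE III without memo fields) is the value most often seen in shapefile dbf files.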

Since the dbf format was already established, I grabbed the simplest code sample I could find: a pure-Python dbf reader and writer written by Raymond Hettinger.  This recipe is very useful and works about 85% of the time.  Many people still run into issues, though, with dbf files produced by other software.

I have found that the pure-python, public-domain dbfpy module is far more robust than the simple dbf engine in pyshp.  So when people run into troubles with dbf files, I usually suggest using pyshp to read/write the shp and shx files and then dbfpy to handle the dbf file.  PyShp lets you work with each of the shapefile's component files (shp, shx, and dbf) independently, which makes this approach possible.

The following code sample demonstrates merging all shapefiles in a directory into one shapefile called "merged".  Of course the shapefiles must all have the same geometry type, spatial reference system, and dbf schema.  The shp files are read by pyshp, and then the merged shp and shx files are created by pyshp.  Then the same is done for the dbf files separately by dbfpy.


import glob
import shapefile
from dbfpy import dbf
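# Skip the output of a previous run so it isn't merged into itself
shp_files = [f for f in glob.glob("*.shp") if f != "merged.shp"]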
w = shapefile.Writer()
# Loop through ONLY the shp files and copy their shapes
# to a writer object. We avoid opening the dbf files
# to prevent any field-parsing errors.
for f in shp_files:
    print "Shp: %s" % f
    shpf = open(f, "rb")
    r = shapefile.Reader(shp=shpf)
    w._shapes.extend(r.shapes())
    print "Num. shapes: %s" % len(w._shapes)
    shpf.close()
# Save only the shp and shx index file to the new
# merged shapefile.
w.saveShp("merged.shp")
w.saveShx("merged.shx")
# Now we come back with dbfpy and merge the dbf files
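# Build the dbf list from shp_files so the records are merged in
# the same order as the shapes (glob makes no ordering guarantee)
dbf_files = [f[:-4] + ".dbf" for f in shp_files]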
# Use the first dbf file as a template
template = dbf_files.pop(0)
merged_dbf_name = "merged.dbf"
# Copy the entire template dbf file to the merged file
merged_dbf = open(merged_dbf_name, "wb")
temp = open(template, "rb")
merged_dbf.write(temp.read())
merged_dbf.close()
temp.close()
# Now read each record from the remaining dbf files
# and use the contents to create a new record in
# the merged dbf file. 
db = dbf.Dbf(merged_dbf_name)
for f in dbf_files:
    print "Dbf: %s" % f
    dba = dbf.Dbf(f)
    for rec in dba:
        db_rec = db.newRecord()
        for k,v in rec.asDict().items():
            db_rec[k] = v
        db_rec.store()
    # Close each source dbf once its records are copied
    dba.close()
db.close()
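
The script above trusts that every input shares the same geometry type and dbf schema. A pre-flight check could be sketched as follows, using only the standard library so that neither pyshp nor dbfpy has to parse anything. The helper names (`shp_shape_type`, `dbf_schema`, `check_mergeable`) are hypothetical, the header offsets assume the standard shp and XBase layouts, and the spatial reference (.prj) is not compared.

```python
import struct

def shp_shape_type(shp_bytes):
    """Return the shape type code from the 100-byte .shp file header."""
    # Bytes 0-3 hold the file code 9994 (big-endian);
    # bytes 32-35 hold the shape type (little-endian).
    file_code, = struct.unpack(">i", shp_bytes[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    shape_type, = struct.unpack("<i", shp_bytes[32:36])
    return shape_type

def dbf_schema(dbf_bytes):
    """Return [(name, type, length, decimals), ...] from a dbf header."""
    header_len, = struct.unpack("<H", dbf_bytes[8:10])
    fields = []
    pos = 32
    # Field descriptors are 32 bytes each; a 0x0D byte ends the list.
    while pos + 32 <= header_len and dbf_bytes[pos:pos + 1] != b"\r":
        name, ftype, length, decimals = struct.unpack(
            "<11sc4xBB", dbf_bytes[pos:pos + 18])
        fields.append((name.split(b"\0")[0], ftype, length, decimals))
        pos += 32
    return fields

def check_mergeable(shp_paths):
    """Raise ValueError if any shapefile differs from the first one."""
    reference = None
    for shp_path in shp_paths:
        with open(shp_path, "rb") as f:
            shape_type = shp_shape_type(f.read(100))
        with open(shp_path[:-4] + ".dbf", "rb") as f:
            schema = dbf_schema(f.read())
        signature = (shape_type, schema)
        if reference is None:
            reference = signature
        elif signature != reference:
            raise ValueError("%s does not match the first shapefile" % shp_path)
```

Calling `check_mergeable(shp_files)` before the merge loop would stop the script early instead of producing a merged file whose records no longer line up with its fields.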