Thursday, February 10, 2011

Merging Lots of Shapefiles (quickly)

Arne, over at GIS-Programming.com, recently posted about merging shapefiles using a batch process. I can't remember the last time I merged two or more shapefiles, but after googling around it turns out to be a very common use case. GIS forums are littered with requests for the best way to batch merge a directory full of files. My best guess is that people have to work with automatically generated, geographically dispersed data with a common projection and database schema. I imagine these files would be the result of some automated sensor output. If you know of use cases that require merging many shapefiles, I'd be curious to hear about them.

Arne pointed out that all the code samples out there iterate through each feature in a shapefile and add it to the merged file one at a time. He says this method is slow. I agree to an extent (no pun intended). However, at some point the underlying shapefile library MUST iterate through each feature in order to generate the summary information, namely the bounding box, required to write a valid shapefile header. Still, it is theoretically slightly more efficient to wait until the merge is finished so there is only one iteration cycle. At the very least, waiting till the end requires less code.
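To make the bounding-box point concrete, here is a minimal pure-Python sketch (not pyshp's actual code) of the single pass a writer has to make over every coordinate before it can fill in the header extent. The function name `bounding_box` and the sample shapes are made up for illustration.

```python
# Illustrative only: compute the overall extent in one pass over all points.
# `bounding_box` and the sample data are hypothetical, not part of pyshp.

def bounding_box(shapes):
    """Return [xmin, ymin, xmax, ymax] for an iterable of point lists."""
    xs, ys = [], []
    for points in shapes:
        for x, y in points:
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("no points to measure")
    return [min(xs), min(ys), max(xs), max(ys)]


# Two "shapes" that might have come from two different files.
merged = [
    [(0.0, 0.0), (2.0, 1.0)],
    [(-1.0, 3.0), (4.0, -2.0)],
]
print(bounding_box(merged))  # [-1.0, -2.0, 4.0, 3.0]
```

Doing this once over the merged list, rather than once per input file, is the "one iteration cycle" argument in miniature.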

The following example merges all the shapefiles in the current directory into one file and it is quite fast.

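# Merge a bunch of shapefiles with attributes quickly!
# Written against the pyshp 1.x API (Writer() with no target, w.save()).
import glob
import shapefile

files = glob.glob("*.shp")
w = shapefile.Writer()
for f in files:
    r = shapefile.Reader(f)
    # Append this file's geometries and attribute rows in bulk.
    w._shapes.extend(r.shapes())
    w.records.extend(r.records())
# All inputs are assumed to share one schema, so copy the field
# definitions from the last reader.
w.fields = list(r.fields)
w.save("merged")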
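The script above copies the field list from whichever file was read last, so it quietly assumes every input shares one schema. A small guard along the lines of the following sketch could catch mismatches before anything is written. `same_schema` is a hypothetical helper, and the sample entries imitate the name, type, size, decimals layout that pyshp uses for `fields`.

```python
# Hypothetical helper: verify that every input has the same attribute schema.
# Field entries imitate pyshp's [name, type, size, decimals] layout.

def same_schema(field_lists):
    """Return True if all field lists are identical (or there are none)."""
    # Normalize each field to a tuple so lists and tuples compare equal.
    normalized = [[tuple(f) for f in fields] for fields in field_lists]
    return all(fields == normalized[0] for fields in normalized[1:])


a = [["NAME", "C", 40, 0], ["POP", "N", 10, 0]]
b = [("NAME", "C", 40, 0), ("POP", "N", 10, 0)]
c = [["NAME", "C", 40, 0], ["AREA", "F", 12, 3]]

print(same_schema([a, b]))  # same fields, different container types
print(same_schema([a, c]))  # second field differs
```

With pyshp, the input could plausibly be built as `[shapefile.Reader(f).fields for f in files]` ahead of the merge loop, though that is an assumption about how you would wire it in rather than something tested here.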

4 comments:

gionata said...

To make the code you wrote work, I had to modify a few things, because the writer needs something to extend before extend can be used.

# Merge a bunch of shapefiles with attributes quickly!
import glob
import shapefile
files = glob.glob("*.shp")
w = shapefile.Writer()
r = shapefile.Reader()
w._shapes.append(shapefile.Reader(files[0]))
for f in files[1:]:
    print f
    r = shapefile.Reader(f)
    w._shapes.extend(r.shapes())
    w.records.extend(r.records())
w.fields = list(r.fields)
w.save("merged")

gionata said...

I hope this can help

Joel Lawhead, PMP said...

gionata,

That is strange... the writer initializes its shapes as an empty list, which is extendable:

>>> import shapefile
>>> w = shapefile.Writer()
>>> w._shapes
[]
>>> w._shapes.extend([1,2,3])
>>> w._shapes
[1, 2, 3]

What version of Python and what platform are you using? Could you post some sample code?

David said...

Very nice piece of code, and very useful for merging a lot of tiled data.

I did, however, have to modify shapefile.py slightly (date: 20110927, version: 1.1.4) to process my PointZ shapefiles. On line 699 I changed s.points[0][2] to s.z[0], and on line 705 I changed s.points[0][3] to s.m. Does that seem correct?

Thanks a lot,
David