
Thursday, February 10, 2011

Merging Lots of Shapefiles (quickly)

Arne, over at GIS-Programming.com, recently posted about merging shapefiles using a batch process. I can't remember the last time I merged two or more shapefiles, but after googling around it appears to be a very common use case. GIS forums are littered with requests for the best way to batch merge a directory full of files. My best guess is that people have to work with automatically generated, geographically dispersed data that share a common projection and database schema. I imagine these files would be the result of some automated sensor output. If you know of use cases that require merging many shapefiles, I'd be curious to hear about them.

Arne pointed out that all the code samples out there iterate through each feature in a shapefile and add them to the merged file.  He says this method is slow. I agree to an extent (no pun intended).  However, at some point the underlying shapefile library MUST iterate through each feature in order to generate the summary information, namely the bounding box, required to write a valid shapefile header.  But it is theoretically slightly more efficient to wait until the merge is finished so there is only one iteration cycle.  At the very least, waiting till the end requires less code.
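Here's a rough sketch (not pyshp's actual code) of why that single pass is unavoidable: writing a valid header means computing a global bounding box, which touches every point of every merged shape at least once.

# Conceptual sketch only -- not pyshp's internal implementation.
# Writing a valid .shp header requires the bounding box of ALL features,
# so the library has to walk every merged shape once no matter how the
# shapes were accumulated.
def bounding_box(shapes):
    xs, ys = [], []
    for s in shapes:            # one pass over every merged feature
        for x, y in s.points:   # each shape stores its vertices in .points
            xs.append(x)
            ys.append(y)
    return [min(xs), min(ys), max(xs), max(ys)]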

The following example merges all the shapefiles in the current directory into one file and it is quite fast.

# Merge a bunch of shapefiles with attributes quickly!
# Assumes every input shares the same geometry type and dbf schema.
import glob
import shapefile

files = glob.glob("*.shp")
w = shapefile.Writer()
for f in files:
    r = shapefile.Reader(f)
    # Append this file's geometry and attribute records to the writer.
    w._shapes.extend(r.shapes())
    w.records.extend(r.records())
# All files share a schema, so copy the field definitions from the last reader.
w.fields = list(r.fields)
w.save("merged")

17 comments:

  1. To make the code you wrote work, I had to modify a few things, because the writer needs something to extend before you can use extend.

    # Merge a bunch of shapefiles with attributes quickly!
    import glob
    import shapefile
    files = glob.glob("*.shp")
    w = shapefile.Writer()
    r = shapefile.Reader()
    w._shapes.append(shapefile.Reader(files[0]))
    for f in files[1:]:
        print f
        r = shapefile.Reader(f)
        w._shapes.extend(r.shapes())
        w.records.extend(r.records())
    w.fields = list(r.fields)
    w.save("merged")

  2. gionata,

    That is strange... the writer initializes _shapes as an empty list, which is extendable:

    >>> import shapefile
    >>> w = shapefile.Writer()
    >>> w._shapes
    []
    >>> w._shapes.extend([1,2,3])
    >>> w._shapes
    [1, 2, 3]

    What version of Python and what platform are you using? Could you post some sample code?

  3. Very nice piece of code and very useful to merge a lot of tiled data.

    However, I had to update shapefile.py (date: 20110927, version: 1.1.4) a little bit to process my PointZ shapefiles. On line 699, I've changed s.points[0][2] to s.z[0], and on line 705, s.points[0][3] to s.m. Does that seem correct?

    Thanks a lot,
    David

  4. Hi, I wrote something with the same purpose but a little bit more sophisticated:
    http://furiousgis.blogspot.it/2012/05/python-shapefile-merger-utility.html

    Any feedback would be appreciated,
    thx

  5. Wonderful work, Joel.

    Thank you

  6. Hope this gets to you, Joel. I like your code for merging and it works fine for me on up to 6 files; however, if I exceed that I get the error message below. Any ideas? (Note: I am only trying to merge 20 shp files of less than 2MB total.)

    Traceback (most recent call last):
      File "C:\workspace_scratch\final data\shapemerger_01.py", line 21, in
        w.save("merged")
      File "C:\Python27\ArcGIS10.1\Lib\site-packages\shapefile.py", line 1032, in save
        self.saveDbf(target)
      File "C:\Python27\ArcGIS10.1\Lib\site-packages\shapefile.py", line 1004, in saveDbf
        self.__dbfRecords()
      File "C:\Python27\ArcGIS10.1\Lib\site-packages\shapefile.py", line 891, in __dbfRecords
        assert len(value) == size
    AssertionError


    Cheers, Peter

    Replies
    1. Peter,

      I haven't been able to recreate the problem yet. Are you using the latest version (1.2.0)? If you can make the shapefiles available for download I can see what's going on. I just merged 298 shapefiles into a single point shapefile. You can email me: jlawhead geospatialpython.com

    2. Hmmm, 1.2.0?? I am using the code from above.

      '''
      # Merge a bunch of shapefiles with attributes quickly!
      import glob
      import shapefile
      files = glob.glob("*.shp")
      w = shapefile.Writer()
      for f in files:
          r = shapefile.Reader(f)
          w._shapes.extend(r.shapes())
          w.records.extend(r.records())
      w.fields = list(r.fields)
      w.save("merged")
      '''

    3. Peter,

      The error is related to the dbf files. That part of the library isn't as robust as it could be. Try changing the w.save("merged") line to w.saveShp("merged"). That will save just the shp file and hopefully avoid dealing with the dbf, so we can isolate the problem. If you don't get an error, change that line to w.saveShx("merged") and see if that works. Finally, try w.saveDbf("merged") to see if something in the component dbf files is causing the issue (see the short sketch after this reply).

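      A minimal sketch of those three isolation steps, assuming the same Writer w built by the merge script at the top of the post:

      # Write each component file separately to find which one fails,
      # using the same pyshp 1.x Writer (w) from the merge script.
      w.saveShp("merged")   # geometry (.shp) only
      w.saveShx("merged")   # index (.shx) only
      w.saveDbf("merged")   # attributes (.dbf) only -- if this raises, the dbf data is the culprit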
  7. Replies
    1. Meysam - please see my comment above to Peter.

  8. Hi,

    I have a similar error with that code. I use Python 2.7.6 on Win7 64-bit.

    Traceback (most recent call last):
      File "........MERGE.py", line 24, in
        w.save("merged")
      File "C:\Python25\lib\shapefile.py", line 1028, in save
        self.saveShp(target)
      File "C:\Python25\lib\shapefile.py", line 985, in saveShp
        self.__shapefileHeader(self.shp, headerType='shp')
      File "C:\Python25\lib\shapefile.py", line 699, in __shapefileHeader
        f.write(pack(">i", self.__shpFileLength()))
      File "C:\Python25\lib\shapefile.py", line 607, in __shpFileLength
        size += nParts * 4
    UnboundLocalError: local variable 'nParts' referenced before assignment

    My programming skills are very basic. Can someone help me?

    Thank you very much.

  9. Solved, I do not know how, but it works.

  10. Thanks for the great pySHP library! However, I'm currently facing an error; I hope you can help?

    Traceback (most recent call last):
      File "/Users/daniel/Projects/shapefileimport/app.py", line 55, in
        hectopunten_output_writer.saveShp("Wegvakken")
      File "/Users/daniel/.virtualenvs/shapefileimport/lib/python2.7/site-packages/shapefile.py", line 995, in saveShp
        self.__shapefileHeader(self.shp, headerType='shp')
      File "/Users/daniel/.virtualenvs/shapefileimport/lib/python2.7/site-packages/shapefile.py", line 709, in __shapefileHeader
        f.write(pack(">i", self.__shpFileLength()))
      File "/Users/daniel/.virtualenvs/shapefileimport/lib/python2.7/site-packages/shapefile.py", line 617, in __shpFileLength
        size += nParts * 4
    UnboundLocalError: local variable 'nParts' referenced before assignment

  11. I have 200 city shapefiles for three different time periods, meaning each city has 3 shapefiles. I want to merge them city-wise (600 shapefiles in total down to 200). Is it possible in Python or within ArcGIS (script or model)?

  12. Hi Joel,

    I just tried this with v1.2.3 on Python 2.7.8 and got the following error:

    Traceback (most recent call last):
      File "mergeShapefiles.py", line 16, in
        w.records.extend(r.records())
      File "/usr/lib/python2.7/site-packages/shapefile.py", line 539, in records
        self.__dbfHeader()
      File "/usr/lib/python2.7/site-packages/shapefile.py", line 457, in __dbfHeader
        fieldDesc = list(unpack("<11sc4xBB14x", dbf.read(32)))
    struct.error: unpack requires a string argument of length 32

    Cheers, Peter
