Pages

Sunday, November 28, 2010

Introducing the Python Shapefile Library

Over Thanksgiving I finally got around to releasing the Python Shapefile Library. It is a single file of pure Python with no dependencies. It reads and writes shp, shx, and dbf for all 15 types of shapefiles in a pythonic way. You can find it with documentation here in the CheeseShop or search for "pyshp" on Google Code.

This library simply reads and writes shapefiles with no support for geometry calculations or the other eight or nine other supporting and undocumented shapefile formats including indexes and projection files which have been added since the specification was published in 1998.

Here's a basic example of writing a polygon shapefile:
import shapefile
w = shapefile.Writer(shapefile.POLYGON)
w.poly(parts=[[[1,5],[5,5],[5,1],[3,3],[1,1]]])
w.field('FIRST_FLD','C','40')
w.field('SECOND_FLD','C','40')
w.record('First','Polygon')
w.save('shapefiles/test/polygon')
There are plenty of other examples in the documentation.

The library consists of a Reader class, a Writer class, and an Editor class which simplifies making changes to an existing shapefile by giving you one object to work with so you don't have to juggle the Reader and the Writer objects yourself.

Beyond the docstring tests and some unit tests I tried PSL out in Jython with no issues. It's been awhile since I've run the tests. I want to try out Jython again as well as the other Python implementations which have a "struct" and some form of "os" module. I don't expect any issues with IronPython.

My company sells industrial-strength, native shapefile libraries for Java and Visual Basic which I was not involved in developing. I wrote this simple library to fully learn the shapefile specification for my own curiosity and to lead to some improvements in our commercial libraries. I learned quite a bit and we plan to release some very interesting features to our JShapefile and VBShapefile libraries in 2011 which will solve some major annoyances faced by developers who work with the shapefile format on a regular basis. More on that later...

PSL is not the only way to write shapefiles with Python however as far as I know it is the only complete pure Python library. Every other option is a Python wrapper around a C or C++ library (not that there's anything wrong with that) or partially-developed in Python only. I like having a pure Python, dependency-free, no-setup choice even if it's much slower than a highly-optimized, C-based module. Here's why:
  1. C-based modules can't follow your code everywhere - at least not easily (ex. Google App Engine and other web hosts, many embedded platforms, Python on different runtimes such as Jython and IronPython)
  2. Unless the developer really goes out of his or her way, C-based geospatial libraries wrapped in Python have kludgy-feeling methods and return opaque objects. There are notable exceptions to this rule but they are few and far between.
  3. Speed is the #1 reason developers cite as a reason to create C-based Python modules. In the geospatial domain the complexity of the data formats and spatial calculations makes wrapping libraries the easier choice. But most developers use Python because of the speed of development and ease of maintenance rather than program execution. In the rapidly-growing geospatial technology world new ideas are coming out every day. Rapid application development is key. The more easy-to-use, easy-to-change libraries the better.
Here are some other Python shapefile tools.

ShpUtils - Zack Johnson's pure-Python shapefile reader.

Shapelib - The original C-based shapefile library with Python bindings.

Pyshape - an alternative shapelib wrapper

OGR - General vector read/write library from shapelib creator Frank Warmerdam

Shapefile - a pure-Python read/write module under development

24 comments:

  1. Hello Joel,

    I just came across your library "shapefile.py" and it looks a great piece of work and exactly what I'm looking for. I did some geospatial development in perl and recently (very recently) started getting into python. As I'm new to python I thought to start with python 3 and not 2.xx. Sadly your library seems only to work with python 2.xx. Will there be a port to python 3?

    Thanks and regards

    Helmut

    ReplyDelete
  2. Helmut,

    Thanks for using the Python Shapefile Library! I am close to releasing the Python 3 port of the library.

    Even though the library is pure Python, version 3 of the language has some significant changes related to strings which is taking a while to update. Unfortunately most of the changes could not be fixed by the automated porting script.

    While the Python Shapefile Library is pure Python, most python geospatial libraries have bindings to C code. For this reason I would strongly recommend focusing your development on the latest version of Python 2 which includes a lot of the features of 3 but maintains backwards compatibility.

    It's going to take some time before the Python community ports the millions of lines of useful libraries available to Python 3 so it will be difficult to use it for everyday work.

    The version 3 port will probably be rough around the edges so I'll look forward to your feedback.

    Joel

    ReplyDelete
  3. Hello Joel,

    thanks for your answer. I'm still not sure wether to use python 2.x for it's not easy to get books on this version nowadays ... So maybe I wait until you released your python 3 version ...
    Could you please give a short note, when the new version is available?

    Thanks for your support and regards

    Helmut

    ReplyDelete
  4. Helmut,

    The hasty Python 3 port is now available at http://code.google.com/p/pyshp.

    It is not well tested and the "Editor" class is not functional yet but it is also not necessary to read and write shapefiles.

    ReplyDelete
  5. hi Joel,
    I've been using shapefile.py a bit and mostly love it. One question though: I'm trying to create a "null point shape" shapefile but no matter what I try it keeps crashing. One very simple 4-line example follows:

    import shapefile
    w = shapefile.Writer()
    w.null()
    w.save('c:\\temp\\new_shapefile')

    This results in the following error output:

    Traceback (most recent call last):
    File "c:\temp\create_shapefile_test1.py", line 4, in
    w.save('c:\\temp\\new_shapefile')
    File "c:\temp\shapefile.py", line 757, in save
    self.saveShp(target)
    File "c:\temp\shapefile.py", line 730, in saveShp
    self.__shapefileHeader(self.shp, headerType='shp')
    File "c:\temp\shapefile.py", line 493, in __shapefileHeader
    f.write(pack("<4d", *self.bbox()))
    File "c:\temp\shapefile.py", line 466, in bbox
    return self.__bbox(self._shapes)
    File "c:\temp\shapefile.py", line 424, in __bbox
    px, py = zip(*s.points)[:2]
    ValueError: need more than 0 values to unpack

    Any advice you have would be greatly appreciated. Thanks!

    Al

    ReplyDelete
  6. Alan,

    I'll get to the bottom of that error. Stand by....

    ReplyDelete
  7. Alan,

    I have posted a bug fix version in on "Downloads" section (not the repository) of the project site:
    http://code.google.com/p/pyshp/downloads/list

    Please test this version to verify the fix. I will make it an official version in the repository later today.

    The problem was I didn't check to see if the shapefile type was null before I tried to write the bounding box of the geometry when saving. This issue was an oversight from not working with null shapefiles enough.

    I created a null shapefile in a couple of versions of ArcMap to see what ESRI puts in these header values. They just use "0" and in the latest version they use [0,0,-1,-1]. So I used 0's.

    The null shapefiles written by the library opened successfully and behaved as expected in two different ESRI packages. I haven't seen null shapefiles even mentioned in any other shapefile package documentation so no telling how different developers address them.

    Thanks for exposing this issue. On a side note when fixing this problem it exposed another minor bug related to error handling which is now fixed.

    Let me know how it goes for you or if you find anything else.

    You can post here or get my direct e-mail address off of the google code project site in the owners list.

    Cheers,
    Joel

    ReplyDelete
  8. hi Joel,

    I downloaded the bug fix and tried it out. WORKS GREAT! Thanks much. Al

    ReplyDelete
  9. Alan,

    I reposted another version in the same place. The one you have will create an empty shapefile if you create a point shapefile. Other than that you won't notice any issues.

    ReplyDelete
  10. Hello Joel,

    your shapefile Library is pure genius compared to the clumsy solutions I came across until now. I use it to generate large (>60000 records) polyline shapefiles from excel tables (the performance is great!). Recently, I had to add z-values to the records, but i had no success in doing so. Can you give a minimal example for this use case?

    Thanks and regards,
    Christian

    ReplyDelete
  11. Christian,

    Looks like you've uncovered a bug. Standby...

    ReplyDelete
  12. Christian,

    Please download the latest version from the subversion repository (Revision 24, version 1.0). This update fixes the issue with polygons and z values as well as several other minor issues I found.

    I will post some z value examples in the wiki. For now make sure you explicitly set the shapefile type for each shape:

    w.poly([[[89,33,80],[91,22,82], [90,30,89]]], shapeType=15)

    Thanks for the positive feedback and let me know how it works.

    ReplyDelete
  13. Hi Joel,

    it works perfectly - Thanks a lot for your fast support!

    Best regards,
    Christian

    ReplyDelete
  14. Christian - that's great to hear and thanks again for the great feedback.

    - Joel

    ReplyDelete
  15. Hello Joel,
    thanks for this great job!
    Igor

    ReplyDelete
  16. brilliant! works like a charm. now i don't need to depend on arcpy, which is incredibly slow anyway. Thank you!

    ReplyDelete
  17. Hey Joel,

    I am a complete newbie when it comes to shape files.

    What I want to do is look up congressional districts. Shape files are here:

    http://www.srrproject.org/resources/redistricting-shapefiles/

    I have lots of latitude/longitude pairs. For each pair, I want to determine the congressional district.

    Does your code do that? If not, where should I look?

    Thanks,

    Rick

    ReplyDelete
    Replies
    1. Rick,

      What you want is to check and see if a point (lat/lon pair) is inside a polygon (congressional districts).

      Here is my blog post to do just that:
      http://geospatialpython.com/2011/08/point-in-polygon-2-on-line.html

      Delete
  18. Joel,

    Excellent! This looks doable -- I will give it a try. (Caveat: questions may come up.)

    Thanks,

    Rick

    ReplyDelete
  19. I have a polyon shapefile with attributes as below:

    Shape,County,Area,Population,UID

    Using
    sf = shapefile.Reader("shapefiles/counties.shp") and sf.shapes,i am able to print polygon coordinates.But,how can I get something like below in a csv file:

    UID represents an unique identifier for each polygon.

    UID#1,County,Area,population,Point#1
    UID#1,County,Area,population,Point#2
    UID#1,County,Area,population,Point#3
    UID#1,County,Area,population,Point#N

    UID#2,County,Area,population,Point#1
    UID#2,County,Area,population,Point#2
    UID#2,County,Area,population,Point#3
    UID#2,County,Area,population,Point#N

    ReplyDelete
  20. Dear Joel,

    Thanks for this post. I have a question that I want to ask you.

    Using shapefile library, I can read the shapefile. Now I want to create a shapefile using exact information of the original shapefile. For example, I have a administrative map of Switzerland in shapefile (with 200 records), and I want to to create another shapefile, looking exactly the same as the original one with 200 recordes and perhapr create another column (new attribute).

    How can I do that using the shapefile library.

    Thanks!

    Nam

    ReplyDelete
  21. Does this have the ability to add or calculate lat/long for a set of points? (ArcGIS's Add XY Coor tool)

    ReplyDelete
    Replies
    1. You can create a point shapefile or append to one. Is that what you mean? What is the "calculation" performed by ArcGIS? Here's an example:http://stackoverflow.com/questions/13406282/where-do-i-append-point-values-with-pyshp

      Delete
  22. Hi,
    Is it possible to get latitude and longitude boundary from a shape file

    ReplyDelete