Wednesday, June 26, 2013

Dots in Shapefile Names

I came across a post from another blogger recently which highlighted an issue I hadn't considered before.  Most operating systems allow for an arbitrary number of periods "." in a file name.  For example your shp file in a shapefile set might be named "cities.2013.v.1.2.shp".  While I'm aware of that fact I did not really take it into consideration when designing PyShp.  Here are some factors I did consider:
1. There are at least 3 and up to 9 file types in a single shapefile data set.
2. Dbf files are a fairly common data format for file-based databases.
3. The shx file is just an index and does not contain critical data.

Given these considerations, I made it possible to specify just the base name of a shapefile when reading or writing, or you can add an extension IF you want.  PyShp doesn't rely on the extension.  So in the above example I could just use "cities.2013.v.1.2" in theory.

That way, if one of the files was missing, or for some reason you had no control over how the file name was passed to your program, the software would just work.  I figured this approach would be the most robust, intuitive, and user-friendly.  In fact, if you DO specify a file extension, PyShp just chops it off to get the base name using Python's os.path.splitext() function and then rebuilds the requisite file names before checking to see if they even exist.

There's only one problem.  If you use the base file name AND you have extra dots in your file name as in the example, Python still chops off everything after the last dot, assuming it is the file extension.  This behavior is fair since there's no way to be sure programmatically what is an extension and what is not.  You could use a look-up table in this context but that can lead to other problems.  So in the above example, if you pass the base file name to the PyShp Reader object, the base name would be shortened to "cities.2013.v.1", which would result in IO exceptions.
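
The behavior is easy to reproduce with the standard library alone. The snippet below is a minimal sketch, not PyShp's actual code; the base_name() helper is a hypothetical stand-in for the extension-stripping step described above.

```python
import os

def base_name(path):
    """Hypothetical stand-in: drop whatever follows the last dot."""
    return os.path.splitext(path)[0]

# With the extension included, only ".shp" is removed.
with_ext = base_name("cities.2013.v.1.2.shp")

# With the bare base name, the trailing ".2" is mistaken for an extension.
without_ext = base_name("cities.2013.v.1.2")

print(with_ext)     # cities.2013.v.1.2
print(without_ext)  # cities.2013.v.1
```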

The safe workaround is to just always include a three-letter extension in the file name if you might have dots. I may eventually try to program around this but it's kind of a corner case with a reasonable workaround.  
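
One way to apply this workaround in calling code is to normalize names before handing them to PyShp. The helper below is a hypothetical sketch and not part of PyShp; the function name and the short extension list are assumptions made for illustration.

```python
import os

# Assumption: only the three mandatory shapefile extensions are recognized here.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf")

def ensure_extension(name):
    """Append '.shp' unless the name already ends in a recognized extension."""
    ext = os.path.splitext(name)[1].lower()
    if ext in SHAPEFILE_EXTENSIONS:
        return name
    return name + ".shp"

safe_name = ensure_extension("cities.2013.v.1.2")
print(safe_name)                       # cities.2013.v.1.2.shp
print(os.path.splitext(safe_name)[0])  # cities.2013.v.1.2
```

Note that this is itself a small look-up table, so it inherits the ambiguity mentioned earlier: a base name that genuinely ends in ".dbf" or ".shx" would be treated as already having an extension.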

Sunday, June 23, 2013

PyShp Version 1.1.7 Release

PyShp 1.1.7 is out after several months of virtually no updates.  This release fixes a bunch of minor issues and adds a couple of important features.  You can get it through setuptools or as source from the CheeseShop: https://pypi.python.org/pypi/pyshp/1.1.7.  The Google Code page is here: https://code.google.com/p/pyshp/

And as usual there are no dependencies other than Python itself.  Updates include:
  • Added Python __geo_interface__ convention to export shapefiles as GeoJSON.
  • Used is_string() method to detect file names passed as unicode strings (failed on unicode strings before).
  • Added Reader.iterShapes() method to iterate through geometry records for parsing large files efficiently.
  • Added Reader.iterRecords() method to iterate through dbf records efficiently in large files.
  • Modified shape() method to use iterShapes() if shx file is not available as well as record() method.
  • Fixed bug which prevents writing the number 0 to dbf fields.
  • Updated shape() method to calculate and seek the start of the next record. The shapefile spec does not require the content of a geometry record to be as long as the content length defined in the header. The result is you can delete features without modifying the record header allowing for empty space in records.
  • Added enforcement of closed polygons in the Writer.poly() method.
  • Added unique file name generator to use if no file names are passed to a writer instance when saving (ex. w.save()). The unique file name is returned as a string.
  • Updated "bbox" property documentation to match Esri specification.
The __geo_interface__ update required a polygon area calculator.  This method is undocumented, but you can feed a list of points representing a polygon to shapefile.signed_area(coords) and get an area calculation back.  If the area is a negative number, the points are in clockwise order (an outer ring in a shapefile).  If the area is positive, the points are in counter-clockwise order (i.e. an inner polygon ring).
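
To illustrate how a signed area reveals ring orientation, here is a minimal sketch of the standard shoelace formula. It is not PyShp's implementation, and the ring_signed_area() name is hypothetical. With this formula, and y increasing upward, a counter-clockwise ring comes out positive and a clockwise ring negative.

```python
def ring_signed_area(coords):
    """Shoelace formula for a closed ring (first point repeated at the end)."""
    total = 0.0
    # Walk each edge of the ring and accumulate the cross products.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

ccw_square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
cw_square = list(reversed(ccw_square))

print(ring_signed_area(ccw_square))  # 1.0
print(ring_signed_area(cw_square))   # -1.0
```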