Wednesday, June 26, 2013

Dots in Shapefile Names

I recently came across a post from another blogger that highlighted an issue I hadn't considered before.  Most operating systems allow an arbitrary number of periods "." in a file name.  For example, the shp file in a shapefile set might be named "cities.2013.v.1.2.shp".  While I was aware of that fact, I did not really take it into consideration when designing PyShp.  Here are some factors I did consider:
1. There are at least 3 and up to 9 file types in a single shapefile data set.
2. Dbf files are a fairly common data format for file-based databases.
3. The shx file is just an index and does not contain critical data.

Given these considerations, I made it possible to specify just the base name of a shapefile when reading or writing, or you can add an extension IF you want.  PyShp doesn't rely on the extension.  So in the above example I could just use "cities.2013.v.1.2" in theory.

That way, if one of the files was missing, or if for some reason you had no control over how the file name was passed to your program, the software would just work.  I figured this approach would be the most robust, intuitive, and user-friendly.  In fact, if you DO specify a file extension, PyShp just chops it off to get the base name using Python's os.path.splitext() function and then rebuilds the requisite file names before checking to see if they even exist.

There's only one problem.  If you use the base file name AND you have extra dots in your file name as in the example, os.path.splitext() still chops off everything after the last dot, assuming it is the file extension.  This behavior is fair since there's no way to be sure programmatically what is an extension and what is not.  You could use a look-up table in this context but that can lead to other problems.  So in the above example, if you pass the base file name to the PyShp Reader object, the base name would be shortened to "cities.2013.v.1", which would result in IO exceptions.
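
To make the difference concrete, here is a minimal sketch that uses only os.path.splitext() from the standard library on the example name from this post.  It is not PyShp's actual source, just an illustration of the splitting behavior described above:

```python
import os.path

# Full file name: the real ".shp" extension is stripped,
# leaving the intended base name.
with_ext = os.path.splitext("cities.2013.v.1.2.shp")[0]

# Base name only: the trailing ".2" is mistaken for an extension.
without_ext = os.path.splitext("cities.2013.v.1.2")[0]

print(with_ext)     # cities.2013.v.1.2
print(without_ext)  # cities.2013.v.1
```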

The safe workaround is to just always include the three-letter extension in the file name if the name might contain extra dots.  I may eventually try to program around this but it's kind of a corner case with a reasonable workaround.
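
As a rough sketch of that workaround, assuming your program receives a dotted base name, you can append ".shp" yourself before handing the name to PyShp.  The variable names here are only illustrative, and the Reader call is left as a comment so the snippet runs without any shapefile on disk:

```python
import os.path

# A base name that happens to contain extra dots.
base = "cities.2013.v.1.2"

# Always append the extension before passing the name along.
safe_name = base + ".shp"

# Stripping the extension now recovers the full base name
# instead of losing the trailing ".2".
recovered = os.path.splitext(safe_name)[0]
assert recovered == base

# With PyShp installed and the files present, the name would be used like:
#   import shapefile
#   reader = shapefile.Reader(safe_name)
```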
