Pages

Tuesday, April 9, 2013

Add a Field to an Existing Shapefile

The dbf file of a shapefile is a simple file-based database with rows and columns.  The rows are
Adding a field where there wasn't one before has
limitless possibilities.
"records" and the columns are "fields".  Sometimes you want to add an additional field to the dbf file to capture some new type of information not originally included.

Today's example shows you how to use pyshp to add a new field to an existing shapefile.  This operation is a two-step process.  You must first update the dbf header to define the new field.  Then you must update each record to account for a new column in the database so everything is balanced.

In the past, I've demonstrated modifying existing shapefiles for other reasons including merging shapefiles and deleting features in shapefiles.  In every case you are actually reading in the existing shapefile, creating a new shapefile in memory and then writing out the new file either separately or on top of the old one.  Even in really high-end GIS packages that's basically all you're doing.  Some packages will use a temporary file in between. 

Here's the example.  We'll create a counter that gives us unique sample data to append to each record just so we can see the changes clearly.  In the real world, you'd probably just insert a blank palce holder.

import shapefile

# Read in our existing shapefile
r = shapefile.Reader("Mississippi")

# Create a new shapefile in memory
w = shapefile.Writer()

# Copy over the existing fields
w.fields = list(r.fields)

# Add our new field using the pyshp API
w.field("KINSELLA", "C", "40")

# We'll create a counter in this example
# to give us sample data to add to the records
# so we know the field is working correctly.
i=1

# Loop through each record, add a column.  We'll
# insert our sample data but you could also just
# insert a blank string or NULL DATA number
# as a place holder
for rec in r.records():
 rec.append(i)
 i+=1
 # Add the modified record to the new shapefile 
 w.records.append(rec)

# Copy over the geometry without any changes
w._shapes.extend(r.shapes())

# Save as a new shapefile (or write over the old one)
w.save("Miss") 

So there you have it. Overall it's a pretty simple process that can be extended to do some sophisticated operations.  The sample Mississippi shapefile can be found here.  But this shapefile only has one record so it's not that interesting.  But it's lightweight and easy to examine the dbf file in your favorite spreadsheet program.

Tuesday, October 16, 2012

Back Online!

I had some technical difficulties with my domain renewal.  Sorry for the inconvenience.

Wednesday, June 13, 2012

Hiring...

If you're a geospatial programmer and interested in living in the greater New Orleans area, and working with a great team on challenging projects for federal agencies you see in blockbuster movies, talk to me.

Tuesday, May 29, 2012

SBN Mystery - Solved!

Last October I asked you all for help in figuring out the spatial indexing algorithm used to create Esri sbn files.  Using Pyshp, I had successfully decoded the file formats which I provided. However I could not figure out the algorithm used to create and populate the spatial bins within these files.

Today I'm pleased to announce this challenge has been answered.  The GIS community now has access to both the sbn and sbx file format as well as the algorithm for grouping features in a shapefile into "spatial bins".

I'm glad I asked for help as this challenge turned out to be quite difficult. The brain behind this operation is Marc Pfister with some good insights from Si Parker.  Marc worked tirelessly on this problem for months with a cross-country move and complete career change thrown in to make it interesting.  Marc did all the heavy intellectual lifting with me playing an inquisitive, but usually short-sighted Watson to his Holmes.  I generated endless series of shapefiles and one-off scripts to help Marc try and flush out the algorithm based only on subtle changes in the number of bins and features they contained as well as his past experience with spatial indexing.

When I figured out the file formats I had hoped I was just a Wikipedia search away from recognizing the spatial tree algorithm.  But the solution turned out to be much more complex than that.  Esri uses a sort of balanced tree that exhibits traits of several different algorithms.  The system seems carefully designed but is by no means obvious.  I will publish Marc's findings as soon as I can.

There are still a few shapefile cases which create puzzling but insignificant results. However we are at the 98% mark. The project goal of compatibility has been reached.  There is no longer any reason to hold off on sharing the results.  We are fairly certain that we are able to create sbn and sbx files which sufficiently fool ArcMap as well as other Esri packages so other software can read, use, and generate these indexes alongside the Esri suite.  There is more testing to do but it seems we are out of the woods.

What we haven't done is nicely packaged all of this work up.  But Marc posted a small set of Python scripts on github which demonstrate the algorithm and file handling needed to copy this capability.  Over the coming months I will fold this code into Pyshp, produce better documentation on the algorithm, and provide posts on how to deal with these indexes. But for now here's what you've been waiting for:

https://github.com/drwelby/hasbeen

By the way, Marc does freelance programming.  In my job, I get the opportunity to work with lots of really bright geospatial programmers and mathematicians and this guy is one of the very best I've ever seen.  If you have a tough geospatial project and need some E=MC2 smarts definitely look him up.

Monday, May 21, 2012

Advanced Shapefile Merger

Italian GIS blogger Toni sent me a message about a sophisticated OGR-based shapefile merger utility he created.  Last year I posted a simple pyshp example that would find all ".shp" files in a directory and merge the geometry and attributes into a single shapefile.  Toni's version takes this concept much further to include wildcards, recursive directory walking, exclusion lists, and some dbf tricks.  You can find this utility at Toni's blog "Furious GIS":

http://furiousgis.blogspot.it/2012/05/python-shapefile-merger-utility.html

Tuesday, May 1, 2012

Pyshp 1.1.6-beta Release for Testing

Pyshp 1.1.6-beta ready for testing
A pre-release of pyshp 1.1.6 is available for testing.  This release addresses some major issues with reading/writing 3D shapefiles.  The issue was identified by John Burky.  I am currently working through several bug reports right now but this one was a show stopper so I wanted to get a fix out quickly as there seem to be several people working with z elevation values right now.  Also if you were having troubles with "m" measure values this release fixes a related issue.  The Editor class, z values, and m values are dark corners that are not as well tested as "regular" shapefile features so if you're working with these types of data keep a sharp eye out for anything weird.  I'll push this update out as an official release within the next couple of weeks if there are no complaints.

The release is available in the Pyshp Google Code site "Downloads" section here.

In other news we are still working on the sbn/sbx binning example for spatial indexes.  Very close but not there yet.

Tuesday, February 28, 2012

Pyshp shapeRecords() Method

The shapefile.Reader.shapeRecords()
method lets you juggle both the
geometry and dbf attributes at the
same time.
The shapefile.Reader.shapeRecords() method is a simple convenience method which allows you to simultaneously loop through both the geometry and records of a shapefile.  Normally  you would loop through the shape records and then loop through the dbf records seperately.  But sometimes it's easier to have both sides of the shapefile equation accessible at the same time.  This ability is important sometimes because the link between geometry and attributes is implied by their order in the file and not explicit which can make referencing one side or the other a pain.  Warning: the current implementation pulls everything into memory at once which can be a problem for very large shapefiles. This weakness will be updated in future versions.

Here’s a simple usage example followed by a detailed explanation and a few other posts where I use this method without much explanation.

Let’s say you have a simple point-location address shapefile named “addr.shp” with the following structure:

GEOMETRY ADDRESS CITY STATE ZIP
[-89.522996, 34.363596] 7018 South 8th Oxford MS 38655
[-89.520695, 34.360863] 1199 South 11th Oxford MS 38655
[-89.520927, 34.362924] 8005 Fillmore Ave Oxford MS 38655

You could then use the shapeRecords method like this:

>>> import shapefile
>>> r = shapefile.Reader(“addr”)
>>> sr = r.shapeRecords()
>>> # get the first shaperecord
>>> sr_test = sr[0]
>>> # Look at the geometry of the shape
>>> sr_test.shape.points
[[-89.522996, 34.363596]]
>>> # Look at the attributes of the dbf record
>>> sr_test.record
[‘7018 South 8’,’Oxford’,’MS’,’38655’]
>>> # Now let’s iterate through all of them
>>> for sr in r.shapeRecords():
...    print “x: “, sr.points[0][0]
...    print “y: “, sr.points[0][1]
...    # Output just the address field
...    print “Address: “, sr.record[0]
x: -89.522996
y: 34.363596
Address: 7018 South 8th
x: -89.520695
y: 34.360863
Address: 1195 South 11th
x: -89.520927
y: 34.362924
Address: 805 Fillmore Ave

Here’s how it works.

The shapeRecords() method returns a list.

Each entry in that list is a _ShapeRecord object instance.

A _ShapeRecord object has two attributes: shape, record

_ShapeRecord.record contains a simple list of the attributes.

_ShapeRecord.shape contains a _Shape object instance.

A _Shape object has, at a minimum, two attributes: shapeType, points

If the _Shape instance contains a polygon a “parts” attribute will appear.  This attribute contains the index in the point list of the beginning of a “part”.  Parts let you store multiple shapes in a single record.

The shapeType attribute provides a number telling you if the shapefile is  a point, polygon, line, etc. file.  These constants are listed in the shapefile spec document as well as near the top of the source code.

The points is just a list containing lists of the point coordinates.  Two things to note:  If the geometry has multiple parts, such as multiple polygons, the points for all parts are just lumped together.  You must separate them by referencing the parts index list.  Some shape types allow for z and m values which may appear in addition to the x,y pairs.

This method is really just a clumsy convenience method that basically zips up the results of the shapes() and records() methods you are already using.

I have a few blog posts where I call this method as well:

http://geospatialpython.com/2011/02/changing-shapefiles-type.html

http://geospatialpython.com/2011/01/point-in-polygon.html

http://geospatialpython.com/2010/12/dot-density-maps-with-python-and-ogr.html  (in the comments)