Monday, September 26, 2011

Reading Shapefiles from the Cloud

In a previous post, I wrote about saving shapefiles to file-like objects with PyShp and demonstrated how to save a shapefile to a zip file. As of version 1.1.2, PyShp can also read from Python file-like objects, including files stored inside a zip archive. Both the Reader object and the Writer.save() method accept keyword arguments that can be file-like objects, allowing you to read and write shapefiles without any disk activity.

In this post, we'll read a shapefile directly from a zip file on a server, entirely in memory.

Normally, to read a shapefile from the file system, you just pass the name of the file to the Reader object as a string:

import shapefile
r = shapefile.Reader("myshapefile")

But if you use the keywords shp, shx, and dbf, you can pass file-like objects instead. The following example reads a shapefile from a zip file hosted on a website.

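# Python 2 modules (in Python 3 these are urllib.request and io.BytesIO)
import urllib2
import zipfile
import StringIO
import shapefile

# Download the zip file; the response object can be read but not seeked
cloudshape = urllib2.urlopen("http://pyshp.googlecode.com/files/GIS_CensusTract.zip")
# Copy the bytes into a seekable in-memory buffer for the zipfile module
memoryshape = StringIO.StringIO(cloudshape.read())
zipshape = zipfile.ZipFile(memoryshape)
# Note: this unpacking assumes the archive lists exactly these four files in this order
shpname, shxname, dbfname, prjname = zipshape.namelist()
# ZipFile.read() returns a string, so wrap each member in a file-like object
cloudshp = StringIO.StringIO(zipshape.read(shpname))
cloudshx = StringIO.StringIO(zipshape.read(shxname))
clouddbf = StringIO.StringIO(zipshape.read(dbfname))
r = shapefile.Reader(shp=cloudshp, shx=cloudshx, dbf=clouddbf)
print(r.bbox)
# [-89.8744162216216, 30.161122135135138, -89.1383837783784, 30.661213864864862]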
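
The unpacking of namelist() in the example above depends on the order in which the files were added to the archive. Here is a minimal sketch of a more defensive lookup that matches members by extension instead. It is written for Python 3, so it uses io.BytesIO in place of StringIO, and the helper name shapefile_parts is hypothetical and not part of PyShp:

```python
import io
import os
import zipfile


def shapefile_parts(zip_bytes):
    """Return a dict with 'shp', 'shx', and 'dbf' keys (when present),
    each mapping to an in-memory file-like object.

    Members are matched by file extension, not by position in the archive.
    `shapefile_parts` is an illustrative helper, not a PyShp function.
    """
    parts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # "tract.SHP" -> "shp"
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in ("shp", "shx", "dbf"):
                # archive.read() returns bytes, so wrap them in a buffer
                parts[ext] = io.BytesIO(archive.read(name))
    return parts
```

Because the dictionary keys match the Reader keywords, the result could presumably be handed over as shapefile.Reader(**shapefile_parts(data)), where data holds the downloaded zip bytes.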

If you only need to read one of the three file types, you may pass just that one keyword. Some attributes, such as Reader.shapeName, will not be available when reading from file-like objects.

File-like objects provide a lot of power. However, not all file-like objects implement all of the file methods. In the example above, the object returned by urllib2.urlopen() does not provide the seek() method that the zipfile module needs, and ZipFile.read() returns a plain string rather than a file-like object. To get around both issues, we copy the data into a StringIO (or cStringIO) object in memory. If the data is potentially too big to hold in memory, you can use the tempfile module to store the shapefile data temporarily on disk.
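
As a rough illustration of the tempfile approach, the sketch below copies a stream that only supports read() into a temporary file on disk, which does support seek(), and opens that file as a zip archive. It targets Python 3, and the name zip_from_stream is an assumption made for this example only:

```python
import contextlib
import shutil
import tempfile
import zipfile


@contextlib.contextmanager
def zip_from_stream(stream, chunk_size=64 * 1024):
    """Yield a ZipFile backed by a temporary file on disk.

    `stream` only needs a read() method, as a urlopen() response has.
    `zip_from_stream` is a hypothetical helper name for this sketch.
    """
    # The temporary file is removed automatically when the block exits
    with tempfile.TemporaryFile() as spool:
        # Copy in chunks so the whole download never sits in memory at once
        shutil.copyfileobj(stream, spool, chunk_size)
        spool.seek(0)
        with zipfile.ZipFile(spool) as archive:
            yield archive
```

Inside a "with zip_from_stream(response) as archive:" block, archive.namelist() and archive.read() behave as they do in the in-memory version, though each member returned by read() is still loaded into memory in full.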
