Friday, June 27, 2014

Merging Shapefiles with PyShp and Dbfpy

A question on GIS StackExchange prompted me to write a quick script as an update to my previous merging shapefiles with pyshp. The SE question was the result of a typo but I thought this script was still useful based on other questions I've received regarding dbf files and pyshp.
post on

The shp and shx portions of pyshp are complete in that they can read and write all shapefiles allowed by the shapefile spec created by Esri.  The dbf format is included in the Esri shapefile spec by reference.  The dbf format is decades old and has several different versions.  Most people reference the generic XBase version of the spec.

Since the dbf format was already established, I grabbed the easiest code sample I could find written by Raymond Hettinger that is a reader and writer in pure Python.  This recipe is very useful and works 85% of the time with most shapefiles.  Many people run into issues though with dbf files produced by other software.

I have found that the pure-python, public-domain dbfpy module is far more robust than the simple dbf engine in pyshp.  So when people run into troubles with dbf files, I usually suggest using pyshp to read/write the shp and shx files and then dbfpy to handle the dbf file.  PyShp let's you work with each shapefile type independently to make this approach possible. 

The following code sample demonstrates merging all shapefiles in a directory into one shapefile called "merged".  Of course the shapefiles must all have the same geometry type, spatial reference system, and dbf schema.  The shp files are read by pyshp, and then the merged shp and shx file are created by pyshp.  Then the same is done for the dbf files separately by dbfpy.

import glob
import shapefile
from dbfpy import dbf
shp_files = glob.glob("*.shp")
w = shapefile.Writer()
# Loop through ONLY the shp files and copy their shapes
# to a writer object. We avoid opening the dbf files
# to prevent any field-parsing errors.
for f in shp_files:
    print "Shp: %s" % f
    shpf = open(f, "rb")
    r = shapefile.Reader(shp=shpf)
    print "Num. shapes: %s" % len(w._shapes)
# Save only the shp and shx index file to the new
# merged shapefile.
# Now we come back with dbfpy and merge the dbf files
dbf_files = glob.glob("*.dbf")
# Use the first dbf file as a template
template = dbf_files.pop(0)
merged_dbf_name = "merged.dbf"
# Copy the entire template dbf file to the merged file
merged_dbf = open(merged_dbf_name, "wb")
temp = open(template, "rb")
# Now read each record from teh remaining dbf files
# and use the contents to create a new record in
# the merged dbf file. 
db = dbf.Dbf(merged_dbf_name)
for f in dbf_files:
    print "Dbf: %s" % f
    dba = dbf.Dbf(f)
    for rec in dba:
        db_rec = db.newRecord()
        for k,v in rec.asDict().items():
            db_rec[k] = v

Wednesday, March 26, 2014

BOGO Offer to Celebrate 2000th Book Title from Packt

Packt Publishing is celebrating their 2000th book title, "Learning Dart" of all things, with a buy-one-
get-one offer on all 2000 books. You can access the limited-time offer on Packt's site here:

All 2000 books are available in print or multiple eBook flavors.  Don't forget to check your eReader email account to see if you got an eBook credit from the multi-million anti-trust settlement against the 5 major book publishers for price-fixing eBooks.

The following list of Amazon links covers all of the geospatial books in Packt's library which is the largest collection by any one publisher as far as I know. 

Note these links are from my Amazon store.  To take advantage of the Packt offer you must got to Packt's site.

I've also included a couple of titles that have significant geospatial chapters even if their title doesn't say it explicitly.
  1. Learning Geospatial Analysis with Python
  2. Python Geospatial Development
  3. Programming ArcGIS 10.1 Cookbook
  4. Applying and Extending Oracle Spatial
  5. Learning Bing Maps API
  6. GeoServer Beginner’s Guide 
  7. Google Maps JavaScript API Cookbook
  8. Python Data Visualization Cookbook
  9. OpenStreetMap
  10. PostGIS Cookbook  
  11. OpenLayers 2.10 Beginner's Guide
  12. Learning QGIS 2.0
If I missed any titles please post them in the comments.

Monday, December 16, 2013

Python and Elevation Data: Creating a Shaded Relief

Editor's Note: This multi-part series is based on Chapter 7 of "Learning Geospatial Analysis with Python" and republished with permission from Packt Publishing. This series covers using Python to work with elevation data. The previous post discusses working with ASCII Grid files in Python.  

Shaded relief maps color elevation in a way that it looks as if the terrain is cast in a low-angle light, which creates bright spots and shadows. The aesthetic styling creates an almost photographic illusion, which is easy to grasp to understand the variation in terrain. It is important to note that this style is truly an illusion as the light is often physically inaccurate and the elevation is usually exaggerated to increase contrast.

In this post, we'll use the ASCII DEM referenced in the previous post to create another grid, which represents a shaded relief version of the terrain in NumPy.  You can download this DEM here:

If we color the DEM to visualize the source data we get the following image which we used in the previous post:

This terrain is quite dynamic so we won't need to exaggerate the elevation; however, the script has a variable called z, which can be increased from 1.0 to scale the elevation up.

After we define all the variables including input and output file names, you'll see the header parser based on the linecache module, which also uses a python list comprehension to loop and parse the lines that are then split from a list into the six variables. We also create a y cell size called ycell, which is just the inverse of the cell size. If we don't do this, the resulting grid will be transposed.

Note we define file names for slope and aspect grids, which are two intermediate products that are combined to create the final product. These intermediate grids are output as well, just to take a look at the process. They can also serve as inputs to other types of products.

This script uses a three by three windowing method to scan the image and smooth out the center value in these mini grids. But because we are using NumPy, we can process the entire array at once, as opposed to a lengthy series of nested loops. This technique is based on the excellent work of a developer called Michal Migurski, who implemented the clever NumPy version of Matthew Perry's C++ implementation, which served as the basis for the DEM tools in the GDAL suite.
After the slope and aspect are calculated, they are used to output the shaded relief. Finally, everything is saved to disk from NumPy. In the savetxt() method we specify a 4 integer format string, as the peak elevations are several thousand feet:

from linecache import getline
import numpy as np

# File name of ASCII digital elevation model
source = "dem.asc"
# File name of the slope grid
slopegrid = "slope.asc"
# File name of the aspect grid
aspectgrid = "aspect.asc"
# Output file name for shaded relief
shadegrid = "relief.asc"
## Shaded elevation parameters
# Sun direction
# Sun angle
# Elevation exageration
# Resolution
# No data value for output
NODATA = -9999

# Needed for numpy conversions
deg2rad = 3.141592653589793 / 180.0
rad2deg = 180.0 / 3.141592653589793

# Parse the header using a loop and
# the built-in linecache module
hdr = [getline(source, i) for i in range(1,7)]
values = [float(h.split(" ")[-1].strip()) \
 for h in hdr]
cols,rows,lx,ly,cell,nd = values
xres = cell
yres = cell * -1

# Load the dem into a numpy array
arr = np.loadtxt(source, skiprows=6)

# Exclude 2 pixels around the edges which are usually NODATA.
# Also set up structure for a 3x3 window to process the slope
# throughout the grid
window = []
for row in range(3):
    for col in range(3):
        window.append(arr[row:(row + arr.shape[0] - 2), \
        col:(col + arr.shape[1] - 2)])

# Process each cell
x = ((z * window[0] + z * window[3] + z * \
 window[3] + z * window[6]) - \
  (z * window[2] + z * window[5] + z * \
 window[5] + z * window[8])) / (8.0 * xres * scale);

y = ((z * window[6] + z * window[7] + z * window[7] + \
  z * window[8]) - (z * window[0] + z * window[1] + \
  z * window[1] + z * window[2])) / (8.0 * yres * scale);

# Calculate slope  
slope = 90.0 - np.arctan(np.sqrt(x*x + y*y)) * rad2deg

# Calculate aspect
aspect = np.arctan2(x, y)  

# Calculate the shaded relief
shaded = np.sin(altitude * deg2rad) * np.sin(slope * deg2rad) \
 + np.cos(altitude * deg2rad) * np.cos(slope * deg2rad) \
 * np.cos((azimuth - 90.0) * deg2rad - aspect);
shaded = shaded * 255

# Rebuild the new header
header = "ncols        %s\n" % shaded.shape[1]
header += "nrows        %s\n" % shaded.shape[0]
header += "xllcorner    %s\n" % \
   (lx + (cell * (cols - shaded.shape[1])))
header += "yllcorner    %s\n" % \
   (ly + (cell * (rows - shaded.shape[0])))
header += "cellsize     %s\n" % cell
header += "NODATA_value      %s\n" % NODATA

# Set no-data values
for pane in window:
   slope[pane == nd] = NODATA
   aspect[pane == nd] = NODATA
   shaded[pane == nd] = NODATA

# Open the output file, add the header, save the slope grid
with open(slopegrid, "wb") as f:
  np.savetxt(f, slope, fmt="%4i")
# Open the output file, add the header, save the slope grid
with open(aspectgrid, "wb") as f:
  np.savetxt(f, aspect, fmt="%4i")

# Open the output file, add the header, save the array
with open(shadegrid, "wb") as f:
  np.savetxt(f, shaded, fmt="%4i")

If we load the output grid into QGIS and specify the styling to stretch the image to the min and max, we see the following image. You can also open the image in the FWTools OpenEV application discussed in the Installing GDAL section in Chapter 4, Geospatial Python Toolbox, which will automatically stretch the image for optimal viewing.

As you can see, the preceding image is much easier to comprehend than the original pseudo-color representation we examined originally. Next, let's look at the slope raster used to create the shaded relief:
The slope shows the gradual decline in elevation from high points to low points in all directions of the data set. Slope is an especially useful input for many types of hydrology models.
The aspect shows the maximum rate of downslope change from one cell to its neighbors. If you compare the aspect image to the shaded relief image you can see the red and gray values of the aspect image correspond to shadows in the shaded relief. So the slope is primarily responsible for turning the DEM into a terrain relief while the aspect is responsible for the shading.

In the next post, we'll use the elevation data to create elevation contours!

Wednesday, December 4, 2013

Python and Elevation Data: ASCII Grid Files

Editor's Note: This multi-part series is based on Chapter 7 of "Learning Geospatial Analysis with Python" and republished with permission from Packt Publishing. This series will cover using Python to work with elevation data.  


Elevation data is one of the most fascinating types of geospatial data. It represents many different types of
Image courtesy of the
US National Park Service
data sources and formats. Elevation data can display properties of both vector and raster data resulting in unique data products. Elevation data can serve the following purposes:
  • Terrain visualization
  • Land cover classification
  • Hydrology modelling
  • Transportation routing
  • Feature Extraction
You can't perform all of these options with both raster and vector data but because elevation data is three dimensional, containing x, y, and z coordinates, you can often get more out of these data than any other type.

In this chapter, we're going to learn to read and write elevation data in both raster and vector point formats. We'll also create some derivative products. The topics we'll cover are:
  • ASCII Grid elevation data files
  • Shaded-relief images
  • Elevation contours
  • Gridding LIDAR data
  • Creating a 3D mesh
Editor's Note:  This post will cover ASCII Grid elevation data files.  Each bullet above will be a separate post in this series. Instructions for installing GDAL and its Python bindings can be found here:  On most linux systems you can run: apt-get install python-gdal. In my book I provide detailed instructions and links for Windows/Mac/Linux.  On all platforms PIL and Numpy can be installed using Python SetupTools through easy_install or pip.

For most of this chapter we'll use ASCII Grid files or ASCIIGRID . These files are a type of raster data usually associated with elevation data. This grid format stores data as text in equally sized square rows and columns with a simple header. Each cell in a row/column stores a single numeric value, which can represent some feature of terrain, such as elevation, slope, or flow direction. The simplicity makes it an easy-to-use, platform independent raster format. This format is described in the ASCII GRIDS section in Chapter 2, Geospatial Data.

Throughout the book we've relied on GDAL and to some extent PIL to read and write geospatial raster data including the gdalnumeric module to load raster data into NumPy arrays. But ASCIIGRID allows us to read and write rasters using only Python or even NumPy.
Tip: As a reminder, some elevation data sets use image formats to store elevation data. Most image formats only support 8-bit values ranging between 0-255; however, some formats, including TIFF, can store larger values. Geospatial software can typically display these data sets; however, traditional image software and libraries usually do not. For simplicity in this chapter, we'll stick to the ASCIIGRID format for data, which is both human and machine readable, as well as being widely supported.

Reading grids

NumPy has the ability to read the ASCIIGRID format directly using its loadtxt() method designed to read arrays from text files. The first six lines consist of the header, which are not part of the array. The following lines are a sample of a grid header:

ncols        250
nrows        250
xllcorner    277750.0
yllcorner    6122250.0
cellsize     1.0
NODATA_value      -9999

Line 1 contains the number of columns in the grid, which is synonymous with the x axis. Line 2 represents the y axis described as a number of rows. Line 3 represents the x coordinate of the lower left corner, which is the minimum x value. Line 4 is the corresponding minimum y value in the lower left corner of the grid. Line 5 is the cell size or resolution of the raster. Because the cells are square, only one size value is needed, as opposed to the separate x and y resolution values in most geospatial rasters. The fifth line is no data value, which is a number assigned to any cell for which a value is not provided. Geospatial software ignores these cells for calculations and often allows special display settings for it, such as coloring them black. The value -9999 is a common no data placeholder value used in the industry, which is easy to detect in software. In some examples, we'll use the number zero; however, zero can often also be a valid data value.

The numpy.loadtxt() method includes an argument called skiprows , which allows you to specify a number of lines in the file to be skipped before reading array values. To try this technique out you can download a sample grid file called myGrid.asc at the following URL:

So for myGrid.asc we would use the following code:

myArray  = numpy.loadtxt("myGrid.asc", skiprows=6)

This line results in the variable myArray containing a numpy array derived from the ASCIIGRID file myGrid.asc. The ASC file name extension is used by the ASCIIGRID format. This code works great but there's one problem. NumPy allows us to skip the header but not keep it. And we need to keep it to have a spatial reference for our data for processing, as well as for saving this grid or creating a new one.

To solve this problem we'll use Python's built-in linecache module to grab the header. We could open the file, loop through the lines, store each one in a variable, and then close the file. But linecache reduces the solution to a single line. The following line reads the first line in the file into a variable called line 1:

import linecache
line1 = linecache.getline("myGrid.asc", 1)

In the examples in this chapter we'll use this technique to create a simple header processor that can parse these headers into python variables in just a few lines.

Writing grids

Writing grids in Numpy is just as easy as reading them. We use the corresponding numpy.savetxt() function to save a grid to a text file. The only catch is, we must build and add the six lines of header information before we dump the array to the file. This process is slightly different depending on if you are using NumPy versions before 1.7 or after. In either case, you build the header as a string first. If you are using NumPy 1.7 or later, the savetext() method has an optional argument called header, which lets you specify a string as an argument. You can quickly check your NumPy version from the command line using the following command:

python -c "import numpy;print numpy.__version__" 

The backwards compatible method is to open a file, write the header then dump the array. Here is a sample of the Version 1.7 approach to save an array called myArray to an ASCIIGRID file called myGrid.asc:

header = "ncols     %s\n" % myArray.shape[1]
header += "nrows    %s\n" % myArray.shape[0]
header += "xllcorner 277750.0\n"
header += "yllcorner 6122250.0\n"
header += "cellsize 1.0\n"
header += "NODATA_value -9999\n"

numpy.savetxt("myGrid.asc", myArray, \
header=header, fmt="%1.2f")

We make use of python format strings, which allow you to put placeholders in a string to format python objects to be inserted. The %s format variable turns whatever object you reference into a string. In this case we are referencing the number of columns and rows in the array. In NumPy, an array has both a size and shape property. The size property returns an integer for the number of values in the array. The shape property returns a tuple with the number of rows and columns, respectively. So, in the preceding example, we use the shape property tuple to add the row and column counts to the header of our ASCII Grid. Notice we also add a trailing newline character for each line (\n). There is no reason to change the x and y values, cell size, or nodata value unless we altered them in the script. The savetxt() method also has a fmt argument, which allows you to use Python format strings to specify how the array values are written. In this case the %1.2f value specifies floats with at least one number and no more than two decimal places.

The backwards compatible version for NumPy, before 1.6, builds the header string in the same way but creates the file handle first:

import numpy
f = open("myGrid.asc", "w")
numpy.savetxt(f, myArray, fmt="%1.2f")

In the examples in this chapter, we'll introduce Python with an approach for writing files, which provides more graceful file management by ensuring files are closed properly. If any exceptions are thrown, the file is still closed cleanly:

with open("myGrid.asc", "w") as f:
    numpy.savetxt(f, myArray, fmt="%1.2f")

As you'll see in the upcoming examples, this ability to produce valid geospatial data files using only NumPy is quite powerful. In the next couple of examples we'll be using an ASCIIGRID Digital Elevation Model (DEM) of a mountainous area near Vancouver, British Columbia in Canada. You can download this sample as a ZIP file at the following URL:

The following image is the raw DEM colorized using QGIS with a color ramp that makes lower elevation values dark blue and higher elevation values bright red:

While we can conceptually understand the data this way, it is not an intuitive way to visualize the data. Let's see if we can do better.  In the next post we'll create a shaded relief image using this data.