The ability to classify an image leads us to another remote-sensing capability. Vector GIS data such as shapefiles are typically extracted from remotely-sensed images. Extraction normally involves an analyst clicking around each object in an image and drawing the feature to save it as data. But it is also possible with good remotely-sensed data and proper pre-processing to automatically extract features from an image.
For this example we'll take a subset of our Landsat 8 thermal image to isolate a group of barrier islands as seen in the following screenshot:
You can download this image here:
http://geospatialpython.googlecode.com/files/islands.zip
Our goal with this example is to automatically extract the three islands in the image as a shapefile. But before we can do that, we need to mask out any data we aren't interested in. For example, the water has a wide range of pixel values, as do the islands themselves. If we just want to extract the islands themselves, we need to push all pixel values into just two bins to make the image black and white. This technique is called thresholding . The islands in the image have enough contrast with the water in the background such that thresholding should isolate them nicely.
In the following script we will read the image into an array and then histogram the image using only two bins. We will then use the colors black and white to color the two bins. This script is simply a modified version of our classification script with a very limited output:
import gdalnumeric
# Input file name (thermal image)
src = "islands.tif"
# Output file name
tgt = "islands_classified.tiff"
# Load the image into numpy using gdal
srcArr = gdalnumeric.LoadFile(src)
# Split the histogram into 20 bins as our classes
classes = gdalnumeric.numpy.histogram(srcArr, bins=2)[1]
lut = [[255,0,0],[0,0,0],[255,255,255]]
# Starting value for classification
start = 1
# Set up the output image
rgb = gdalnumeric.numpy.zeros((3, srcArr.shape[0], \
   srcArr.shape[1],),gdalnumeric.numpy.float32)
       
# Process all classes and assign colors
for i in range(len(classes)):
    mask = gdalnumeric.numpy.logical_and(start <= srcArr, \ 
    srcArr <= classes[i])
    for j in range(len(lut[i])):
        rgb[j] = gdalnumeric.numpy.choose(mask, (rgb[j], \
        lut[i][j]))
    start = classes[i]+1 
# Save the image
gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8), \
 tgt, format="GTIFF", prototype=src)
The output looks great as seen in the following screenshot:
The islands are clearly isolated so our extraction script will be able to identify them as polygons and save them to a shapefile. The GDAL library has a method called Polygonize() that does exactly that. It groups all sets of isolated pixels in an image and saves them out as a feature data set. One interesting technique we will use in this script is to use our input image as a mask. The Polygonize() method allows you to specify a mask that will use the color black as a filter that will prevent the water from being extracted as a polygon, so we'll end up with just the islands. Another area to note in the script is that we copy the georeferencing information from our source image to our shapefile to geolocate it properly:
import gdal
import ogr, osr
# Thresholded input raster name
src = "islands_classified.tiff"
# Output shapefile name
tgt = "extract.shp"
# OGR layer name
tgtLayer = "extract"
# Open the input raster
srcDS = gdal.Open(src)
# Grab the first band
band = srcDS.GetRasterBand(1)
# Force gdal to use the band as a mask
mask = band
# Set up the output shapefile
driver = ogr.GetDriverByName("ESRI Shapefile")
shp = driver.CreateDataSource(tgt)
# Copy the spatial reference
srs = osr.SpatialReference()
srs.ImportFromWkt(srcDS.GetProjectionRef())
layer = shp.CreateLayer(tgtLayer, srs=srs)
# Set up the dbf file
fd = ogr.FieldDefn( "DN", ogr.OFTInteger )
layer.CreateField(fd)
dst_field = 0
# Automatically extract features from an image!
extract = gdal.Polygonize(band, mask, layer, dst_field, [], None)
import shapefile
import pngcanvas
# Open the extracted islands
r = shapefile.Reader("extract.shp")
# Setup the world to pixels conversion
xdist = r.bbox[2] - r.bbox[0]
ydist = r.bbox[3] - r.bbox[1]
iwidth = 800
iheight = 600
xratio = iwidth/xdist
yratio = iheight/ydist
polygons = []
# Loop through all shapes
for shape in r.shapes():
  # Loop through all parts to catch
  # polygon holes!
  for i in range(len(shape.parts)):
    pixels=[]
    pt = None
    if I < len(shape.parts)-1:
      pt = shape.points[shape.parts[i]:shape.parts[i+1]]
    else:
      pt = shape.points[shape.parts[i]:]
    for x,y in pt:
      px = int(iwidth - ((r.bbox[2] - x) * xratio))
      py = int((r.bbox[3] - y) * yratio)
      pixels.append([px,py])
    polygons.append(pixels)
# Set up the output canvas
c = pngcanvas.PNGCanvas(iwidth,iheight)
# Loop through the polygons and draw them
for p in polygons:
  c.polyline(p)
# Save the image
f = file("extract.png", "wb")
f.write(c.dump())
f.close()
The following screenshot shows our automatically extracted island features! Commercial packages that do this kind of work can easily cost tens of thousands of dollars. While these packages are very robust, it is still fun to see how far you can get with simple Python scripts and a few open source packages. In many cases you can do everything you need to do.
 The western-most island contains the polygon hole as shown in the following screenshot, which is zoomed to that area:
Automated feature extraction is a holy grail within geospatial analysis because of the cost and tedious effort required to manually extract features. The key to feature extraction is proper image classification. Automated feature extraction works well with water bodies (and islands), roads, farm fields, buildings, and other features that tend to have high-contrast pixel values with their background.




 
 




