GeospatialPython.com: shapetype

Showing posts with label shapetype. Show all posts

Tuesday, May 29, 2012

SBN Mystery - Solved!

Last October I asked you all for help in figuring out the spatial indexing algorithm used to create Esri sbn files. Using Pyshp, I had successfully decoded the file formats which I provided. However I could not figure out the algorithm used to create and populate the spatial bins within these files.

Today I'm pleased to announce this challenge has been answered. The GIS community now has access to both the sbn and sbx file format as well as the algorithm for grouping features in a shapefile into "spatial bins".

I'm glad I asked for help as this challenge turned out to be quite difficult. The brain behind this operation is Marc Pfister with some good insights from Si Parker. Marc worked tirelessly on this problem for months with a cross-country move and complete career change thrown in to make it interesting. Marc did all the heavy intellectual lifting with me playing an inquisitive, but usually short-sighted Watson to his Holmes. I generated endless series of shapefiles and one-off scripts to help Marc try and flush out the algorithm based only on subtle changes in the number of bins and features they contained as well as his past experience with spatial indexing.

When I figured out the file formats I had hoped I was just a Wikipedia search away from recognizing the spatial tree algorithm. But the solution turned out to be much more complex than that. Esri uses a sort of balanced tree that exhibits traits of several different algorithms. The system seems carefully designed but is by no means obvious. I will publish Marc's findings as soon as I can.

There are still a few shapefile cases which create puzzling but insignificant results. However we are at the 98% mark. The project goal of compatibility has been reached. There is no longer any reason to hold off on sharing the results. We are fairly certain that we are able to create sbn and sbx files which sufficiently fool ArcMap as well as other Esri packages so other software can read, use, and generate these indexes alongside the Esri suite. There is more testing to do but it seems we are out of the woods.

What we haven't done is nicely packaged all of this work up. But Marc posted a small set of Python scripts on github which demonstrate the algorithm and file handling needed to copy this capability. Over the coming months I will fold this code into Pyshp, produce better documentation on the algorithm, and provide posts on how to deal with these indexes. But for now here's what you've been waiting for:

https://github.com/drwelby/hasbeen

By the way, Marc does freelance programming. In my job, I get the opportunity to work with lots of really bright geospatial programmers and mathematicians and this guy is one of the very best I've ever seen. If you have a tough geospatial project and need some E=MC2 smarts definitely look him up.

Tuesday, August 23, 2011

Point in Polygon 2: Walking the line

Credit: SpikedMath.com

This post is a follow-up to my original article on testing if a point is inside a polygon. Reader Sebastian V. pointed out the ray-casting alogrithm I used does not test to see if the point is on the edge of the polygon or one of the verticies. He was even nice enough to send a PHP script he found which uses an indentical ray-casting method and includes a vertex and edge test as well.

These two checks are relatively simple however whether they are necessary is up to you and how you apply this test. There are some cases where a boundary point would not be considered for inclusion. Either way now you have an option. This function could even be modified to optionally check for boundary points.

# Improved point in polygon test which includes edge
# and vertex points

def point_in_poly(x,y,poly):

   # check if point is a vertex
   if (x,y) in poly: return "IN"

   # check if point is on a boundary
   for i in range(len(poly)):
      p1 = None
      p2 = None
      if i==0:
         p1 = poly[0]
         p2 = poly[1]
      else:
         p1 = poly[i-1]
         p2 = poly[i]
      if p1[1] == p2[1] and p1[1] == y and x > min(p1[0], p2[0]) and x < max(p1[0], p2[0]):
         return "IN"
      
   n = len(poly)
   inside = False

   p1x,p1y = poly[0]
   for i in range(n+1):
      p2x,p2y = poly[i % n]
      if y > min(p1y,p2y):
         if y <= max(p1y,p2y):
            if x <= max(p1x,p2x):
               if p1y != p2y:
                  xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
               if p1x == p2x or x <= xints:
                  inside = not inside
      p1x,p1y = p2x,p2y

   if inside: return "IN"
   else: return "OUT"

# Test a vertex for inclusion
poligono = [(-33.416032,-70.593016), (-33.415370,-70.589604),
(-33.417340,-70.589046), (-33.417949,-70.592351),
(-33.416032,-70.593016)]
lat= -33.416032
lon= -70.593016

print point_in_poly(lat, lon, poligono)

# test a boundary point for inclusion
poly2 = [(1,1), (5,1), (5,5), (1,5), (1,1)]
x = 3
y = 1
print point_in_poly(x, y, poly2)

You can download this script here.

Monday, March 21, 2011

Working with Elevation Values in Shapefiles

I've had several questions about working with elevation values in shapefiles. Awhile back I posted a brief example in the wiki on the google code project site and felt it should be included here so this information is more visible.

Elevation values, known as "Z" values allow you to create 3-dimensional shapefiles. Working with elevation values in the Python Shapefile Library is easy but requires a step which hasn't been well documented until now. Make sure you are working with at least version 1.0 (revision 24 in the repository) as previous versions have a bug that prevents elevation values from being written properly as well as another bug which occurs when creating certain polygons.

The shapefile specification allows for three types of 3-dimensional shapefiles:

PointZ - a 3D point shapefile, PolylineZ - a 3D line shapefile, and PolygonZ - a 3D polygon shapefile.

The most important thing to remember is to explicitly set the shape type of each record to the correct shape type or it will default to a 2-dimensional shape. Eventually the library will automatically detect the elevation type.

Here is a brief example of creating a PolygonZ shapefile:

import shapefile
w = shapefile.Writer(shapeType=shapefile.POLYGONZ)
# OR you can type
# w = shapefile.Writer(shapeType=15)
w.poly([[[-89.0, 33, 12], [-90, 31, 11], [-91, 30, 12]]], shapeType=15)
w.field("NAME")
w.record("PolyZTest")
w.save("MyPolyZ")

When you read 3D shapefiles the elevation values will be stored separately:

>>>import shapefile
>>>r = shapefile.Reader("MyPolyZ")
>>>r.shapes()[0].points
[[-89.0, 33.0], [-90.0, 31.0], [-91.0, 30.0]]
>>>r.shapes()[0].z
[12, 11, 12]

So remember: explicitly set the shapetype for each shape you add to ensure the z values are maintained. And know that the z values are stored in the "z" property of each shape instead of being integrated with the x,y value lists.

Monday, February 28, 2011

Changing a Shapefile's Type

A polygon, line, and point version of the same shapefile.

Sometimes you want to convert a shapefile from one type to another. For example you may want to convert a line shapefile to a polygon or a polygon to a point or multipoint shapefile. There are many reasons for this type of operations ranging from error checking, to special queries, to inconvenient distribution formats. For example a lot of coastline data is distributed as line data but you may want to convert it to a polygon to estimate coastal erosion using area comparisons between two different dates.

Performing this type of conversion is very straightforward using the Python Shapefile Library. In fact the conversion is basically a one-off version of the shapefile merge example I wrote about recently. You read in one shapefile and write the features and records out to another of the correct type. There are a couple of pitfalls you need to be wary of though. One is the current version (1.0) of the PSL requires you to explicitly set the shape type of each record if you want to convert them. The second issue is if you are converting to a single point shapefile where each point feature is a record you must compensate for the imbalance in the dbf records by copying the record from the parent feature for each point. Instead of dealing with this issue you could simply create a multi-point shapefile where each shape record is allowed to be a collection of points. Which method you choose depends on what you are trying to do with the output. The examples below cover both methods.

The example in this post takes a state boundary polygon file and converts it to a line shapefile, then a multipoint shapefile, then a regular point shapefile. Note the difference between the point shapefile and the line and multipoint examples.

"""
Convert one shapefile type to another 
"""

import shapefile


# Create a line and a multi-point 
# and single point version of
# a polygon shapefile

# The shapefile type we are converting to
newType = shapefile.POLYLINE

# This is the shapefile we are trying
# to convert. In this case it's a
# state boundary polygon file for 
# Mississippi with one polygon and
# one dbf record.
r = shapefile.Reader("Mississippi")

## POLYLINE version
w = shapefile.Writer(newType)
w._shapes.extend(r.shapes())
# You must explicity set the shapeType of each record.
# Eventually the library will set them to the same
# as the file shape type automatically.
for s in w.shapes():
  s.shapeType = newType
w.fields = list(r.fields)
w.records.extend(r.records())
w.save("Miss_Line")

## MULTIPOINT version
newType = shapefile.MULTIPOINT

w = shapefile.Writer(newType)
w._shapes.extend(r.shapes())
for s in w.shapes():
  s.shapeType = newType
w.fields = list(r.fields)
w.records.extend(r.records())
w.save("Miss_MPoint")

## POINT version
newType = shapefile.POINT

w = shapefile.Writer(newType)
# For a single point shapefile
# from another type we
# "flatten" each shape
# so each point is a new record.
# This means we must also assign
# each point a record which means
# records are usually duplicated.
for s in r.shapeRecords():
  for p in s.shape.points:
    w.point(*p)
    w.records.append(s.record)  
w.fields = list(r.fields)
w.save("Miss_Point")

You can download the state boundary polygon shapefile used in the example from the GeospatialPython Google Code Project Downloads section. You can download the sample script above from the subversion repository of that same project.

And of course the Python Shapefile Library is here.

Pages