Monday, August 17, 2015

CSV to Shapefile

Converting a comma-separated values (CSV) file to a shapefile is a question that comes up a lot on forums. This post is a quick example of transforming a CSV file into a shapefile using PyShp and Python's built-in csv module. CSV files usually contain fields separated by commas, but they might also use another delimiter such as a pipe "|" or a tab. The csv module defaults to commas but lets you set other delimiters. Here is the file we'll be using, which I slightly modified from a GIS StackExchange question flagged as a duplicate:

Name,Area,geometry
town,123,"POLYGON((73.953695297241 15.3555421329,73.951292037964 15.314816978733,73.986654281616 15.310015523128,73.982191085815 15.345775441645,73.953695297241 15.3555421329))"
forest,500,"POLYGON((73.938202857971 15.34362339739,73.944897651672 15.343375083164,73.943438529968 15.341554103151,73.942408561707 15.337084357624,73.941378593445 15.340395289421,73.937001228333 15.341471330955,73.938202857971 15.34362339739))"
lake,800,"POLYGON((73.995494842529 15.291305352455,73.999614715576 15.287165708411,74.000301361084 15.283357163686,73.997898101807 15.281701253096,73.99377822876 15.282363618901,73.99377822876 15.284516293317,73.992576599121 15.287496882943,73.992919921875 15.289815090018,73.995494842529 15.291305352455))"
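As a quick aside on delimiters, here is a minimal sketch of reading the same kind of data when it is pipe-delimited. The in-memory sample and its shortened WKT are made up for illustration; only the `delimiter` argument of the csv module is the point.

```python
import csv
import io

# Hypothetical pipe-delimited sample; the WKT is shortened for readability.
pipe_data = io.StringIO(
    'Name|Area|geometry\n'
    'town|123|"POLYGON((0 0,1 0,1 1,0 0))"\n'
)

# delimiter tells the csv module which character separates the fields
reader = csv.DictReader(pipe_data, delimiter="|")
rows = list(reader)

print(rows[0]["Name"], rows[0]["Area"])
print(rows[0]["geometry"])
```

The comma-separated file above needs no such argument, since commas are the default.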

This notional CSV has three fields: the name of the feature, the area, and the geometry. The geometry is a WKT string describing the polygon coordinates. The polygons contain varying numbers of coordinates; within each pair, x and y are separated by a space, and the pairs are separated by commas. None of this will confuse the csv module, because each geometry is enclosed in double quotes, so its commas are not treated as field delimiters. The code is very simple. We use the csv module to access the data as rows and fields, manually parse the coordinates out of each WKT string, and add each record and polygon to the shapefile in a loop before saving it:

import csv
import shapefile

# Create a polygon shapefile writer
# (PyShp 1.x API; PyShp 2.x takes the output path in Writer()
# and replaces save() with close())
w = shapefile.Writer(shapefile.POLYGON)

# Add our fields
w.field("NAME", "C", "40")
w.field("AREA", "C", "40")

# Open the csv file and set up a reader
with open("sample.csv") as p:
    reader = csv.DictReader(p)
    for row in reader:
        # Add records for each polygon for name and area
        w.record(row["Name"], row["Area"])
        # strip the leading "POLYGON((" (9 characters) and the trailing "))"
        wkt = row["geometry"][9:-2]
        # split the coordinate string into "x y" pairs
        coords = wkt.split(",")
        # set up a list to contain the coordinates
        part = []
        # convert the x,y values to floats
        for c in coords:
            x,y = c.split(" ")
            part.append([float(x),float(y)])
        # create a polygon record with the list of coordinates.
        w.poly(parts=[part])

# save the shapefile!
w.save("polys.shp")

You can download the code and CSV file on GitHub.
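The slice `[9:-2]` in the script works because every geometry in the sample starts with exactly `POLYGON((` and ends with `))`. If your WKT might contain a space after `POLYGON` or after the commas, a slightly more forgiving parser could look like the sketch below. The function name `parse_polygon_wkt` is hypothetical, and the sketch only handles single-ring polygons (no holes); it is not a full WKT parser.

```python
def parse_polygon_wkt(wkt):
    """Return the ring of a single-ring WKT POLYGON as [[x, y], ...]."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("not a POLYGON: %r" % text[:20])
    # Keep only what sits between the outermost parentheses
    inner = text[text.index("(") + 1:text.rindex(")")].strip()
    if inner.count("(") != 1:
        raise ValueError("polygons with holes are not handled in this sketch")
    ring = inner.strip("()")
    # split() with no argument tolerates extra spaces around each "x y" pair
    return [[float(v) for v in pair.split()] for pair in ring.split(",")]


coords = parse_polygon_wkt("POLYGON ((0 0, 1 0, 1 1, 0 0))")
print(coords)
```

In the loop above, the result of `parse_polygon_wkt(row["geometry"])` could stand in for the `part` list that is built by hand.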

4 comments:

  1. From a programming and GIS point of view: This is great, easy to use and very easy to understand. Thank you!
    From a chemistry point of view: What the heck is going on with all sulfur in this carbon ring??? :)

    Replies
    1. Pure ignorance! That's what's going on. 7 billion people in the world, 4.79 billion web pages, and I manage to get busted by the one geospatial-chemist out there :-)

  2. Is there a way to prevent Error: field larger than field limit (131072)? I can draw polygons composed of 6k entries (long/lat)... but not much more than that... Polygons with 7k entries fail...
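
The error quoted in this comment is raised by Python's csv module rather than by PyShp: by default a single field may hold at most 131072 characters, and a WKT string with several thousand vertices can exceed that. A minimal sketch of raising the limit with `csv.field_size_limit` follows; the oversized field is synthetic, and the back-off loop is one common way to cope with platforms where `sys.maxsize` is too large for the call.

```python
import csv
import io
import sys

# Raise the csv module's per-field limit (131072 characters by default).
# sys.maxsize can overflow a C long on some platforms, so back off until accepted.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit = limit // 10

# Synthetic oversized row: one quoted field of 200,000 characters
big_field = "x" * 200000
reader = csv.reader(io.StringIO('big,"%s"\n' % big_field))
row = next(reader)
print(len(row[1]))
```

Calling `csv.field_size_limit` once, before creating the `DictReader` in the script above, should be enough for the larger polygons to load.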
