Sunday, August 7, 2016

Pure Python: Loading Shapefiles into PostGIS

In addition to being the best spatial database
available, PostGIS can be manipulated
using pure Python 
without any C-library

PostGIS is the spatial database engine for the open source PostgreSQL database.  For a decade weArcSDE or Oracle Spatial at NVision but over the last few years we have completely switched to PostGIS unless a client forces us to use a commercial solution.

One of the main things I like about PostgreSQL is fantastic Python support with several options for database drivers.  PostGIS is just a collection of PostgresSQL extensions and functions, so any database driver works. In this post, we're going to use the pure Python, dependency-free pg8000, and pyshp modules to load a point shapefile into a posts database.  Both modules can be quickly installed using pip or easy_install.

To get started, follow the PostGIS instructions to get up and running with a spatial database.  In this example we'll use a database named "test" that is spatially enabled by PostGIS.  We have pg8000 installed as our database interface and pushup to read a shapefile.  The shapefile we'll use is a point shapefile containing information about some cities in the state of Mississippi.  You can download the shapefile here.

The following code will set up a connection to our test geodatabase, create a table based on the fields of the shapefile, and then load the shapefile data into the table.
# Import the database driver and shapefile library
import pg8000
import shapefile

# Set up the datbase connection
connection = pg8000.connect(database="test", user="test", 

# Get the database cursor to execute queries
cursor = connection.cursor()

# Set up a shapefile reader for our shapefile
r = shapefile.Reader("MSCities_Geo_Pts/MSCities_Geo_Pts.shp")

# Build a query to create our "cities" table
# with an id and geometry column.
table_query = """CREATE TABLE cities (
                 id SERIAL,
                 PRIMARY KEY (id),
                 geom GEOMETRY(Point, 4326),

# Get the shapefile fields but skip the first
# one which is a deletion flag used internally
# by dbf software
fields = r.fields[1:]

# We are going to keep track of the fields as a
# string. We'll use this query fragment for our
# insert query later.
field_string = ""

# Loop throug the fields and build our table
# creation query.
for i in range(len(fields)):
    # get the field information
    f = fields[i]
    # get the field name and lowercase it for consistency
    f_name = f[0].lower()
    # add the name to the query and our string list of fields
    table_query += f_name
    field_string += f_name
    # Get the proper type for each field. Note this check
    # is not comprehensive but it convers the types in
    # our sample shapefile.
    if f[1] == "F":
        table_query += " DOUBLE PRECISION"
    elif f[1] == "N":
        table_query += " INTEGER"
        table_query += " VARCHAR"
    # If this field isn' the last, we'll add a comma
    # and some formatting.
    if i != len(fields)- 1:
        table_query += ","
        table_query += "\n"
        table_query += "                 "
        field_string += ","
    # Close the query on the field.
        table_query += ")"
        field_string += ",geom"

# Execute the table query

# Commit the change to the database or it won't stick. 
# PostgreSQL is transactional which is good so nothing 
# is stored until you're sure it works.

# Create a python generator for the 
# shapefiles shapes and records
shape_records = (shp_rec for shp_rec in r.iterShapeRecords())

# Loop through the shapefile data and add it to our new table.
for sr in shape_records:
    # Get our point data.
    shape = sr.shape
    x, y = shape.points[0]
    # Get the attribute data and set it up as
    # a query fragment.
    attributes = ""
    for r in sr.record:
        if type(r) == type("string"):
            r = r.replace("'", "''")
        attributes += "'{}'".format(r)
        attributes += ","        
    # Build our insert query template for this shape record.
    # Notice we are going to use a PostGIS function
    # which can turn a WKT geometry into a PostGIS
    # geometry.
    point_query = """INSERT INTO cities 
                     VALUES ({}
                     ST_GEOMFROMTEXT('POINT({} {})',4326))"""    
    # Populate our query template with actual data.
    format_point_query = point_query.format(field_string, 
                                            attributes, x, y)
    # Insert the point data

# Everything went ok so let's update the database.

The shapefile data is now available in PostGIS for loading into a GIS as a layer or performing spatial functions in PostGIS.  Here's a screenshot of it loaded into QGIS as a PostGIS layer:
I've used this technique to download datasets from the web such as weather data, and update a table on an automated basis as new data is available.  One thing we did not do in this example is add a spatial index.  PostGIS can manage a lot of data - more than you can efficiently manage on the file system.  Spatial indexing is a big part of that efficiency.  It also allows multiple users to access that data at the same time.  Once you load a data set, you typically do a query like the following to index the data:
index_query = """CREATE INDEX cities_geom_gix 
           ON cities USING GIST (geom);"""

That's it!  PostGIS has dozens of spatial functions that make analyzing data programmatically easy as well.  With pg8000 and a PostgreSQL server, PostGIS becomes a giant pure Python GIS library with no compiled dependencies on your client platform.


  1. I must say you had done a tremendous job,I appreciate all your efforts.
    Thanks alot for your writings.
    Waiting for a new 1...Please visit our wonderful and valuable website-
    For More Details Visit are Official Websites :

  2. Cash advance loan should be used very prudently and only for meeting financial exigencies that cannot be avoided. Please remember that getting a payday loan or cash advance loan too frequently could upset your monthly budget and you may find it difficult to get out. payday loans san diego

  3. Thank you for updating! I'm glad to find this post!

  4. check this out, i found the interesting information, so dont miss it

  5. Thanks shared.
    I need to think that one through, thanks for posting.

  6. It is so nice to see you writing on your blog again!

  7. It is so nice to see you writing on your blog again!

  8. This is such an important point. Thank you!