 |
In addition to being the best spatial database engine available, PostGIS can be manipulated using pure Python without any C-library dependencies.
|
PostGIS is the spatial database engine for the open source
PostgreSQL database. For a decade we
ArcSDE or
Oracle Spatial at
NVision but over the last few years we have completely switched to PostGIS unless a client forces us to use a commercial solution.
One of the main things I like about PostgreSQL is fantastic Python support with several options for database drivers. PostGIS is just a collection of PostgresSQL extensions and functions, so any database driver works. In this post, we're going to use the pure Python, dependency-free
pg8000, and
pyshp modules to load a point shapefile into a posts database. Both modules can be quickly installed using
pip or
easy_install.
To get started, follow the PostGIS instructions to get up and running with a spatial database. In this example we'll use a database named "test" that is spatially enabled by PostGIS. We have pg8000 installed as our database interface and pushup to read a shapefile. The shapefile we'll use is a point shapefile containing information about some cities in the state of Mississippi. You can download the shapefile
here.
The following code will set up a connection to our test geodatabase, create a table based on the fields of the shapefile, and then load the shapefile data into the table.
# Import the database driver and shapefile library
import pg8000
import shapefile
# Set up the datbase connection
connection = pg8000.connect(database="test", user="test",
password="test")
# Get the database cursor to execute queries
cursor = connection.cursor()
# Set up a shapefile reader for our shapefile
r = shapefile.Reader("MSCities_Geo_Pts/MSCities_Geo_Pts.shp")
# Build a query to create our "cities" table
# with an id and geometry column.
table_query = """CREATE TABLE cities (
id SERIAL,
PRIMARY KEY (id),
geom GEOMETRY(Point, 4326),
"""
# Get the shapefile fields but skip the first
# one which is a deletion flag used internally
# by dbf software
fields = r.fields[1:]
# We are going to keep track of the fields as a
# string. We'll use this query fragment for our
# insert query later.
field_string = ""
# Loop throug the fields and build our table
# creation query.
for i in range(len(fields)):
# get the field information
f = fields[i]
# get the field name and lowercase it for consistency
f_name = f[0].lower()
# add the name to the query and our string list of fields
table_query += f_name
field_string += f_name
# Get the proper type for each field. Note this check
# is not comprehensive but it convers the types in
# our sample shapefile.
if f[1] == "F":
table_query += " DOUBLE PRECISION"
elif f[1] == "N":
table_query += " INTEGER"
else:
table_query += " VARCHAR"
# If this field isn' the last, we'll add a comma
# and some formatting.
if i != len(fields)- 1:
table_query += ","
table_query += "\n"
table_query += " "
field_string += ","
# Close the query on the field.
else:
table_query += ")"
field_string += ",geom"
# Execute the table query
cursor.execute(table_query)
# Commit the change to the database or it won't stick.
# PostgreSQL is transactional which is good so nothing
# is stored until you're sure it works.
connection.commit()
# Create a python generator for the
# shapefiles shapes and records
shape_records = (shp_rec for shp_rec in r.iterShapeRecords())
# Loop through the shapefile data and add it to our new table.
for sr in shape_records:
# Get our point data.
shape = sr.shape
x, y = shape.points[0]
# Get the attribute data and set it up as
# a query fragment.
attributes = ""
for r in sr.record:
if type(r) == type("string"):
r = r.replace("'", "''")
attributes += "'{}'".format(r)
attributes += ","
# Build our insert query template for this shape record.
# Notice we are going to use a PostGIS function
# which can turn a WKT geometry into a PostGIS
# geometry.
point_query = """INSERT INTO cities
({})
VALUES ({}
ST_GEOMFROMTEXT('POINT({} {})',4326))"""
# Populate our query template with actual data.
format_point_query = point_query.format(field_string,
attributes, x, y)
# Insert the point data
cursor.execute(format_point_query)
# Everything went ok so let's update the database.
connection.commit()
The shapefile data is now available in PostGIS for loading into a GIS as a layer or performing spatial functions in PostGIS. Here's a screenshot of it loaded into QGIS as a PostGIS layer:
I've used this technique to download datasets from the web such as weather data, and update a table on an automated basis as new data is available. One thing we did not do in this example is add a spatial index. PostGIS can manage a lot of data - more than you can efficiently manage on the file system. Spatial indexing is a big part of that efficiency. It also allows multiple users to access that data at the same time. Once you load a data set, you typically do a query like the following to index the data:
index_query = """CREATE INDEX cities_geom_gix
ON cities USING GIST (geom);"""
That's it! PostGIS has dozens of spatial functions that make analyzing data programmatically easy as well. With pg8000 and a PostgreSQL server, PostGIS becomes a giant pure Python GIS library with no compiled dependencies on your client platform.