In addition to being the best spatial database engine available, PostGIS can be manipulated using pure Python without any C-library dependencies.
One of the things I like most about PostgreSQL is its fantastic Python support, with several options for database drivers. PostGIS is just a collection of PostgreSQL extensions and functions, so any database driver works. In this post, we're going to use the pure-Python, dependency-free pg8000 and pyshp modules to load a point shapefile into a PostGIS database. Both modules can be quickly installed using pip or easy_install.
To get started, follow the PostGIS instructions to get up and running with a spatial database. In this example we'll use a database named "test" that is spatially enabled by PostGIS. We have pg8000 installed as our database interface and pyshp to read a shapefile. The shapefile we'll use is a point shapefile containing information about some cities in the state of Mississippi. You can download the shapefile here.
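If your "test" database isn't spatially enabled yet, you can turn PostGIS on from Python as well. Here's a minimal sketch, assuming PostGIS is installed on the server and your database role is allowed to create extensions:

import pg8000

# Connect to the "test" database and enable the PostGIS extension.
connection = pg8000.connect(database="test", user="test", password="test")
cursor = connection.cursor()
cursor.execute("CREATE EXTENSION IF NOT EXISTS postgis")
connection.commit()
connection.close()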
The following code will set up a connection to our test geodatabase, create a table based on the fields of the shapefile, and then load the shapefile data into the table.
# Import the database driver and shapefile library
import pg8000
import shapefile

# Set up the database connection
connection = pg8000.connect(database="test", user="test", password="test")

# Get the database cursor to execute queries
cursor = connection.cursor()

# Set up a shapefile reader for our shapefile
r = shapefile.Reader("MSCities_Geo_Pts/MSCities_Geo_Pts.shp")

# Build a query to create our "cities" table
# with an id and geometry column.
table_query = """CREATE TABLE cities (
                 id SERIAL,
                 PRIMARY KEY (id),
                 geom GEOMETRY(Point, 4326),
                 """

# Get the shapefile fields but skip the first
# one which is a deletion flag used internally
# by dbf software
fields = r.fields[1:]

# We are going to keep track of the fields as a
# string. We'll use this query fragment for our
# insert query later.
field_string = ""

# Loop through the fields and build our table
# creation query.
for i in range(len(fields)):
    # get the field information
    f = fields[i]
    # get the field name and lowercase it for consistency
    f_name = f[0].lower()
    # add the name to the query and our string list of fields
    table_query += f_name
    field_string += f_name
    # Get the proper type for each field. Note this check
    # is not comprehensive but it covers the types in
    # our sample shapefile.
    if f[1] == "F":
        table_query += " DOUBLE PRECISION"
    elif f[1] == "N":
        table_query += " INTEGER"
    else:
        table_query += " VARCHAR"
    # If this field isn't the last, we'll add a comma
    # and some formatting.
    if i != len(fields) - 1:
        table_query += ","
        table_query += "\n"
        table_query += " "
        field_string += ","
    # Close the query on the last field.
    else:
        table_query += ")"
        field_string += ",geom"

# Execute the table query
cursor.execute(table_query)

# Commit the change to the database or it won't stick.
# PostgreSQL is transactional which is good so nothing
# is stored until you're sure it works.
connection.commit()

# Create a Python generator for the
# shapefile's shapes and records
shape_records = (shp_rec for shp_rec in r.iterShapeRecords())

# Loop through the shapefile data and add it to our new table.
for sr in shape_records:
    # Get our point data.
    shape = sr.shape
    x, y = shape.points[0]
    # Get the attribute data and set it up as
    # a query fragment. Use a name other than "r"
    # so we don't shadow the shapefile reader.
    attributes = ""
    for val in sr.record:
        # Escape single quotes in string values.
        if isinstance(val, str):
            val = val.replace("'", "''")
        attributes += "'{}'".format(val)
        attributes += ","
    # Build our insert query template for this shape record.
    # Notice we are going to use a PostGIS function
    # which can turn a WKT geometry into a PostGIS
    # geometry. The trailing comma in "attributes"
    # separates the attribute values from the geometry.
    point_query = """INSERT INTO cities ({})
                     VALUES ({} ST_GEOMFROMTEXT('POINT({} {})',4326))"""
    # Populate our query template with actual data.
    format_point_query = point_query.format(field_string, attributes, x, y)
    # Insert the point data
    cursor.execute(format_point_query)

# Everything went ok so let's update the database.
connection.commit()
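Before moving on, it's worth verifying the load from the same script. This is a quick sanity check that reuses the connection and cursor from above; GeometryType and ST_AsText are standard PostGIS functions:

# Count the rows we just inserted and peek at one geometry.
cursor.execute("SELECT COUNT(*) FROM cities")
print("rows loaded:", cursor.fetchone()[0])

cursor.execute("SELECT GeometryType(geom), ST_AsText(geom) FROM cities LIMIT 1")
print(cursor.fetchone())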
The shapefile data is now available in PostGIS for loading into a GIS as a layer or performing spatial functions in PostGIS. Here's a screenshot of it loaded into QGIS as a PostGIS layer:
I've used this technique to download datasets from the web, such as weather data, and update a table automatically as new data becomes available. One thing we did not do in this example is add a spatial index. PostGIS can manage a lot of data - more than you can efficiently manage on the file system - and spatial indexing is a big part of that efficiency. It also allows multiple users to access the data at the same time. Once you load a dataset, you typically run a query like the following to index the data:
index_query = """CREATE INDEX cities_geom_gix ON cities USING GIST (geom);"""
That's it! PostGIS has dozens of spatial functions that make analyzing data programmatically easy as well. With pg8000 and a PostgreSQL server, PostGIS becomes a giant pure Python GIS library with no compiled dependencies on your client platform.
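As a small taste of those functions, here's a minimal sketch of running a spatial query from Python. It assumes the "cities" table created above and a "name" attribute column; your column names will match whatever fields your shapefile contained:

import pg8000

connection = pg8000.connect(database="test", user="test", password="test")
cursor = connection.cursor()

# Read a few geometries back as WKT using the PostGIS ST_AsText function.
cursor.execute("SELECT name, ST_AsText(geom) FROM cities LIMIT 5")
for name, wkt in cursor.fetchall():
    print(name, wkt)

connection.close()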