Sunday, April 3, 2016

Pure Python NetCDF Reader

Great news! Pure Python geospatial programmer Karim Bahgat created a dependency-free Python NetCDF reader called pyncf.  If you're not familiar with NetCDF, it is technically a collection of software libraries and machine-independent data formats commonly used for multi-dimensional scientific data of any type.  The current version, NetCDF-4, uses a hierarchical data storage format called HDF5.  NetCDF has a concept of dimensions, variables, and attributes and can scale from one-to-one relationships up to many-to-many relationships in all directions.  In geospatial contexts, NetCDF is frequently used to store 2D and 3D raster or vector datasets over time.  It is very popular for ocean observations and climatological data.
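
To make the dimensions, variables, and attributes idea concrete, here is a toy sketch of that data model using plain Python dictionaries. It is not the pyncf API or the real file layout; every name and value in it (dataset, check_shapes, the temperature numbers) is made up for illustration.

```python
# A toy, in-memory picture of the NetCDF data model (NOT the pyncf API).
# All names and values here are hypothetical.
dataset = {
    # Dimensions: named axes with a length.
    "dimensions": {"time": 3},
    # Variables: data laid out along one or more dimensions, with attributes.
    "variables": {
        "time": {
            "dimensions": ("time",),
            "attributes": {"units": "seconds since 1970-01-01 00:00:00 UTC"},
            "data": [0, 600, 1200],
        },
        "temperature": {
            "dimensions": ("time",),
            "attributes": {"units": "degree_Celsius"},
            "data": [11.2, 11.4, 11.3],
        },
    },
    # Global attributes: metadata about the whole file.
    "attributes": {"title": "Toy buoy time series"},
}


def check_shapes(ds):
    """Return True if every variable holds as many values as its dimensions imply."""
    for var in ds["variables"].values():
        expected = 1
        for dim in var["dimensions"]:
            expected *= ds["dimensions"][dim]
        if len(var["data"]) != expected:
            return False
    return True


print(check_shapes(dataset))
```

The point of the sketch is only that variables borrow their shape from shared, named dimensions, while attributes hang off either a variable or the file as a whole.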

He's posted the code and some more usage information on Github too:  https://github.com/karimbahgat/pyncf

Karim goes into detail on the ideas behind the library on his blog.  Be careful, though: the examples on the blog aren't as current as the ones on Github: https://thepythongischallenge.wordpress.com/2016/03/26/pynetcdf-netcdf-files-in-pure-python/

As a quick test I installed pyncf using pip directly from Github:
pip install https://github.com/karimbahgat/pyncf/archive/master.zip

I then downloaded the sample NetCDF file of orthogonal time series point data from here:
https://www.nodc.noaa.gov/data/formats/netcdf/v1.1/

The actual link to the netcdf file is here:
/thredds/fileServer/testdata/netCDFTemplateExamples/timeSeries/BodegaMarineLabBuoy.nc

What's nice about these NOAA samples is that they also come with a plain-text CDL (Common Data Language) version that lets you see what's in the file when you're experimenting with an API like this:
http://data.nodc.noaa.gov/testdata/netCDFTemplateExamples/timeSeries/BodegaMarineLabBuoy.cdl

First, I imported the library and created a NetCDF object from the sample file:

import pyncf
nc = pyncf.NetCDF(filepath="BodegaMarineLabBuoy.nc")

Then I created a variable to hold the header dictionary:

header = nc.header

The header metadata contains a variety of summary data as nested dictionaries and lists.  We can access the product description by traversing that structure:

header["gatt_list"][1]["values"]

'These seawater data are collected by a moored fluorescence 
and turbidity instrument operated at Cordell Bank, California, USA, 
by CBNMS and BML. Beginning on 2008-04-23, fluorescence and turbidity 
measurements were collected using a Wetlabs ECO Fluorescence 
and Turbidity Sensor (ECO-FLNTUSB). The instrument depth of the 
water quality sensors was 01.0 meter, in an overall water depth 
of 85 meters (both relative to Mean Sea Level, MSL). 
The measurements reflect a 10 minute sampling interval.'
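
Indexing by position like that is brittle if the attribute order differs between files.  Here is a minimal sketch of looking up a global attribute by name instead.  The "gatt_list" and "values" keys come from the example above, but the "name" key, the sample attribute names, and the find_global_attribute helper are assumptions made for illustration, so check the actual header structure before relying on them.

```python
def find_global_attribute(header, name):
    """Return the values of the first global attribute with a matching name, else None."""
    for attribute in header.get("gatt_list", []):
        # ASSUMPTION: each entry carries its attribute name under a "name" key.
        if attribute.get("name") == name:
            return attribute.get("values")
    return None


# Hypothetical header fragment standing in for nc.header.
sample_header = {
    "gatt_list": [
        {"name": "title", "values": "Example buoy"},
        {"name": "summary", "values": "Example description"},
    ]
}

print(find_global_attribute(sample_header, "summary"))
```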

The time values in this dataset are stored as epoch seconds.  I accessed those and then converted the first and last into readable dates to see exactly what period this dataset spans:

import time
t = nc.read_dimension_values("time")
# time.ctime formats epoch seconds as a readable string in the local time zone
print(time.ctime(t[0]))
Mon Jul 28 12:30:00 2008
print(time.ctime(t[-1]))
Wed Sep 10 10:31:00 2008

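
One caveat: time.ctime renders epoch seconds in the local time zone of the machine running the code, so the same file can print different strings on different computers.  Below is a small Python 3 sketch that converts to UTC with the standard datetime module instead.  The sample_times list is a made-up stand-in for the values returned by nc.read_dimension_values("time"), not data from the buoy file.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical stand-in for the time dimension values (10-minute steps).
sample_times = [0, 600, 1200]

first = epoch_to_utc(sample_times[0])
last = epoch_to_utc(sample_times[-1])

print(first.isoformat())  # 1970-01-01T00:00:00+00:00
print(last.isoformat())   # 1970-01-01T00:20:00+00:00
print(last - first)       # 0:20:00
```

Subtracting the two datetimes gives the span of the dataset directly, which is handy when all you want to know is how long a time series runs.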
The pyncf codebase is considered an alpha version and is currently read-only, but what a great addition to your pure Python geospatial toolbox!  I wish this library had been available when I updated "Learning Geospatial Analysis with Python"!