2 comments:

  1. Hello :)
    Thank you so much for your wonderful shapefile creator. I was trying to use Arc prior to this, which worked fine until I had a memory issue... that being said, I eventually ran into one with yours as well :(. Are you able to specify how many records will be appended via self._shapes.append(polyShape)?
    Is it just the Python array size that is holding it back? I found that the maximum size of a Python list on a 32-bit system is 536,870,912 elements, and I have 2.3 million records, so I think I surpassed that... When do you think would be a good cutoff to split a CSV for processing?

  2. Thanks for the kind words! I've actually been working on a version to do just what you described. It's conceptually straightforward but a pain to implement. The reason you keep hitting a wall with different software is the shp file header. Shapefiles contain variable-length records, and the total record count and byte length are listed in the header, so what library developers do is just total everything up after you add all your shapes. Most of the time that's fine, until you bump up against data-type limits. The solution for REALLY large shapefiles is to keep a running total of record counts and lengths, flush data to disk from a manageable buffer, and just update the file header with the running totals (a rough sketch of the idea appears after the comments below). This method should work efficiently as long as you have disk space. I wrote an implementation, but it doesn't work yet and I haven't finished debugging it; when it's finished it will handle your case fine. Have you tried the OGR library yet, either as a Python library or just the command-line conversion tools? I suspect you'll hit some limit there too, though. Most software doesn't do well reading and/or rendering 2-million-point shapefiles either. I was working with some LIDAR data today in QGIS that had over 600k points, and it wasn't fun. Have you considered a spatial database like PostGIS or ArcSDE?

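To illustrate the running-totals approach described in the reply above, here is a minimal sketch, not the unfinished implementation mentioned in the comment. It streams point records straight to a .shp file, keeps only the running record count, file length, and bounding box in memory, and patches the 100-byte header on close. The `StreamingPointWriter` class and its method names are invented for illustration, it handles Point shapes only, and the companion .shx and .dbf files are omitted.

```python
import struct

SHAPE_POINT = 1  # shapefile shape type code for Point


class StreamingPointWriter:
    """Sketch: stream point records to a .shp file, keeping running totals,
    then rewrite the 100-byte header on close (no .shx/.dbf, points only)."""

    def __init__(self, path, shape_type=SHAPE_POINT):
        self.f = open(path, "wb")
        self.shape_type = shape_type
        self.record_count = 0
        self.file_length_words = 50           # header is 100 bytes = 50 16-bit words
        self.bbox = [None, None, None, None]  # xmin, ymin, xmax, ymax
        self.f.write(b"\x00" * 100)           # placeholder header, fixed up in close()

    def add_point(self, x, y):
        self.record_count += 1
        content_length_words = 10             # shape type (4) + x (8) + y (8) bytes = 20 bytes
        # Record header: record number and content length, big-endian
        self.f.write(struct.pack(">2i", self.record_count, content_length_words))
        # Record contents: shape type, x, y, little-endian
        self.f.write(struct.pack("<i2d", self.shape_type, x, y))
        self.file_length_words += 4 + content_length_words
        # Update the running bounding box
        xmin, ymin, xmax, ymax = self.bbox
        self.bbox = [x if xmin is None else min(xmin, x),
                     y if ymin is None else min(ymin, y),
                     x if xmax is None else max(xmax, x),
                     y if ymax is None else max(ymax, y)]

    def close(self):
        # Seek back and write the real header using the running totals:
        # file code 9994, five unused ints, file length in 16-bit words (big-endian),
        # then version 1000, shape type, and the bounding box (little-endian).
        self.f.seek(0)
        self.f.write(struct.pack(">7i", 9994, 0, 0, 0, 0, 0, self.file_length_words))
        self.f.write(struct.pack("<2i", 1000, self.shape_type))
        xmin, ymin, xmax, ymax = self.bbox
        self.f.write(struct.pack("<8d", xmin or 0, ymin or 0, xmax or 0, ymax or 0,
                                 0, 0, 0, 0))
        self.f.close()


# Usage: memory stays flat even for millions of points, since only the
# running totals are held in memory rather than a list of shape objects.
w = StreamingPointWriter("points.shp")
for i in range(2300000):
    w.add_point(float(i % 360) - 180.0, float(i % 180) - 90.0)
w.close()
```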