Intro: This is part five of an eight-part series looking at the Elevator Explorer, a fun data interactive mostly coded between 10 PM and 2 AM during the week leading up to April Fools’ Day, 2013. I’m going to be looking at the things I learned, things I wish I could have done, and the reasoning behind my design choices. The code I’ll be referring to will be in this tagged release on GitHub.
I knew I wanted to geocode all the addresses for the buildings, but I didn’t quite know how my models would look. I knew from past experience that doing a pass of geocoding, then resetting the database, would mean I would have to start geocoding again from square one. How could I make this better?
If only I had a wrapper around geopy that would persist old queries to disk. So I started writing one. At first, I thought I would need to do this in SQLite, but after doing a search for “python+key+value+store”, I found anydbm. What is anydbm? Anydbm is a generic interface to any dbm database. What a name. In my case, it was using Berkeley DB. It’s really easy to use: 1) open a file 2) treat it like a dict. Way easier than trying to get a SQLite database going. But my database kept getting corrupted! I finally figured out that I needed to open and close the file for every transaction. Since the anydbm library is pretty dated and I couldn’t use it like a context manager, I had to manually close the file.
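Here is a minimal sketch of that open-and-close-per-transaction pattern. It is not the code from the project: `cached_geocode`, `db_path`, and `geocode_func` are hypothetical names made up for illustration, and it assumes Python 3, where the `anydbm` module was renamed to `dbm`. Results are stored as JSON, so a cached result comes back as nested lists.

```python
import dbm  # Python 3 name for the module that Python 2 called anydbm
import json
from contextlib import closing


def cached_geocode(db_path, query, geocode_func):
    """Return geocode_func(query), persisting results in a dbm file.

    db_path and geocode_func are illustrative parameters: the path of the
    cache file, and any callable that turns a query string into a
    JSON-serializable result such as [place, [lat, lng]].
    """
    key = query.encode("utf-8")

    # Open the file, look up the key, and close it again right away.
    with closing(dbm.open(db_path, "c")) as db:
        if key in db:
            return json.loads(db[key].decode("utf-8"))

    # Cache miss: do the real lookup while the file is closed.
    result = geocode_func(query)

    # Reopen only long enough to write the new entry.
    with closing(dbm.open(db_path, "c")) as db:
        db[key] = json.dumps(result).encode("utf-8")
    return result
```

`contextlib.closing` stands in for the manual `close()` call, so the file is closed even if the lookup raises. Passing the geocoder in as a plain callable keeps the sketch independent of any particular geopy version.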
My working version of the GoogleV3 geocoder looks like this. I also made a script for dumping my existing geo data back to an anydbm database; that’s viewable here.
So after all that, I ended up with a library that mimicked the GoogleV3 geocoder. To use it, instead of the standard syntax of:
>>> from geopy import geocoders
>>> g = geocoders.GoogleV3()
>>> place, (lat, lng) = g.geocode("10900 Euclid Ave in Cleveland")
>>> print "%s: %.5f, %.5f" % (place, lat, lng)
10900 Euclid Ave, Cleveland, OH 44106, USA: 41.50489, -81.61027
my database-cached version looks like this:
>>> from geopydb import geocoders
>>> g = geocoders.GoogleV3()
>>> place, (lat, lng) = g.geocode("10900 Euclid Ave in Cleveland")
>>> print "%s: %.5f, %.5f" % (place, lat, lng)
10900 Euclid Ave, Cleveland, OH 44106, USA: 41.50489, -81.61027
Pretty convenient, and it made my life easier. You may have noticed I’m not using GeoDjango. That’s because I wanted to deploy to the free tier at Heroku.
Improvements
If I had to write this now, I would switch to using dataset. Dataset came out around the same time as the Elevator Explorer. If it had come out a week earlier, I could have used it.