Thursday, March 21, 2013

A little data for geocoding

What's a geocoder to do without data? Fortunately, there's tons of it and more and more produced every day. I have a project where I need to verify the addresses of non-profits. The IRS provides the Statement of Information (SOI) Tax Statistics for Exempt Organizations Business Master File Extract. The data is provided as both Excel files and as fixed width delimited text files. The fixed width files contain all the records and there is one per state.

Using the same technique I used for importing 2010 Census headers, I imported each line/record as a single field into a temporary table. Using the SQL substring function, I extracted the data into their respective fields. Information about the file structure and fields are available in The Instruction Booklet on the download page. 

Below is the script for importing the data.


When all is said and done, you will have a table containing California tax exempt organizations. The next step may seem a little backward, but I exported the data back to a tab delimited text file.



This may seem a step backward, but until there is a built in Postgres geocoder, handling text files is simpler and faster than write code that extracts data for geocoding using an external service.