Using the same technique I used for importing the 2010 Census headers, I imported each line/record as a single text field into a temporary staging table. Then, using the SQL substring function, I extracted the data into their respective fields. Information about the file structure and fields is available in the Instruction Booklet on the download page.
Below is the script for importing the data.
-- read fixed field data into postgres as text
CREATE TABLE eo_staging (data text);
COPY eo_staging FROM '/path/to/EO_CA.LST';

-- Create table for the data
CREATE TABLE eo_ca (
    ein varchar(10),
    name varchar(70),
    in_care_of_name varchar(35),
    address varchar(35),
    city varchar(21),
    state varchar(2),
    zipcode varchar(10),
    group_exemption varchar(4),
    subsection_code varchar(2),
    affiliation_code varchar(1),
    classification_code varchar(4),
    ruling_date varchar(6),
    deductibility_code varchar(1),
    foundation_code varchar(2),
    activity_codes varchar(9),
    organization_code varchar(1),
    exempt_status_code varchar(2),
    ruling_process varchar(6),
    asset_code varchar(1),
    income_code varchar(1),
    filing_requirement_code varchar(2),
    pf_filing_requirement_code varchar(1),
    blanks varchar(3),
    accounting_period varchar(2),
    asset_amount varchar(13),
    income_amount varchar(13),
    negative_income varchar(1),
    form_990_revenue varchar(13),
    negative_revenue varchar(1),
    ntee_code varchar(4),
    sort_name varchar(35)
);

-- Split the data into their respective columns and
-- insert into table
INSERT INTO eo_ca (
    ein,
    name,
    in_care_of_name,
    address,
    city,
    state,
    zipcode,
    group_exemption,
    subsection_code,
    affiliation_code,
    classification_code,
    ruling_date,
    deductibility_code,
    foundation_code,
    activity_codes,
    organization_code,
    exempt_status_code,
    ruling_process,
    asset_code,
    income_code,
    filing_requirement_code,
    pf_filing_requirement_code,
    blanks,
    accounting_period,
    asset_amount,
    income_amount,
    negative_income,
    form_990_revenue,
    negative_revenue,
    ntee_code,
    sort_name)
SELECT
    trim(substring(data,1,9)) AS ein,
    trim(substring(data,10,70)) AS name,
    trim(substring(data,80,35)) AS in_care_of_name,
    trim(substring(data,115,35)) AS address,
    trim(substring(data,150,21)) AS city,
    trim(substring(data,172,2)) AS state,
    trim(substring(data,174,10)) AS zipcode,
    trim(substring(data,184,4)) AS group_exemption,
    trim(substring(data,188,2)) AS subsection_code,
    trim(substring(data,190,1)) AS affiliation_code,
    trim(substring(data,191,4)) AS classification_code,
    trim(substring(data,195,6)) AS ruling_date,
    trim(substring(data,201,1)) AS deductibility_code,
    trim(substring(data,202,2)) AS foundation_code,
    trim(substring(data,204,9)) AS activity_codes,
    trim(substring(data,213,1)) AS organization_code,
    trim(substring(data,214,2)) AS exempt_status_code,
    trim(substring(data,216,6)) AS ruling_process,
    trim(substring(data,228,1)) AS asset_code,
    trim(substring(data,229,1)) AS income_code,
    trim(substring(data,230,2)) AS filing_requirement_code,
    trim(substring(data,232,1)) AS pf_filing_requirement_code,
    trim(substring(data,233,3)) AS blanks,
    trim(substring(data,236,2)) AS accounting_period,
    trim(substring(data,238,13)) AS asset_amount,
    trim(substring(data,251,13)) AS income_amount,
    trim(substring(data,264,1)) AS negative_income,
    trim(substring(data,265,13)) AS form_990_revenue,
    trim(substring(data,278,1)) AS negative_revenue,
    trim(substring(data,279,4)) AS ntee_code,
    trim(substring(data,283,35)) AS sort_name
FROM
    eo_staging;
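Before moving on, it is worth a quick sanity check that the fixed-width offsets lined up. The queries below are a minimal sketch of how I would verify the load; comparing row counts and checking EIN length are my own checks, not part of the original workflow, and assume EINs in the file are always nine digits.

-- sanity check: staging and final tables should hold the same number of rows
SELECT
    (SELECT count(*) FROM eo_staging) AS staging_rows,
    (SELECT count(*) FROM eo_ca) AS loaded_rows;

-- spot-check: flag any records whose EIN is not nine characters,
-- which would suggest the substring offsets drifted
SELECT ein, name, city, state, zipcode
FROM eo_ca
WHERE length(ein) <> 9
LIMIT 10;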
When all is said and done, you will have a table containing California tax-exempt organizations. The next step is to export the data back out to a tab-delimited text file.
-- note: in pgAdmin you must use a literal tab character instead of \t
COPY eo_ca TO '/path/to/eo_ca.csv' CSV DELIMITER E'\t';
This may seem like a step backward, but until there is a built-in Postgres geocoder, handling text files is simpler and faster than writing code that extracts data for geocoding with an external service.
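If all the external geocoder needs is the address, a slimmer export keeps the file small. The statement below is a sketch along those lines; the output file name and the choice of columns are my own assumptions, not part of the original workflow.

-- export only the fields an external geocoder is likely to need
-- (eo_ca_addresses.csv is a hypothetical file name)
COPY (
    SELECT ein, address, city, state, zipcode
    FROM eo_ca
) TO '/path/to/eo_ca_addresses.csv' CSV HEADER DELIMITER E'\t';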