Imagery
Skybox Imaging and Planet Labs have launched imaging satellites; expect a bunch of cool new image products and imagery-derived data in 2014. Also note that Frank Warmerdam is now at Planet Labs.
But wait, there's more! There's another readily available source of imagery data: the photos people are posting to Instagram, Flickr, and Facebook. Expect tools that exploit this source of imagery.
Hardware hacking
Arduino and Raspberry Pi are moving out of their respective blinky-lights infancy. Geohipsters will be connecting them to sensors and talking to them via Node.js. Expect to see other hardware platforms such as Tessel making inroads into the hardware-hacking movement.
Car hacking is still in its infancy with Bluetooth OBD-II modules. But as more cars roll out as mobile platforms replete with an API, car modding will be more than just chip modding for performance.
Tuesday, December 31, 2013
Thursday, March 21, 2013
A little data for geocoding
What's a geocoder to do without data? Fortunately, there's tons of it, with more produced every day. I have a project where I need to verify the addresses of non-profits. The IRS provides the Exempt Organizations Business Master File Extract through its Statistics of Income (SOI) Tax Stats program. The data is provided both as Excel files and as fixed-width text files. The fixed-width files contain all the records, one file per state.
Using the same technique I used for importing 2010 Census headers, I imported each line/record as a single text field into a staging table, then used the SQL substring function to extract the data into its respective fields. Information about the file structure and fields is available in the Instruction Booklet on the download page.
Below is the script for importing the data.
-- read fixed field data into postgres as text
CREATE TABLE eo_staging (data text);
COPY eo_staging FROM '/path/to/EO_CA.LST';

-- Create table for the data
CREATE TABLE eo_ca (
  ein varchar(10),
  name varchar(70),
  in_care_of_name varchar(35),
  address varchar(35),
  city varchar(21),
  state varchar(2),
  zipcode varchar(10),
  group_exemption varchar(4),
  subsection_code varchar(2),
  affiliation_code varchar(1),
  classification_code varchar(4),
  ruling_date varchar(6),
  deductibility_code varchar(1),
  foundation_code varchar(2),
  activity_codes varchar(9),
  organization_code varchar(1),
  exempt_status_code varchar(2),
  ruling_process varchar(6),
  asset_code varchar(1),
  income_code varchar(1),
  filing_requirement_code varchar(2),
  pf_filing_requirement_code varchar(1),
  blanks varchar(3),
  accounting_period varchar(2),
  asset_amount varchar(13),
  income_amount varchar(13),
  negative_income varchar(1),
  form_990_revenue varchar(13),
  negative_revenue varchar(1),
  ntee_code varchar(4),
  sort_name varchar(35)
);

-- Split the data into their respective columns and
-- insert into table
INSERT INTO eo_ca (
  ein,
  name,
  in_care_of_name,
  address,
  city,
  state,
  zipcode,
  group_exemption,
  subsection_code,
  affiliation_code,
  classification_code,
  ruling_date,
  deductibility_code,
  foundation_code,
  activity_codes,
  organization_code,
  exempt_status_code,
  ruling_process,
  asset_code,
  income_code,
  filing_requirement_code,
  pf_filing_requirement_code,
  blanks,
  accounting_period,
  asset_amount,
  income_amount,
  negative_income,
  form_990_revenue,
  negative_revenue,
  ntee_code,
  sort_name)
SELECT
  trim(substring(data,1,9)) AS ein,
  trim(substring(data,10,70)) AS name,
  trim(substring(data,80,35)) AS in_care_of_name,
  trim(substring(data,115,35)) AS address,
  trim(substring(data,150,21)) AS city,
  trim(substring(data,172,2)) AS state,
  trim(substring(data,174,10)) AS zipcode,
  trim(substring(data,184,4)) AS group_exemption,
  trim(substring(data,188,2)) AS subsection_code,
  trim(substring(data,190,1)) AS affiliation_code,
  trim(substring(data,191,4)) AS classification_code,
  trim(substring(data,195,6)) AS ruling_date,
  trim(substring(data,201,1)) AS deductibility_code,
  trim(substring(data,202,2)) AS foundation_code,
  trim(substring(data,204,9)) AS activity_codes,
  trim(substring(data,213,1)) AS organization_code,
  trim(substring(data,214,2)) AS exempt_status_code,
  trim(substring(data,216,6)) AS ruling_process,
  trim(substring(data,228,1)) AS asset_code,
  trim(substring(data,229,1)) AS income_code,
  trim(substring(data,230,2)) AS filing_requirement_code,
  trim(substring(data,232,1)) AS pf_filing_requirement_code,
  trim(substring(data,233,3)) AS blanks,
  trim(substring(data,236,2)) AS accounting_period,
  trim(substring(data,238,13)) AS asset_amount,
  trim(substring(data,251,13)) AS income_amount,
  trim(substring(data,264,1)) AS negative_income,
  trim(substring(data,265,13)) AS form_990_revenue,
  trim(substring(data,278,1)) AS negative_revenue,
  trim(substring(data,279,4)) AS ntee_code,
  trim(substring(data,283,35)) AS sort_name
FROM
  eo_staging;
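As a quick sanity check, a couple of queries will confirm the row count and show what the address fields look like once assembled into a single line. This is just a sketch against the table created above, not part of the original script:

SELECT count(*) FROM eo_ca;

-- Assemble a single-line address per organization, the form most
-- geocoders expect as input
SELECT ein,
       name,
       address || ', ' || city || ', ' || state || ' ' || zipcode AS full_address
FROM eo_ca
LIMIT 10;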
When all is said and done, you will have a table containing California tax-exempt organizations. For the next step, I exported the data back to a tab-delimited text file.
-- note: E'\t' is the escaped tab character; in pgAdmin you may need
-- to type a literal tab instead
COPY eo_ca TO '/path/to/eo_ca.csv' CSV DELIMITER E'\t';
This may seem a step backward, but until there is a built-in Postgres geocoder, handling text files is simpler and faster than writing code that extracts data for geocoding using an external service.
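For example, a one-liner can pull the EIN plus a single-line address out of the export, ready to hand to whatever geocoding service you use. This is a sketch assuming the export above and a hypothetical output path, and that no fields contain embedded tabs or quotes:

# Extract EIN (field 1) and assemble address, city, state, zip
# (fields 4-7) into a single line per organization
awk -F'\t' '{ print $1 "\t" $4 ", " $5 ", " $6 " " $7 }' \
    /path/to/eo_ca.csv > /path/to/addresses.tsv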
Saturday, February 2, 2013
Data Science Tool Kit on Vagrant
Pete Warden has released a version of the Data Science Toolkit (DSTK) on Vagrant. DSTK is a website for munging all sorts of data and includes a geocoder based on TIGER 2010. The website can be unreliable, requiring an occasional restart, so running a local VM is a nice option. The Vagrant version upgrades the geocoder to TIGER 2012 and is a drop-in replacement for Google geocoder requests. To run the DSTK locally:
Install Vagrant from http://www.vagrantup.com/. Create a directory to hold the Vagrantfile, then run the following:
vagrant box add dstk http://static.datasciencetoolkit.org/dstk_0.41.box
vagrant init dstk
vagrant up
Go to http://localhost:8080 to start using the DSTK.
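As a quick test once the VM is up, a single request should return JSON coordinates. This assumes the local instance exposes DSTK's Google-compatible geocoder endpoint at /maps/api/geocode/json; the address is just an example:

# Geocode one address against the local DSTK VM
curl -G "http://localhost:8080/maps/api/geocode/json" \
     --data-urlencode "sensor=false" \
     --data-urlencode "address=1600 Pennsylvania Ave NW, Washington DC"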