Thursday, September 11, 2014

Useful tools: Oracle SQL Developer Data Modeler

Oracle SQL Developer Data Modeler is a useful tool for database design that supports building logical and physical models.

To run in OSX Mountain Lion, it needs Java1.7 for OSX.

Saturday, July 5, 2014

Going full "Get of my lawn, damn kids!"

I still love twitter because it brings me moments like these:

This is me waving my cane around in the air from the rocking chair. But, there's a reason for this to exist!

My turn. After I adjust my Depends.

Slamming down my Ensure, I whip out this witty rejoinder:

Monday, May 5, 2014

Big for Ignite style talks

Twenty slides in 5 minutes, except not in PowerPoint or Keynote, just HTML and javascript using Tom Macwright's big. From my Open Ignite talk.

Tuesday, December 31, 2013

What's HOT for the GeoHipster in 2014


Skybox Imaging and Planet Labs have launched imaging satellites, expect bunch of cool new image products and imagery derived data in 2014. Also note that Frank Warmerdam is at Planet Labs. 

But wait, there's more! There's another readily available source of imagery data, it's in the photos people are posting to Instagram, Flickr and Facebook. Expect tools to exploit this source of imagery.

Hardware hacking

Arduino and Raspberrypi are moving out of their respective blinky lights infancy. Geohipsters will be connecting them to sensors and talking to via node.js. Expect to see other hardware platforms such as Tessel making inroads on the hardware hacking movement. 

Car hacking is still in it's infancy with blue tooth ODBII modules. But as more cars roll out as mobile platforms replete with a API, car modding will be more than just chip modding for performance.

Thursday, March 21, 2013

A little data for geocoding

What's a geocoder to do without data? Fortunately, there's tons of it and more and more produced every day. I have a project where I need to verify the addresses of non-profits. The IRS provides the Statement of Information (SOI) Tax Statistics for Exempt Organizations Business Master File Extract. The data is provided as both Excel files and as fixed width delimited text files. The fixed width files contain all the records and there is one per state.

Using the same technique I used for importing 2010 Census headers, I imported each line/record as a single field into a temporary table. Using the SQL substring function, I extracted the data into their respective fields. Information about the file structure and fields are available in The Instruction Booklet on the download page. 

Below is the script for importing the data.

When all is said and done, you will have a table containing California tax exempt organizations. The next step may seem a little backward, but I exported the data back to a tab delimited text file.

This may seem a step backward, but until there is a built in Postgres geocoder, handling text files is simpler and faster than write code that extracts data for geocoding using an external service.

Saturday, February 2, 2013

Data Science Tool Kit on Vagrant

Pete Warden has released a version of the Data Science Tool Kit on Vagrant. DSTK is a website for munging all sorts of data and includes a geocoder based on TIGER 2010. The website can be unreliable, requiring an occasional restart, so running a VM is a nice option. The vagrant version upgrades the geocoder to TIGER2012 and is a drop in replacement for Google geocoder requests. To run the DSTK locally

Install vagrant from Create a directory to hold the vagrantfile, then run the following:

Go to to http://localhost:8080 to start using the DSTK.

Tuesday, November 13, 2012

Hacking a Common Operating Picture

Where’s the common operational picture? How can you get timely and accurate reporting without it?"

I've assisted on several disaster relief efforts since Hurricane Katrina and they have all needed a way to communicate where resources are and where they are needed. In military jargon, this is called a common operating picture or a COP. The most common interface for a COP is a map because it is easy to reference, although there is a case to made for a COP that also shows resources and needs by priority. 

Google Maps has made mapping accessible to the public. I first saw this during Katrina when people used Google Earth to link photos to locations in KMLs so that Katrina refugees could see flooded and damaged areas. I've also seen the availability of other mapping platforms and services such as Ushahidi and CartoDB increase, but it seems most people turn to Google Maps/Earth to create maps and data. While Google Earth has lowered the barrier to creating maps, it still requires a modicum of training in order to keep the map updated with the latest information.

I responded a request to help out with Occupy Sandy Staten Island with their map of resources. They had a map but things were rapidly changing and they wanted a volunteer to be able to update the map. The current map was in Google Maps, so I downloaded a KML of the map as a starting point. 

Consistent Look and Feel
Although there are many mapping platforms available, I decided to stick with Google infrastructure because Occupy Sandy Staten Island were already using it. To make the KML editable, I imported it into Google Fusion Tables which brought in the data but not the styling. I wanted the icons and style for each zone (polygonal area) to be consistent with the existing maps. That means that drop off points, shelters, and operations centers would always have the same symbology without the user having to pick and choose an icon or style every time a new point was added to the map.

Google Fusion Tables supports setting styles based on a column value. Google provides a tutorial showing how to merge tables to apply map styles by column. The tutorial shows how to create a columns in both the styles table and the data table that act as a key. In this case, I created a column called type and populated them descriptive keys such as dropoff, shelter, etc.  Although the tutorial shows how to set icons, map styles for polygons and lines can also be styled by column values. Merging style table to the data table results in a data table, row updates (new entries) to either the styles or data table will be displayed in the merge table. In this case, volunteers can just enter the type of point and it will be displayed with the correct symbology.

One of the primary functions of a COP is to locate things. Fusion table supports several types of geometry encoding ranging from KML geometry to street addresses. Street address are automatically converted into geometry. Even a partial street address without city, state, or zipcode are intelligently gecoded based on the bounding box of all the features.

Adding lines or polygonal data requires digitizing and is a bit beyond the scope. I did add one polygonal area using QGIS and OpenStreet Map as a background.  I originally exported the polygon as a KML, but for some reason QGIS outputs KML geometry in Google's World Mercator instead of WGS84 latitude/longitude coordinates in decimal degrees which is what Fusion Tables wants in the way of geography. The workaround was to export as GML and tweak the tags appropriately.

I wanted to have a spreadsheet interface for updating the map. Why a spreadsheet and not a form or just use Fusion Tables tables directly? Most people have used a spreadsheet so the learning curve is very small. In my experience forms get in the way of data entry, having to submit an entry is just a waste of time. Fusion Tables looks like a spreadsheet but it doesn't behave like one, so that adds to the learning curve. 

John McGrath wrote a script to update Fusion Tables from a spreadsheet; it's available on github. The script works but has shortcomings such as deleting all the entries in the Fusion Table before updating it. That means that you must keep and select all the previous entries in the spreadsheet instead of just updating the new entries.

Closing Comments
This was a quick hack done over a several hours. It's certainly far from perfect but it is functional and accomplishes the basic goal of creating a consistant map that can be updated by volunteers. Here are some takeaways:
  • use existing infrastructure; try to avoid having to install, configure or maintain software
  • use existing code; if you need a function, it's probably in github already
  • make data input simple; as a volunteer translator for Project 4636, I found forms getting in the way of entering information, so I used a spreadsheet
  • make data input convenient; for example, Fusion Tables can geocode an address automatically

Monday, September 10, 2012

Code for America Austin Hackathon: Buses and fun with GTFS

Last Saturday, I participated in the Code for America Austin Hackathon and was impressed by the people who attended. The crowd ranged from very experienced coders to folks who understood and could articulate a real problem as well as policies surrounding the problem. When I said well attended, did I mention that the City of Austin CIO and some of his staff were also there to help out? (Wink, wink, nudge, nudge, say no more to @COSAGOV and rising Democratic political star San Antonio Mayor Julian Castro @juliancastro).

Projects proposed at the hackathon ranged from automating campaign finance forms for the City of Austin, mapping bike routes and amenities, creating a mobile app and API for the  Aunt Bertha service which matches people to services, an online form for clients of the Homeless RV project, and improving bus transportation for riders.

I fell in with the bus group and the problem was presented by Glenn Gabois of Movability Austin. The problem we focused on was notifying riders the location and time of arrival of their bus while they wait at their stop.  Capital Metro, Austin's transportation agency, currently does not provide a feed for their buses while they are in transit. So we divide the problem into three componentes:
  • a crowd sourced bus tracking application
  • the server side infrastructure to provide bus data
  • a mobile application that tells rider a bus' current stop and the number of stops away
Crowd sourcing real time bus locations was an interesting discussion, the group discussed how a mobile app could be used to track buses. Some of the features of the app included:

  • would not require user to turn while on the bus or turn off when off the bus
  • determine which stop where the user boarded the bus
  • disambiguate between buses that used the same stops
The group discussion resulted in a mobile application that ran as a background process on a smart phone that would start sending coordinates when the accelerometer registered a change in speed. Determining which bus a rider boarded was a tougher problem and solutions ranged from QR codes on buses (rejected because it required the user to interact with the application), RFID (problematic because of equipment costs), and NFC (Near Field Communication is relatively new an not a common features on smart phones). In the end, the group centered on algorithmically determining the bus and the route based on data sent by the mobile application. Although this was an interesting discussion, the group decided not to address this problem

In order to use crowd sourced bus tracking data, the group discussed a service architecture to receive the bus tracking coordinates and compare them against the current transit data which Capital Metro provided as GTFS (General Transit File Specification) and as ESRI shape files.  We didn't spend much time discussing this part of the solution and I didn't feel up to the task of setting up a cloud based instance of PostgreSQL and importing the GTFS data into the data base (however, I later proved myself wrong and imported the GTFS data in PostgreSQL and created a sql dump of the database for anyone who wants to use it). For a quick feel good burst of "we're doing something", I uploaded the shape file data to Google Fusion Tables for a quick map. We had five hours to build and demo something at the end of the day.

The group decided to focus on the bus stop application (working title of Bus Time). Naqi Sayed and Kevin Reilly of Hidaya Solutions (@HidayaSolutions) worked on an Android application for determining how many stops away from a rider's current bus stop. With the clock ticking, the group decided that the application would work of just one trip of a single route. My contribution to the group was to rework the GTFS files into a single workable data sets. The GTFS data consisted of the following files

Using the GTFS Reference, I was able to decode how to get from a route to a list of stops. In our case, we picked a route and a trip on that routed, we then needed to find the stops for that trip. Since stops are sequential we just needed a sorted list of stops of the application.

To get the list of stops, I went through the GTFS Reference to find the keys that would allow me to associated stops with a specific trip. The joins between the data tables looked like this.

  1. Routes are comprised of trips (northbound, southbound, and time). Select a route from routes.txt, then select a trip by the route_id.
  2. A single trip list stop_times. Select stop_times by trip_id.
  3. Stop_times are related to stops by stop_id in the stops.txt file. Select the the stops using the stop_id.
I produced a csv file that was an ordered list of the stops for the application using a combination of sed/grep/awk/bash/ruby scripting. It wasn't pretty but in worked under our time constraint.

Naqi already had the barebones of an Android app with geolocation so he incorporated the data into the app. I did not work on the app but I think it worked this way:
  • the app determined the location of the rider via WiFi location or GPS and choose the closest stop
  • the bus position feed was simulated and was snapped to the closest stop
  • the app compared the rider's stop to the bus' stop, since it was an ordered list it was matter of counting the between the two stops
In Lean Startup speak, the minimum viable product (MVP) was this application.

To me, it was amazing that a group of people who had just met could hammer out an application in six hours. Everyone brought something to the table that contributed to the overall goal. I learned lots of new things from other folks and also some technical lessons along the way, such as importing GTFS data into a database using these scripts

We did what we did because we had to explore a problem, generate solutions, and implement one of then in six hours. Now that I've had a little more time to think about it, I think we could vastly improve the application by importing the GTFS data into Sqlite and using it directly in the application. The application would then have every route, trip, fares, and amenities available. The application could then just subscribe to real-time transit feed without having to deploy application back end. Additionally, we could revisit the QR code idea. Instead of riders with a crowd sourcing app, the QR code would be on the bus and the reader would be at each stop, as buses load and unload passengers the QR code reader could send the bus information to the real-time transit feed.

As you can see the ideas keep flowing, but it's important to remember that creating a mobile application would have been impossible with open data in the form of GTFS. 

Saturday, August 11, 2012

Notes on getting CartoDB VM working in OSX VirtualBox

I've been wanting to play some more with CartoDB but I've been traveling and my Internet connection has been provided courtesy of whatever cell tower is close by. A VMWare Ubuntu virtual machine is available at aux 2 centimes. Great if your operating system is Windows or linux because VMWare provides a free player to run the VM. On OSX, VMware Fusion can run the VM but it's not free.

The free alternative is Virtual Box, but it requires converting the VMWare vmdk file to an the open ovf format. Before converting the vmdk file, you will need to use your favorite OSX rar archive tool (MacPar, UnRarX) to reassemble the CartoDB vm which is split into three rar files. 

Download the ovftool from VMWare and install. Before converting the vmdk to an ovf file you will need to edit the vmx file so that the path to the vmdk is not hardcoded.

Convert the vmdk to an ova file.

Import the ova into VirtualBox and accept the default settings, they can be changed later.

The next step is to configure the VirtualBox network so that CartoDB is accessible to your host machine. To do this setup the Host-only Adapter. The Host-only Adapter creates a private network that between the host machine and guest machines. 

The vboxnet0 Host-only Adapter is configured from the VirtualBox Preferences menu.

Click on the 'Edit' or screwdriver icon to display the configuration menu. Note that the adapter's address is, which is the IP address that the guest machine accesses the host.

Now that we know the adapter's address, we can configure the network interface of the guest to have a static IP that is addressable from the host machine. Edit the /etc/network/interfaces file as root to set the static address. It should look like this:

Reboot the guest to start the network with the assigned static IP.

The CartoDB virtual machine is configured with two accounts, a devuser account and an admin account. To access these accounts, two URLs have been configured: http://devuser.localhost.lan (user/password: devuser/devuser) and http://admin.localhost.lan (user/password: admin/admin). 

To access these accounts configure the /private/etc/hosts file on the host side (OSX) by adding these entries:

After updating the hosts file, we can access our local instance of CartoDB through a browser. Note: add 'http://' to the URL so that the browser knows you are looking for the local CartoDB site. The devuser map looks like this:

Happy mapping!

Friday, August 3, 2012


A friend in grad school used to go around the office and exhort us to 'go eat' lunch together. Eventually this just became "gweet."

Similarly, an engineer I worked with would build applications using java jars built by another part of our company. I would ask him how his project was going after the jar update and he would invariably say "It's broke." Over time, this elided into 'sproke.

*True story: One Christmas, I bought everyone a computer bag with NPE embroidered on it because everything was 'sproke.