I can't remember how many times I've ranted about dumb users sending an Excel file as a database. After watching Max Schireson's presentation on Reinventing the Database, I am shamed. The gist of Schireson's presentation is that we spent a lot of time strongly and parsimoniously typing data to accomodate the technological constraints of computers when memory and disk were expensive. Obviously, those constraints are no longer valid.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
start_x, start_y, end_x, end_y, street_name, side, tree1_distance, tree1_width, tree1_height, tree1_species, ... , treeN_distance, treeN_width, treeN_height, treeN_species |
Collecting tree data along a street in the form of a list is a very natural way for volunteers to gather information. A street is the basic entity that contains trees and treebeds and it constitutes a single record. However, data in this format violates first normal form in that holding the trees along with the street data creates repeating groups with in a single record.
In an ideal relational database there would be a street table, a treebed table, and a tree table with one-to-many relationships defined between the tables. However, the data is entered into a table with repeating groups. An external java process (using GeoTools) calculates the coordinates of trees and treebeds and inserts them into Postgres/PostGIS tables so they can be displayed. So the mapping stack in this case is Postgres with a TileMill frontend for creating maps, which is served through MapBox.
In this architecture, Postgres is grossly underutilized. Sure it stores both the raw data and the spatial data but that's all it does. The external java process that creates the spatial data is legacy code (written in an earlier iteration) can be written in something else and in fact is replicated in the data entry form in javascript. In it's simplest formulation, the data can be expressed in JSON.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ | |
{ | |
"street": { | |
"name": "abc", | |
"startx": "123", | |
"starty": "123", | |
"endx": "123", | |
"endy": "123", | |
"side": "left" | |
}, | |
"treebeds": { | |
"treebed": { | |
"order": "1", | |
"distance": "123", | |
"width": "123", | |
"height": "123", | |
"species": "xyz" | |
} | |
..., | |
"treebeds": { | |
"treebed": { | |
"order": "N", | |
"distance": "N", | |
"width": "123", | |
"height": "123", | |
"species": "xyz" | |
} | |
} | |
} | |
] |
There are situations where Postgres/PostGIS would be advantageous, such as where data changes or where additional processing or sophisticated querying is needed. Storing the data in simple format that can be easily consumed by most web applications has several advantages over storing it in a database. First, GeoJSON can be consumed directly my many web mapping clients; second, the overhead of maintaining another piece of software is removed; and finally the data can be transformed into other formats easily. So if someone hands you a Excel or csv file, try using a simpler format such as GeoJSON and simplify your web mapping stack.