Wednesday, March 4, 2009

Howto: Open Street Map in Oracle

OpenStreetMap is a crowd-sourced street map of the world.  As you would expect, it's a lot of data: the XML data file for the whole world is around 100 GB uncompressed.  There are two ways of getting OSM data into Oracle: you can download the shape files from CloudMade or GeoFabrik, or download the planet_osm file, load it into PostgreSQL/PostGIS, and then load it into Oracle.
Loading OSM shape files into Oracle
Loading shape files into Oracle is pretty easy, and there are a lot of tools (both freeware and commercial) for loading the data, such as Oracle's Map Builder, Safe Software's FME, GDAL's ogr2ogr, and even a Java-based shp2sdo utility from Oracle.  The downside of using shape files is that there are a lot of them.  This requires plenty of disk space for unzipping files, as well as shell scripting to download and load the data into Oracle if you are averse to clicking links or pushing buttons repeatedly.  The other problem is that the field names may be different from those in the planet-latest.osm file, which means some examples for configuring services may need additional tweaking.
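For example, a single shapefile can be pushed into Oracle with ogr2ogr in one shot.  This is just a sketch; the shapefile name and the schema/password/SID (osm/osm@osm_planet) are placeholders for your own setup:
ogr2ogr -f OCI OCI:osm/osm@osm_planet highway.shp -nln OSM_HIGHWAY
The OCI driver should create the table, load the geometry, and update the Oracle spatial metadata in one pass.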
Loading OSM into PostgreSQL
The other way is to load the OSM data into PostgreSQL/PostGIS using osm2pgsql.  The main advantage is that you get the raw data and your data looks like everyone else's, so you can reuse their work without much tweaking.  You also avoid all the link clicking, scripting, unzipping and loading of multiple shape files.
There are Windows and Debian-based Linux binaries for osm2pgsql.  I use CentOS, so I had to build osm2pgsql using howtos from here and here; a rough sketch of the build is below.  I had a couple of problems loading the data, and it seemed that the system memory was overwhelmed by the amount of data.  Using the slim option (store temporary data in the database) fixed this problem.
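The prerequisites are gcc, libxml2, GEOS, PROJ4 and the PostgreSQL development headers; the svn path below is from memory, so check the OSM wiki if it has moved:
svn co http://svn.openstreetmap.org/applications/utils/export/osm2pgsql/
cd osm2pgsql
make
With osm2pgsql built, load the data.  Note that -C takes a cache size in megabytes (1024 below is an arbitrary choice), -E takes a bare EPSG code, and "ang" is my database host: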
./osm2pgsql -c -s -C 1024 -H ang -P 5432 -d osm -E 4326 -U postgres -W ../planet-latest.osm
Loading the planet-latest.osm file can take a couple of days, even on a dual-processor server.  It is an awful lot of XML to parse and insert into a database.
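Once the load finishes, a quick sanity check against one of the tables osm2pgsql creates (planet_osm_point, planet_osm_line, planet_osm_roads, planet_osm_polygon) confirms the data arrived:
psql -d osm -U postgres -c "SELECT COUNT(*) FROM planet_osm_line;"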
Ogr2ogr: vector data converter
The next step is to move the data from PostgreSQL to Oracle using the GDAL utility ogr2ogr.
There are Windows binaries for ogr2ogr, but the minimalist Windows executables do not include the OCI driver for Oracle.  For Windows, your best bet is to download the OSGeo4W installer and add the OCI driver to the install list.  The OS X binary from Kyng Chaos lacks OCI support, so you will have to build GDAL yourself, as you will on Linux.  To build GDAL with OCI, you will need GEOS, PROJ4, and the Oracle Client installed.  Make sure ORACLE_HOME is set so that make can find the Oracle libs, and remember to add the OCI option when configuring:
% ./configure --with-oci
% make
% sudo make install
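Once installed, confirm that the OCI driver actually made it into the build:
% ogrinfo --formats | grep OCI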
Loading OSM from PostgreSQL to Oracle using ogr2ogr
Using ogr2ogr is straightforward; however, there are a few things that you will need to do ahead of time.  Ogr2ogr cannot create a new database, so you have to set one up before moving the data from PostgreSQL to Oracle.
1. Create a tablespace to hold the data.  Make sure autoextend is on to accommodate all the data.
sqlplus system/password
   sqlplus> CREATE TABLESPACE PLANET_OSM_TS
              LOGGING
              DATAFILE 'C:\DATA\PLANET_OSM.DBF'
              SIZE 5G
              AUTOEXTEND ON
              EXTENT MANAGEMENT LOCAL;
2. Create a user and assign them to the new tablespace:
sqlplus> CREATE USER OSM IDENTIFIED BY OSM DEFAULT TABLESPACE PLANET_OSM_TS;
3. Grant the user privileges:
sqlplus> GRANT CONNECT,RESOURCE TO OSM;
4. Make sure that the PostgreSQL machine can see the Oracle machine.  If you have the Oracle client installed, add the Oracle SID to the C:\Oracle\product\10.1.0\Client_1\network\ADMIN\tnsnames.ora file:
osm_planet =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = oracleserver)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = osm_planet)
    )
  )
Ogr2ogr makes use of the SID to connect to Oracle.  You can test to see if the connection is working with tnsping:
tnsping osm_planet
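You can also confirm that the OSM schema credentials work end to end by connecting through the new SID:
sqlplus osm/osm@osm_planet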
Once the database is set up, moving OSM from PostgreSQL to Oracle is straightforward.  I did hit one snag.  Before translating the data, make sure that the applications using this data are not tied to a specific Oracle SRID.  For example, the OSM data was in WGS84 (EPSG:4326), and an application I used tied WGS84 to Oracle SRID 8307.  Ogr2ogr uses the first exact match of WKT (Well Known Text) as the target SRID, which happened to be SRID 8192.  The WKT for 8192 and 8307 are exactly the same, but the application was hardwired to use SRID 8307, so my SQL requests failed because the SRIDs did not match.  To avoid problems like this, you can use the -lco (layer creation option) flag to set the SRID or any other layer parameters.
ogr2ogr -f OCI OCI:osm/osm PG:"host=localhost user=postgres password=postgres dbname=osm port=5432" -lco OVERWRITE=yes -lco SRID=8307 -nln planet_osm_line planet_osm_line
Rinse, lather and repeat for the planet_osm_roads, planet_osm_point, and planet_osm_polygon layers, if so desired.  Loading the data will take most of the day, particularly the planet_osm_line table, which contains around 22 million records.
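When the load is done, a quick spot check confirms the row counts and that the SRID stuck:
sqlplus> SELECT COUNT(*) FROM PLANET_OSM_LINE;
sqlplus> SELECT TABLE_NAME, SRID FROM USER_SDO_GEOM_METADATA;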

Monday, March 2, 2009

Transparency Camp09: Mapping Session

Transparency Camp is a bar camp for open government advocates (government representatives, technologists, developers, NGOs, wonks and activists) to share knowledge on how to use new technologies to make our government transparent and meaningfully accessible to the public. Andrew Turner of GeoCommons and I ran a session on mapping, public participation and open data. Andrew avoided the "powerpoint" approach, did a wonderful job of moderating the session and made it very interactive. He captured the discussion by dividing it into two sections: Mapping and GeoData, and Solutions. Mapping and GeoData was further divided into two categories: Difficulties and Goals. Solutions was the beginning of a mind map. I've summarized the session roughly according to Andrew's categories of Difficulties and Goals.
Data input and output
The conversation started out with common frustrations, which can be divided into two problems: getting data in and getting maps out. Getting data into a mapping interface remains problematic. The session members commonly used CSV files, shape files and GeoTIFFs to overlay on a base map source such as Google Maps. While data is available from web sites and from government agencies, the problem is that the data is poorly described. Data sources from different government offices are unique but contain common data. In addition, the usefulness of a particular data set is not known until the entire process of building a map is complete. The primary output problem was printing maps. On-line mapping applications do not provide an easy way to create printed maps, especially large format (D or E size) maps. There was also concern about licensing and copyright when printing maps from an on-line source such as Google Maps. 'Easy' was the word of the day and the goal with regards to data input and output. The ability to preview the data, or even have meaningful metadata (i.e. fitness-of-use ranking, popularity, etc.), was needed. Also, the problem of having multiple schemas of similar data could be addressed with a common community schema. To make it easy, there could be a graphical tool that lets one map a data set to the community schema by simply drawing lines between two fields.
Motivations
Session members said that their motivation for mapping was to tell a story and that maps were a way to tie data to communities. Maps make data real and concrete and inherently provoke a visceral reaction. Maps are used as an exploratory interface so that patterns in data can be revealed. Maps are used as a way to plan, coordinate and share information in a variety of contexts. A distinction was made about different contexts; for example, a map showing the locations of services did not work because it didn't readily answer the question of "how close is the service?" In that case proximity was more important than actual location. Andrew summed it up nicely: "Geographic doesn't always mean cartographic." Although most members agreed that visualization of location was the primary use of maps, it was obvious that they also used maps in a more sophisticated way to communicate the implications of data. Current web mapping platforms remain focused on the "where is" aspect of maps. The next generation of web mapping platforms should implement the basic thematic cartographic forms: isopleths, choropleths, dot density, and proportional symbols.
Open data and fear
A good part of the discussion was about open data and why organizations did not want to release it.
Fear was the main reason cited for not releasing open data; below is a list of fears:
  • liability
  • may expose problems with the data
  • data may be used against an organization
  • may reveal patterns of behavior that will be criticized
  • concerns about real-time location/tracking of archaeological sites or endangered species
  • security and privacy concerns
What they really want
To summarize, here is what they wanted:
easy 
  • data preview 
  • a common way to view data from different sources (a normalization tool)
  • making sophisticated maps guided by wizards
  • printed map output, or press ready map output for brochures
cheap 
  • the applications should be low cost or free
  • require minimal staff training
  • the amount of time spent on maintenance should be low
  • use a minimal amount of bandwidth (interesting comment that hints at an interest in hosted solutions)