Wednesday, March 4, 2009

Howto: Open Street Map in Oracle

Open Street Map is crowd sourced street map of the world.  As you would expect, it's a lot of data and the xml data file for the world is around 100gb uncompressed.   There are two ways of getting OSM data into Oracle, you can down load the shape files from CloudMade or GeoFabrik or download the planet_osm file, load it into PostgreSQL/PostGIS and then load it into Oracle.
Loading OSM shape files into Oracle
Loading shape files into Oracle is pretty easy and there are a lot of tools (both freeware and commercial) for loading data such as Oracle's Map Builder, Safe Software's FME, GDAL's ogr2ogr, and even a java based shp2sdo utility from Oracle.  The downside of using shape files is that there are a lot of them.  This requires plenty of of disk space for unzipping files as well as shell scripting to download and load the data into Oracle if you are averse to clicking links or pushing buttons repeatedly.  The other problem is that the field names maybe different from the planet_latest.osm file, which means some examples for configuring services may need additional tweaking. 
Loading OSM into PostgresSQL
The other way is to load the osm data into PostgreSQL/PostGIS using osm2pgsql.  There main advantage is that you get the raw data and your data looks like everyone else's so you can reuse their work without much tweaking.  You also avoid all the link clicking, scripting,  unzipping and loading multiple shapefiles.
There are Windows and debian based linux binaries for osm2pgsql.  I use centos, so I had to build osm2pgsql using howtos from here and here.  I had a couple of problems loading the data and it seemed that the system memory was overwhelmed by the amount of data.  Using the slim option (store temporary data in the database) fixed this problem.   Load the data using osm2pgsql: 
./osm2pgsql -c -s -C -H ang -P 5432 -d osm -E EPSG:4326 -U postgres -W ../planet-latest.osm
Loading the planet_latest.osm file can take a couple of days even on a dual processor server.  It is an awful lot of XML to parse and insert into a database.
Ogr2ogr: vector data convertor
The next step is to move the data from Postgres to Oracle using the gdal utility ogr2ogr.   
There are Windows binaries for ogr2ogr but the minimalist windows executables do not include the OCI driver for Oracle.  For Windows, your best bet is to download the OSGeo4W installer and add the OCI driver to the install list.  The OS X binary from Kyng Chaos lacks OCI support, so you will have to build gdal on OS X.  On linux you will need to build gdal.  To build gdal with OCI, you will need  GEOS, PROJ4, and the Oracle Client installed.  Make sure you have ORACLE_HOME set so that make can get to the Oracle libs.  Remember to add the OCI option when configuring:
% ./configure --with-oci % make % sudo make install
Loading OSM from PostgreSQL to Oracle using ogr2ogr
Using ogr2ogr is straight forward, however there a few things that you will need to do ahead of time.  Ogr2ogr cannot create a new database so you have to set one up before moving the data from PostgreSQL to Oracle. 
1. Create a table space to hold the data.  Make sure autoextend is on to accommodate all the data.
sqlplus system/password
   sqlplus> CREATE TABLESPACE PLANET_OSM_TS \  LOGGING \  DATAFILE 'C:\DATA\PLANET_OSM.DBF' \  SIZE 5G \  AUTOEXTEND ON \   EXTENT MANAGEMENT LOCAL;
2. create a user and assign them to the new tablespace
sqlplus> CREATE USER OSM IDENTIFIED BY OSM DEFAULT TABLESPACE PLANET_OSM_TS;
3. grant the user privileges
sqlplus> GRANT CONNECT,RESOURCE TO OSM;
4. make sure that the PostgreSQL machine can see the Oracle machine.  If you have the Oracle client installed,  add the Oracle SID to the C:\Oracle\product\10.1.0\Client_1\network\ADMIN\tnsnames.ora file:
osm_planet = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = oracleserver)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = osm_planet) ) )
Ogr2ogr makes use of the SID to connect to Oracle.  You can test to see if the connection is working with tnsping:
tnsping osm_planet
Once the database is set up, moving OSM from PostgreSQL to Oracle is straight forward.   I did hit one snag.  Before translating the data, make sure that the applications using this data are not tied to a specific Oracle SRID.  For example, the OSM data was in WGS84 (EPSG:4326) and an application I used tied WGS84 to Oracle SRID 8307.  Ogr2ogr uses the first exact match of WKT (Well Known Text) as the target SRID; this happened to be SRID 8192.  The WKT for 8192 and 8307 are exactly the same, but the application was hardwired to use SRID 8307 , so my sql requests failed because the SRIDs did not match.  To avoid problems like this, you can use the -lco (layer creation option) to set the SRID or any other parameters.  
ogr2ogr -f OCI OCI:osm/osm PG:"host=localhost user=postgres password=postgres dbname=osm port=5432" -lco OVERWRITE=yes -lco SRID=8307 -nln planet_osm_line planet_osm_line
Rinse, lather and repeat for planet_osm_roads, planet_osm_point, and planet_osm_polygon layers, if so desired.  Loading the data will take most of the day, particularly planet_osm_line table which contains around 22 million records.

2 comments:

  1. Okay ... there is a catch here ... these instructions roll the line layer in as a 3-d geometry with elevations of 0. To add insult to injury oracle 11 has a spatial index bug for 3-d geometries that has only been patched in the linux release. So I had to cast the 3-d geoms to 2-d .. on the web some kind soul wrote a to_2d function. Search for to_2d and geometry and oracle. Or follow this link: http://forums.oracle.com/forums/thread.jspa?messageID=960492&#960492

    ReplyDelete
  2. You can also add "-lco DIM=2" to the ogr2ogr command to avoid the 3D geometry.

    ReplyDelete