Sunday, July 31, 2011

Loading shapefiles into Elasticsearch

Elasticsearch is an open source (Apache 2), distributed, RESTful search engine built on top of Lucene. It supports point data through the geo_point data type, which enables several types of spatial search, including the geo_distance filter used in the test queries below.
Elasticsearch provides a REST API for configuring, loading, and querying data, as well as a bulk loading interface. To load shapefiles into Elasticsearch, I wrote a Ruby script that converts shapefiles to Elasticsearch's bulk loading format. Here's the process for loading multiple shapefiles as multiple types into a single index (think of an index as a 'database' and types as 'tables').

If you haven't installed Elasticsearch, download it, unzip it, and read the README file to start the node.


1. Create the index
curl -XPUT 'http://localhost:9200/geo/'
2. Create a mapping file (place.json) in your favorite editor. This example is for a shapefile called place:
{
    "place" : {
        "properties" : {
            "geometry": {
                "properties": {
                    "coordinates": {
                        "type": "geo_point"
                    }
                }
            }
        }
    }
}
curl -XPUT http://localhost:9200/geo/place/_mapping -d @place.json
3. Convert the shapefile to the ES bulk format.
ruby shapefile2elasticsearch.rb > place_data.json
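For readers who want to see what the bulk format looks like, here is a minimal sketch of the conversion step. It is not the shapefile2elasticsearch.rb script itself: reading the shapefile is left out, and the helper names (bulk_lines, to_bulk), the feature hash layout (:id, :lon, :lat, :attributes), and the sample 'name' attribute are all assumptions made for illustration. The document layout is inferred from the mapping above, where geometry.coordinates is a geo_point; in array form a geo_point is ordered [lon, lat].

```ruby
require 'json'

# Hypothetical helper: build the two lines the bulk API expects for one
# point feature (an action line followed by the document source line).
# `feature` is assumed to be a Hash with :id, :lon, :lat and :attributes.
def bulk_lines(index, type, feature)
  action = {
    'index' => { '_index' => index, '_type' => type, '_id' => feature[:id].to_s }
  }
  source = feature[:attributes].merge(
    'geometry' => {
      'type' => 'Point',
      # geo_point arrays follow GeoJSON order: longitude first, then latitude
      'coordinates' => [feature[:lon], feature[:lat]]
    }
  )
  [JSON.generate(action), JSON.generate(source)]
end

# Hypothetical helper: join all features into one newline-delimited payload.
# The bulk API requires every line, including the last, to end with a newline.
def to_bulk(index, type, features)
  features.flat_map { |f| bulk_lines(index, type, f) }.join("\n") + "\n"
end

# Made-up sample feature, standing in for records read from a shapefile.
sample = [
  { id: 1, lon: -97.75, lat: 32.25, attributes: { 'name' => 'Example Place' } }
]
puts to_bulk('geo', 'place', sample)
```

Because each action line names its own index and type, the payload can be sent to the top-level _bulk endpoint, as in the next step.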
4. Bulk load the data.
curl -XPUT 'http://localhost:9200/_bulk/' --data-binary @place_data.json
5. Test query.
curl -XGET 'http://localhost:9200/geo/place/_search' -d '{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "geo_distance": {
                    "distance": "20km",
                    "geometry.coordinates": {
                        "lat": 32.25,
                        "lon": -97.75
                    }
                }
            }
        }
    }
}'
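If you would rather build the query body from a script than type it by hand, something like the following sketch should work. The helper name geo_distance_query and its parameters are hypothetical; the structure it produces simply mirrors the filtered query in the curl example above.

```ruby
require 'json'

# Hypothetical helper: build a match_all query filtered by geo_distance.
# `field` is the geo_point field path, `distance` a string such as "20km".
def geo_distance_query(field, lat, lon, distance)
  {
    'query' => {
      'filtered' => {
        'query' => { 'match_all' => {} },
        'filter' => {
          'geo_distance' => {
            'distance' => distance,
            field => { 'lat' => lat, 'lon' => lon }
          }
        }
      }
    }
  }
end

# Same values as the curl test query.
puts JSON.pretty_generate(
  geo_distance_query('geometry.coordinates', 32.25, -97.75, '20km')
)
```

The printed JSON can be saved to a file and passed to curl with -d @file, in the same way the mapping files are.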
6. Add a second type definition for zipcodes (zipcode.json):
{
    "zipcode" : {
        "properties" : {
            "geometry": {
                "properties": {
                    "coordinates": {
                        "type": "geo_point"
                    }
                }
            }
        }
    }
}
curl -XPUT http://localhost:9200/geo/zipcode/_mapping -d @zipcode.json
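The two mapping files differ only in the type name, so when loading many shapefiles it may be easier to generate them. A minimal sketch, assuming every type stores its point in geometry.coordinates as above; the helper name geo_point_mapping is hypothetical.

```ruby
require 'json'

# Hypothetical helper: return the mapping for one type whose
# geometry.coordinates field is indexed as a geo_point.
def geo_point_mapping(type_name)
  {
    type_name => {
      'properties' => {
        'geometry' => {
          'properties' => {
            'coordinates' => { 'type' => 'geo_point' }
          }
        }
      }
    }
  }
end

# Print the mapping for each type used in this post.
%w[place zipcode].each do |type_name|
  puts JSON.pretty_generate(geo_point_mapping(type_name))
end
```

Each printed mapping can be saved as its own file (place.json, zipcode.json) and PUT to the _mapping endpoint of the matching type.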
7. Convert and bulk load the data.
ruby shapefile2elasticsearch.rb > zipcode_data.json
curl -XPUT 'http://localhost:9200/_bulk/' --data-binary @zipcode_data.json
8. Test query.
curl -XGET 'http://localhost:9200/geo/zipcode/_search' -d '{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "geo_distance": {
                    "distance": "20km",
                    "geometry.coordinates": {
                        "lat": 32.25,
                        "lon": -97.75
                    }
                }
            }
        }
    }
}'