Sunday, May 8, 2016

Spatial Data Processing with Docker

or otherwise known as: Making Geo Great Again

I started using Docker when I first tried to build Accumulo and GeoMesa on my MacBook, and like many projects that involve compiling native binaries, it was a nightmare. Then I found an image of Accumulo and GeoMesa on Docker Hub that I was able to use immediately. One of the barriers to adopting open source is being able to run it on the system you have, and Docker makes this possible with a minimum of effort.

Before diving into making geo great again, here are a few terms and definitions to establish a common vocabulary. 

  • Dockerfile: This tells the image builder (e.g., Jenkins) what the image should look like. 
  • Image: The basis of a Docker container at rest. These artifacts are stored and managed in a registry. Once instantiated via a Docker run command a container is created. 
  • Container: The standard unit in which the application service resides. At run, the image is turned into a container. 
  • Docker Engine: Installed on physical, virtual or cloud hosts, this lightweight runtime is what pulls images, creates and runs containers. 
  • Registry: A service where Docker images are stored, managed and distributed. 
Here's the tl;dr version: a Dockerfile is used to build an image; a running image is called a container; the Docker Engine is what runs images; and you can find images in the Docker Hub registry. Got it?

Here's the difference between a virtual machine and a container.

The important takeaway is that virtual machines use an entire operating system to run an application. A container is, for all practical purposes, a compiled binary that runs like any other native application on your operating system.

You can install Docker on Linux, OS X, Windows, and in the cloud. If you just want to experiment, I encourage you to join the beta program.

So you've installed Docker.

Let's do something useful yet familiar: run GDAL in Docker.

docker run geodata/gdal

Since Docker couldn't find the image locally, it downloaded it, and when the container ran it automatically ran gdalinfo to show that it was working. This is all well and good, but since GDAL binaries exist for most operating systems, it isn't very exciting. So let's add a few more applications that will help with a popular geoprocessing task.
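To point the container at your own data instead, you can mount the current directory into it. A minimal sketch, where scene.tif is a placeholder file name:

```shell
# Mount the working directory at /data inside the container and run
# gdalinfo on a local raster (scene.tif is a placeholder file name).
docker run --rm -v "$(pwd)":/data geodata/gdal gdalinfo /data/scene.tif
```

The --rm flag removes the container when the command finishes, so repeated runs don't leave stopped containers behind.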

There have been quite a number of posts and recipes for creating natural color pan-sharpened images from Landsat. Most of these involve downloading and compiling several open source tools. For this example, we will take the geodata/gdal image, add a few more tools, and use a script to perform the image processing. An "enough to be dangerous" level of proficiency in git and Linux is sufficient to do this.

First, fork and clone the geodata/docker git repository.

Edit the Dockerfile to include dans-gdal-scripts and imagemagick:

# Install the application.
ADD . /usr/local/src/gdal-docker/
RUN apt-get update -y && \
    apt-get install -y make && \
    make -C /usr/local/src/gdal-docker install clean && \
    apt-get purge -y make && \
    apt-get install -y dans-gdal-scripts imagemagick

Do the git dance of add, commit, and push your updated Dockerfile. Or you can just fork and clone from the presentation repository.

Build the new image:

docker build -t spara/gdal:local git://

Next, we will use an existing script to convert a Landsat 8 image to a pansharpened natural color JPEG suitable for framing. To use our new extra-fancy Docker image, we'll modify the script, which can be found here. Note that I tweaked the settings to provide a brighter image.
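As a rough, hypothetical sketch of the pansharpening step using the tools we just added (the band file names are placeholders, and you should run gdal_landsat_pansharp with no arguments inside the container to confirm its usage):

```shell
# Hypothetical sketch: combine the Landsat 8 natural color bands (4, 3, 2)
# with the 15 m panchromatic band (8). File names are placeholders.
docker run --rm -v "$(pwd)":/data spara/gdal_ef gdal_landsat_pansharp \
    -rgb /data/LC8_B4.TIF -rgb /data/LC8_B3.TIF -rgb /data/LC8_B2.TIF \
    -pan /data/LC8_B8.TIF -ndv 0 -o /data/pansharp.tif
```

ImageMagick, the other tool we added, can then handle the contrast stretch and conversion to JPEG.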

Let's test out our new tool. For comparison, we'll use a NASA tutorial and the data they used to make a natural color image. Here's the image from the tutorial:

And here's the image from our extra-fancy Docker GDAL image.

I don't think the image in their article was made from the linked test data: our result shows a lot more snow cover, and it is zoomed in compared to the NASA image.

With as little work as possible, we've 'leveraged' the work of others by using an existing container and scripts, added more capabilities to a single image, and created a single-purpose tool that can be deployed and reused anywhere. The image is available from Docker Hub.

docker pull spara/gdal_ef

We've built a tool that is reusable across any operating system and can be used as a 'Lego' brick when composing a data processing workflow. This is how we're going to make geo great again.

However, you don't even need to do this, since there are already many geo tools available on Docker Hub.

I'll end this with a quote

Wednesday, July 1, 2015

Sunday, September 28, 2014

Loading JSON-LD Into Elasticsearch

From the elasticsearch mailing list

Amine Bouayad <amine@***.com> wrote:

Thank you all for your responses and interesting conversation about RDF serialization into ES. With regards to my original post, I ended up using a solution based on RDFlib: 

It works as expected, and compacting the content by using @context does the trick and is flexible. It is an in-memory process however, which could be an issue for those with very large RDF files. When using Jena, I didn't find the ability to add @context mappings, but maybe I didn't dig enough.

On a side note, looks like the rdflib-jsonld solution already has support for XSD literals and lists, so perhaps it could be extended to map directly into ES _type if that is a good direction.

With my JSON-LD file ready for ingestion into ES, I do have another question: are there utilities to bulk load such documents (the JSON-LD contains individual documents for ES, each with an _id), or do I just write a script that calls curl -XPUT for each record in the JSON-LD file? Seems like a pretty common use case.
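For what it's worth, Elasticsearch's bulk API accepts a newline-delimited payload, an action line followed by the document source for each record, so a single request can load the whole file. A minimal sketch, where the index name "rdf", type "doc", and the documents themselves are placeholders:

```shell
# Build a bulk payload: one action line, then one source line, per document.
cat > bulk.jsonl <<'EOF'
{"index":{"_index":"rdf","_type":"doc","_id":"1"}}
{"@id":"http://example.org/a","name":"first resource"}
{"index":{"_index":"rdf","_type":"doc","_id":"2"}}
{"@id":"http://example.org/b","name":"second resource"}
EOF
```

A single curl then loads everything at once instead of one PUT per record: `curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.jsonl`.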

Thanks again to all, interesting stuff. Happy to contribute to extending an existing solution.


Saturday, September 27, 2014

ello protip: mp4 to animated gif using ffmpeg

Ello doesn't support videos yet, so animated GIFs are the way to go. If you have brew installed, you can just install ffmpeg:

~ brew install ffmpeg

To convert a video to gif with ffmpeg:

~ ffmpeg -i myvideo.mp4 -vf scale=320:-1 -t 10 -r 10 myvideo.gif

-t sets the duration of the output in seconds
-r sets the number of frames per second
-vf scale=320:-1 scales the output to 320 pixels wide; the -1 picks a height that preserves the aspect ratio

And there are a bunch of other parameters:

Global options (affect the whole program instead of just one file):
-loglevel loglevel  set logging level
-v loglevel         set logging level
-report             generate a report
-max_alloc bytes    set maximum size of a single allocated block
-y                  overwrite output files
-n                  never overwrite output files
-stats              print progress report during encoding
-max_error_rate ratio  maximum error rate (0.0: no errors, 1.0: 100% errors)
-bits_per_raw_sample number  set the number of bits per raw sample
-vol volume         change audio volume (256=normal)

Per-file main options:
-f fmt              force format
-c codec            codec name
-codec codec        codec name
-pre preset         preset name
-map_metadata outfile[,metadata]:infile[,metadata]  set metadata information of outfile from infile
-t duration         record or transcode "duration" seconds of audio/video
-to time_stop       record or transcode stop time
-fs limit_size      set the limit file size in bytes
-ss time_off        set the start time offset
-timestamp time     set the recording timestamp ('now' to set the current time)
-metadata string=string  add metadata
-target type        specify target file type ("vcd", "svcd", "dvd", "dv", "dv50", "pal-vcd", "ntsc-svcd", ...)
-apad               audio pad
-frames number      set the number of frames to record
-filter filter_graph  set stream filtergraph
-filter_script filename  read stream filtergraph description from a file
-reinit_filter      reinit filtergraph on input parameter changes

Video options:
-vframes number     set the number of video frames to record
-r rate             set frame rate (Hz value, fraction or abbreviation)
-s size             set frame size (WxH or abbreviation)
-aspect aspect      set aspect ratio (4:3, 16:9 or 1.3333, 1.7777)
-bits_per_raw_sample number  set the number of bits per raw sample
-vn                 disable video
-vcodec codec       force video codec ('copy' to copy stream)
-timecode hh:mm:ss[:;.]ff  set initial TimeCode value.
-pass n             select the pass number (1 to 3)
-vf filter_graph    set video filters
-b bitrate          video bitrate (please use -b:v)
-dn                 disable data

Audio options:
-aframes number     set the number of audio frames to record
-aq quality         set audio quality (codec-specific)
-ar rate            set audio sampling rate (in Hz)
-ac channels        set number of audio channels
-an                 disable audio
-acodec codec       force audio codec ('copy' to copy stream)
-vol volume         change audio volume (256=normal)
-af filter_graph    set audio filters

Subtitle options:
-s size             set frame size (WxH or abbreviation)
-sn                 disable subtitle
-scodec codec       force subtitle codec ('copy' to copy stream)
-stag fourcc/tag    force subtitle tag/fourcc
-fix_sub_duration   fix subtitles duration
-canvas_size size   set canvas size (WxH or abbreviation)

-spre preset        set the subtitle options to the indicated preset

Thursday, September 11, 2014

Useful tools: Oracle SQL Developer Data Modeler

Oracle SQL Developer Data Modeler is a useful tool for database design that supports building logical and physical models.

To run on OS X Mountain Lion, it needs Java 1.7 for OS X.

Saturday, July 5, 2014

Going full "Get off my lawn, damn kids!"

I still love twitter because it brings me moments like these:

This is me waving my cane around in the air from the rocking chair. But, there's a reason for this to exist!

My turn. After I adjust my Depends.

Slamming down my Ensure, I whip out this witty rejoinder:

Monday, May 5, 2014

Big for Ignite style talks

Twenty slides in 5 minutes, except not in PowerPoint or Keynote: just HTML and JavaScript, using Tom MacWright's big. From my Open Ignite talk.

Tuesday, December 31, 2013

What's HOT for the GeoHipster in 2014


Skybox Imaging and Planet Labs have launched imaging satellites; expect a bunch of cool new image products and imagery-derived data in 2014. Also note that Frank Warmerdam is at Planet Labs. 

But wait, there's more! There's another readily available source of imagery data, it's in the photos people are posting to Instagram, Flickr and Facebook. Expect tools to exploit this source of imagery.

Hardware hacking

Arduino and Raspberry Pi are moving out of their respective blinky-lights infancy. Geohipsters will be connecting them to sensors and talking to them via Node.js. Expect to see other hardware platforms, such as Tessel, making inroads into the hardware hacking movement. 

Car hacking is still in its infancy with Bluetooth OBD-II modules. But as more cars roll out as mobile platforms replete with an API, car modding will be more than just chip modding for performance.

Thursday, March 21, 2013

A little data for geocoding

What's a geocoder to do without data? Fortunately, there's tons of it, and more is produced every day. I have a project where I need to verify the addresses of non-profits. The IRS provides the Statistics of Income (SOI) Tax Statistics for Exempt Organizations Business Master File Extract. The data is provided as both Excel files and as fixed-width text files. The fixed-width files contain all the records, one file per state.

Using the same technique I used for importing the 2010 Census headers, I imported each line/record as a single field into a temporary table, then used the SQL substring function to extract the data into its respective fields. Information about the file structure and fields is available in the Instruction Booklet on the download page. 
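The idea is just positional slicing, which can be sketched in plain shell with cut; the field positions below are invented for illustration, so check the Instruction Booklet for the real layout:

```shell
# Illustrative only: the field positions here are made up.
# Each record is one fixed-width line; fields are carved out by
# character position, which is what substring() does inside Postgres.
printf '941234567EXAMPLE CHARITABLE TRUST\n' > eo_sample.txt
cut -c1-9  eo_sample.txt    # EIN
cut -c10-  eo_sample.txt    # organization name
```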

Below is the script for importing the data.

When all is said and done, you will have a table containing California tax-exempt organizations. The next step may seem a little backward: I exported the data back to a tab-delimited text file.

Until there is a built-in Postgres geocoder, handling text files is simpler and faster than writing code that extracts data for geocoding with an external service.

Saturday, February 2, 2013

Data Science Tool Kit on Vagrant

Pete Warden has released a version of the Data Science Toolkit on Vagrant. DSTK is a website for munging all sorts of data and includes a geocoder based on TIGER 2010. The website can be unreliable, requiring an occasional restart, so running a VM is a nice option. The Vagrant version upgrades the geocoder to TIGER 2012 and is a drop-in replacement for Google geocoder requests. To run the DSTK locally:

Install Vagrant, create a directory to hold the Vagrantfile, then run the following:

Go to http://localhost:8080 to start using the DSTK.