This page explains how to run Hivemall on Docker.

Caution

This Docker image contains a single-node Hadoop environment for evaluating Hivemall. It is not suited for production use.

Requirements

  • Docker Engine 1.6+
  • Docker Compose 1.10+

1. Build image

Build using docker-compose

docker-compose -f resources/docker/docker-compose.yml build

Build using docker command

docker build -f resources/docker/Dockerfile .
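Tagging the image at build time makes it easier to find and run later. For example, the following builds the same image with an arbitrary tag (hivemall:dev is an illustrative name, not part of the official instructions):

docker build -f resources/docker/Dockerfile -t hivemall:dev .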

Note

You can skip building images by using an existing pre-built Docker image (see "Run a pre-built Docker image from Docker Hub" below).

2. Run container

Run using docker-compose

  1. Edit resources/docker/docker-compose.yml as needed, e.g., to map ports for the Hadoop web UIs (see the sketch after this list)
  2. docker-compose -f resources/docker/docker-compose.yml up -d && docker attach hivemall
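As a rough sketch, a port mapping in docker-compose.yml looks like the following. The service name and the exact layout of the bundled file may differ, so treat this as an illustration rather than the file's actual contents:

services:
  hivemall:
    ports:
      - "8088:8088"   # YARN ResourceManager web UI
      - "50070:50070" # HDFS NameNode web UI
      - "19888:19888" # MapReduce JobHistory Server web UI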

Run using docker command

  1. Find the local Docker image ID with docker images.
  2. Run docker run -it ${docker_image_id}, as shown below. Refer to the Docker reference for command details.
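For example, assuming ${docker_image_id} is the image ID found in the previous step, the following also exposes the Hadoop web UI ports:

docker run -p 8088:8088 -p 50070:50070 -p 19888:19888 -it ${docker_image_id}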

Run a pre-built Docker image from Docker Hub

  1. Check the latest tag on Docker Hub first.
  2. Pull the pre-built Docker image from Docker Hub: docker pull hivemall/latest:20170517
  3. docker run -p 8088:8088 -p 50070:50070 -p 19888:19888 -it hivemall/latest:20170517

You can find pre-built Hivemall Docker images in this repository.

3. Run Hivemall on Docker

  1. Type hive to launch the Hive shell (.hiverc automatically loads the Hivemall functions)
  2. Try your Hivemall queries! (see the example below)
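For a quick check that the functions are loaded, hivemall_version() is a function provided by Hivemall that prints the installed version:

hive -e "select hivemall_version();"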

Accessing Hadoop management GUIs

Note that you need to expose the local ports, e.g., by passing -p 8088:8088 -p 50070:50070 -p 19888:19888 when running the Docker image.
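Assuming the default Hadoop ports, the management GUIs are then reachable at:

  • http://localhost:8088 (YARN ResourceManager web UI)
  • http://localhost:50070 (HDFS NameNode web UI)
  • http://localhost:19888 (MapReduce JobHistory Server web UI)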

Load data into HDFS (optional)

You can find an example script for loading data into HDFS at ./bin/prepare_iris.sh. The script loads the Iris dataset into the iris database.
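To verify the load, you can list the tables created in the iris database (SHOW TABLES is standard HiveQL; the actual table names depend on the script):

hive -e "show tables in iris;"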

Build Hivemall (optional)

In the container, the Hivemall source is located at $HIVEMALL_PATH. You can build the Hivemall package with cd $HIVEMALL_PATH && ./bin/build.sh.
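After the build completes, the resulting jars can be listed as follows. This assumes the Maven-standard target/ directory, which is not stated on this page, so check the build output for the actual artifact location:

ls $HIVEMALL_PATH/target/*.jar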
