Introduction

Recently I had work to produce a document with a comparison between two tools for Cloud Data Flow. I didn’t have any knowledge of this kind of technology before creating this document. Apache NiFi is one of the tools in my comparison document. So, here I describe some of my procedures to learn about it and take my own preliminary conclusions. I followed many steps on my own desktop (a MacBook Pro computer) to accomplish this task. This document shows you what I did.

Basically, to learn about Apache NiFi in order to do a comparison with other tool:

About this document

This document was written using Vim (my favorite text editor) and its source code is in AsciiDoc format. The generated output formats (HTML and PDF) are build (and published in GitHub Pages) with Gradle. Read more about the generation processes of this document in README.adoc.

About me

You can read more about me on my cv.

Videos with a technical background

Lab 1: Running Apache NiFi inside a Docker container

For me, the best way to start learning a new technology is by running all the stuff related to them inside a Docker container. By this way, I can abstract myself about the related installation procedures and go directly to the point.

So, In this tutorial, I present the steps to work with Apache NiFi using Docker.

Prerequisites

  1. Docker installed.

Start/Restart

First start:

docker run --name nifi -p 9090:9090 -d -e NIFI_WEB_HTTP_PORT='9090' apache/nifi:latest

Restart (if was started any time before with the command below and stopped):

docker start nifi

Access to the UI

Status

$ docker ps
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                                                   NAMES
3a506cfec5ab        apache/nifi:latest   "/bin/sh -c ${NIFI_B…"   10 hours ago        Up 10 hours         8080/tcp, 8443/tcp, 10000/tcp, 0.0.0.0:9090->9090/tcp   nifi

Stop

docker stop nifi
$ docker ps -a
CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS                        PORTS               NAMES
3a506cfec5ab        apache/nifi:latest   "/bin/sh -c ${NIFI_B…"   10 hours ago        Exited (137) 33 seconds ago                       nifi

Lab 2: Running Apache NiFi locally

Prerequisites

  1. Java installed.

Installation

$ brew install nifi
....
######################################################################## 100.0%
🍺  /usr/local/Cellar/nifi/1.6.0: 386 files, 1.2GB, built in 45 minutes 44 seconds

Start

$ nifi
Usage nifi {start|stop|run|restart|status|dump|install}
$ nifi start

Java home: /Users/pj/.sdkman/candidates/java/current
NiFi home: /usr/local/Cellar/nifi/1.6.0/libexec

Bootstrap Config File: /usr/local/Cellar/nifi/1.6.0/libexec/conf/bootstrap.conf

Access to the UI

Status

$ nifi status

Java home: /Users/pj/.sdkman/candidates/java/current
NiFi home: /usr/local/Cellar/nifi/1.6.0/libexec

Bootstrap Config File: /usr/local/Cellar/nifi/1.6.0/libexec/conf/bootstrap.conf

2018-04-29 08:02:13,153 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, listening to Bootstrap on port 58129, PID=4024

Stop

$ nifi stop

Java home: /Users/pj/.sdkman/candidates/java/current
NiFi home: /usr/local/Cellar/nifi/1.6.0/libexec

Bootstrap Config File: /usr/local/Cellar/nifi/1.6.0/libexec/conf/bootstrap.conf

2018-04-29 08:11:41,562 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi has accepted the Shutdown Command and is shutting down now
2018-04-29 08:11:41,587 INFO [main] org.apache.nifi.bootstrap.Command Waiting for Apache NiFi to finish shutting down...
2018-04-29 08:11:43,597 INFO [main] org.apache.nifi.bootstrap.Command NiFi has finished shutting down.

Lab 3: Building a simple Data Flow

Prerequisites

  1. Docker installed.

Step 1 - Create a Nifi docker container with default parameters

$ docker run --name nifi -p 8080:8080 -d apache/nifi

Step 2 - Access the UI and create two processors

Step 3 - Add and configure processor 1 (GenerateFlowFile)

Drag and drop a processor into canvas:

00

Search for a processor named GenerateFlowFile:

01

Click on Add and the processor will be added to the canvas:

02

Configure the processor (2 steps):

Step 1 - Adjust the Run Schedule to 5 sec:

03

Step 2 - Adjust the propertie File Size to 2KB:

04

Step 4 - Add and configure processor 2 (Putfile)

Drag another processor into canvas. Search for PutFile:

05

Add it to the canvas:

06

Configure the Directory property to /tmp/nifi.

07

Configure Automatically Terminate RelationShips by checking the boxes failure and success.

08

Step 5 - Connect the processors

From GenerateFile to Putfile:

09

A connection will be create:

10

This will be the final state:

11

Step 6 - Start the processors

Click Ctrl to select both processors and start it.

12

Step 7 - View the generated logs

Open a shell inside the container:

$ docker exec -it nifi /bin/bash

Type the following command to see a list of the 9 generated files. This list will be actualized second by second. As we configure in NiFi, a new file will be generated on every 5 seconds.

$ while :; do clear; ls -lht /tmp/nifi/ | head -10; sleep 1; done

Step 8 - Stop the processors

Click Ctrl to select both processors and stop it.

13

Step 9 - Stop and destroy the docker container

$ docker stop nifi
$ docker rm nifi

Conclusions

  • NiFi UI is very simple and intuitive.

  • The properties are well documented.

  • Many other aspects of the UI can be explored in this playlist.

All references

Apache NiFi
Other
YouTube videos
GitHub
Community Forums/Meetups
Stack Overflow
Books
Articles/ examples