Deploy Hadoop cluster (Data Lake stack)

First published: Wednesday, May 28, 2025 | Last updated: Wednesday, May 28, 2025

Deploy Hadoop cluster (Data Lake stack) using the SloopStash Docker starter-kit.


Configure environment variables

Supported environment variables

# Allowed values for $ENVIRONMENT variable.
* dev
* qaa
* qab

Set environment variables

# Store environment variables.
$ export ENVIRONMENT=dev
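
# Verify the stored environment variable; the Compose commands below rely on Bash's ${ENVIRONMENT^^} uppercase expansion, which requires Bash 4 or later.
$ echo ${ENVIRONMENT}
$ echo ${ENVIRONMENT^^}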

Bootstrap Data Lake stack (Hadoop cluster) environment

Docker

[!WARNING]
The Linux machine must have at least 1.5 GB of RAM to avoid JVM memory pressure while running this 4-node Hadoop cluster.
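
You can check the machine's memory before provisioning; a quick sketch using the free utility, which ships with most Linux distributions:

# Check total and available memory on the Linux machine.
$ free -h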

# Switch to Docker starter-kit directory.
$ cd /opt/kickstart-docker

# Provision OCI containers using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 up -d
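
Optionally, you can confirm the containers are up before proceeding; the flags below mirror the provisioning command above:

# Verify OCI containers are running using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 ps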

Hadoop

Verify Hadoop cluster

# Access Bash shell of existing OCI container running Hadoop name node 1.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-name-1-1 /bin/bash

# Report HDFS cluster status, including the list of Hadoop data nodes.
$ hdfs dfsadmin -report
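
# Optionally, confirm the name node is out of safe mode; HDFS rejects writes while safe mode is on.
$ hdfs dfsadmin -safemode get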

# Exit shell.
$ exit

Write data to HDFS filesystem

# Access Bash shell of existing OCI container running Hadoop data node 1.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-data-1-1 /bin/bash

# Create the target log directory in the HDFS filesystem.
$ hdfs dfs -mkdir -p /nginx/log/14-07-2024

# Create a sample Nginx access log on the local filesystem.
$ echo "[14-07-2024 10:50:23] 14.1.1.1 app.crm.sloopstash.dv GET /dashboard HTTP/1.1 200 http://app.crm.sloopstash.dv/dashboard 950 - Mozilla Firefox - 0.034" > access.log

# Upload the local log file to the HDFS directory, overwriting any existing copy.
$ hdfs dfs -put -f access.log /nginx/log/14-07-2024
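
# Verify the log file is present in the HDFS directory.
$ hdfs dfs -ls /nginx/log/14-07-2024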

# Exit shell.
$ exit

Read data from HDFS filesystem

# Access Bash shell of existing OCI container running Hadoop data node 2.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-data-2-1 /bin/bash

# Recursively list the contents of the HDFS filesystem.
$ hdfs dfs -ls -R /

# Print the uploaded log file from the HDFS filesystem.
$ hdfs dfs -cat /nginx/log/14-07-2024/access.log
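
# Optionally, print only the last kilobyte of the file, which is handy for large logs.
$ hdfs dfs -tail /nginx/log/14-07-2024/access.log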

# Exit shell.
$ exit

Manage Data Lake stack (Hadoop cluster) environments

Docker

# Switch to Docker starter-kit directory.
$ cd /opt/kickstart-docker

# Stop and remove OCI containers using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 down

# Restart OCI containers using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 restart
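
To tear an environment down completely, Compose can also remove named volumes; pass -v only if losing the HDFS data stored in those volumes is acceptable (assuming the compose file declares any):

# Stop and remove OCI containers and their named volumes using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 down -v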