Deploy Hadoop cluster (Data Lake stack)
First published: Wednesday, May 28, 2025 | Last updated: Wednesday, May 28, 2025

Deploy Hadoop cluster (Data Lake stack) using the SloopStash Docker starter-kit.
Configure environment variables
Supported environment variables
# Allowed values for $ENVIRONMENT variable.
* dev
* qaa
* qab
Set environment variables
# Store environment variables.
$ export ENVIRONMENT=dev
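The compose commands below interpolate this variable twice: lowercased in the Compose project name and uppercased in the env-file path via bash's `${VAR^^}` expansion (which requires bash 4 or newer). A minimal sketch of the derived values:

```shell
# Requires bash 4+ for the ${VAR^^} uppercase expansion.
export ENVIRONMENT=dev

# Values derived by the compose commands below.
echo "compose/${ENVIRONMENT^^}.env"           # prints compose/DEV.env
echo "sloopstash-${ENVIRONMENT}-data-lake-s1" # prints sloopstash-dev-data-lake-s1
```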
Bootstrap Data Lake stack (Hadoop cluster) environment
Docker
[!WARNING]
The Linux machine must have at least 1.5 GB of RAM to avoid JVM memory pressure while running this 4-node Hadoop cluster.
# Switch to Docker starter-kit directory.
$ cd /opt/kickstart-docker
# Provision OCI containers using Docker compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 up -d
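Before exercising the cluster, it can help to confirm the containers came up. A quick check, assuming the container names produced by the compose project above:

```shell
# List running containers belonging to this compose project.
$ sudo docker container ls --filter "name=sloopstash-${ENVIRONMENT}-data-lake-s1"
```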
Hadoop
Verify Hadoop cluster
# Access Bash shell of existing OCI container running Hadoop name node 1.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-name-1-1 /bin/bash
# List Hadoop data nodes.
$ hdfs dfsadmin -report
# Exit shell.
$ exit
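If the report looks sparse, two quick checks from the same name node shell can help; these are standard `hdfs dfsadmin` subcommands, shown here as a sketch:

```shell
# Summary line showing the number of live data nodes.
$ hdfs dfsadmin -report | grep 'Live datanodes'
# The name node must be out of safe mode before writes succeed.
$ hdfs dfsadmin -safemode get
```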
Write data to HDFS filesystem
# Access Bash shell of existing OCI container running Hadoop data node 1.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-data-1-1 /bin/bash
# Write data to HDFS filesystem.
$ hdfs dfs -mkdir -p /nginx/log/14-07-2024
$ touch access.log
$ echo "[14-07-2024 10:50:23] 14.1.1.1 app.crm.sloopstash.dv GET /dashboard HTTP/1.1 200 http://app.crm.sloopstash.dv/dashboard 950 - Mozilla Firefox - 0.034" > access.log
$ hdfs dfs -put -f access.log /nginx/log/14-07-2024
# Exit shell.
$ exit
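Before exiting, the upload can be verified from the same shell; `hdfs dfs -stat` with format specifiers (`%r` for replication factor, `%o` for block size) is part of the standard HDFS CLI:

```shell
# Confirm the file landed in HDFS.
$ hdfs dfs -ls /nginx/log/14-07-2024
# Show its replication factor and block size.
$ hdfs dfs -stat 'replication=%r blocksize=%o' /nginx/log/14-07-2024/access.log
```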
Read data from HDFS filesystem
# Access Bash shell of existing OCI container running Hadoop data node 2.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-data-2-1 /bin/bash
# Read data from HDFS filesystem.
$ hdfs dfs -ls -R /
$ hdfs dfs -cat /nginx/log/14-07-2024/access.log
# Exit shell.
$ exit
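Two more read-side checks, as a sketch: disk usage of the directory tree and the file's HDFS checksum (useful when comparing copies across clusters):

```shell
# Human-readable disk usage under /nginx.
$ hdfs dfs -du -h /nginx
# HDFS checksum of the uploaded log file.
$ hdfs dfs -checksum /nginx/log/14-07-2024/access.log
```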
Manage Data Lake stack (Hadoop cluster) environments
Docker
# Switch to Docker starter-kit directory.
$ cd /opt/kickstart-docker
# Stop OCI containers using Docker compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 down
# Restart OCI containers using Docker compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 restart
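When a restart misbehaves, container status and logs are the first place to look. A sketch, assuming the Compose service for name node 1 is named `hadoop-name-1` (Compose derives the container name `…-hadoop-name-1-1` from the service name):

```shell
# Show status of all containers in the project.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 ps
# Follow logs of Hadoop name node 1.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 logs -f hadoop-name-1
```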