Deploy Hadoop cluster (Data Lake stack)

First published: Wednesday, May 28, 2025 | Last updated: Wednesday, May 28, 2025

Deploy Hadoop cluster (Data Lake stack) using the SloopStash Docker starter-kit.


Previous: Deploy Ollama-based GenAI LLMs (AI stack)

Next: Deploy Kafka cluster (Data Lake stack)

Deploy and manage Data Lake stack (Hadoop cluster) environments

Docker

The Linux machine must have at least 1.5 GB RAM to avoid JVM memory pressure while running this 4-node Hadoop cluster.

# Switch to SloopStash Docker starter-kit directory.
$ cd /opt/kickstart-docker

# Provision OCI containers using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 up -d

# Stop OCI containers using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 down

# Restart OCI containers using Docker Compose.
$ sudo docker compose -f compose/data-lake/hadoop/main.yml --env-file compose/${ENVIRONMENT^^}.env -p sloopstash-${ENVIRONMENT}-data-lake-s1 restart
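The Compose commands above rely on a shell variable named ENVIRONMENT that selects the `.env` file and the Compose project name. A minimal sketch, assuming `stg` is one of the environments your starter-kit defines (note that the `${ENVIRONMENT^^}` uppercase expansion requires Bash 4 or later):

```shell
# Set the environment before running the Compose commands above.
# "stg" is an assumed example value; use whichever environments your
# compose/*.env files actually define.
export ENVIRONMENT=stg

# The commands above expand to these file and project names.
echo "compose/${ENVIRONMENT^^}.env"            # → compose/STG.env
echo "sloopstash-${ENVIRONMENT}-data-lake-s1"  # → sloopstash-stg-data-lake-s1
```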

Hadoop

Verify Hadoop cluster

# Access Bash shell of existing OCI container running Hadoop name node 1.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-name-1-1 /bin/bash

# Report HDFS cluster status, including live data nodes.
$ hdfs dfsadmin -report

# Exit shell.
$ exit
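The `-report` output includes a `Live datanodes (N):` summary line. A minimal sketch of pulling the count out of it; the hard-coded sample line below stands in for the real report, which comes from the command above:

```shell
# Extract the live-data-node count from a dfsadmin-style report line.
# The sample string mimics one line of `hdfs dfsadmin -report` output.
report_line='Live datanodes (3):'
echo "$report_line" | grep -o '[0-9]\+'   # → 3

# Against the running cluster, inside the name node container:
#   hdfs dfsadmin -report | grep 'Live datanodes'
```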

Write data to HDFS filesystem

# Access Bash shell of existing OCI container running Hadoop data node 1.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-data-1-1 /bin/bash

# Write data to HDFS filesystem.
$ hdfs dfs -mkdir -p /nginx/log/14-07-2024
$ echo "[14-07-2024 10:50:23] 14.1.1.1 app.crm.sloopstash.dv GET /dashboard HTTP/1.1 200 http://app.crm.sloopstash.dv/dashboard 950 - Mozilla Firefox - 0.034" > access.log
$ hdfs dfs -put -f access.log /nginx/log/14-07-2024

# Exit shell.
$ exit
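The dated directory name (`14-07-2024`) can be derived from a timestamp rather than hard-coded. A minimal sketch, assuming GNU `date` as shipped on the Linux machines this guide targets:

```shell
# Build the dated HDFS log directory name from a timestamp.
# GNU date's -d flag is assumed (standard on Linux distributions).
log_date=$(date -d '2024-07-14' +%d-%m-%Y)
echo "/nginx/log/${log_date}"   # → /nginx/log/14-07-2024

# e.g. hdfs dfs -mkdir -p "/nginx/log/${log_date}"
```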

Read data from HDFS filesystem

# Access Bash shell of existing OCI container running Hadoop data node 2.
$ sudo docker container exec -ti sloopstash-${ENVIRONMENT}-data-lake-s1-hadoop-data-2-1 /bin/bash

# Read data from HDFS filesystem.
$ hdfs dfs -ls -R /
$ hdfs dfs -cat /nginx/log/14-07-2024/access.log

# Exit shell.
$ exit
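Once the file is readable, its contents can be piped into ordinary shell tools. A minimal sketch extracting the HTTP status code (whitespace-separated field 8 in the sample log format written earlier); the hard-coded line stands in for the `hdfs dfs -cat` output:

```shell
# Pull the HTTP status code (field 8) from the sample NGINX-style log line.
line='[14-07-2024 10:50:23] 14.1.1.1 app.crm.sloopstash.dv GET /dashboard HTTP/1.1 200 http://app.crm.sloopstash.dv/dashboard 950 - Mozilla Firefox - 0.034'
echo "$line" | awk '{print $8}'   # → 200

# Against the cluster, inside a data node container:
#   hdfs dfs -cat /nginx/log/14-07-2024/access.log | awk '{print $8}'
```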
