In this series of articles we create an Apache Spark on Kubernetes deployment. Spark will be running in Standalone cluster mode, not using the Spark Kubernetes support, as we do not want spark-submit to spin up new pods for us.
This is Docker image for Spark Standalone cluster (Part 1), where we create a custom Docker image with the Spark distribution and scripts to start up the Spark master and Spark workers.
The complete guide comprises 3 parts:
- Docker image for Spark Standalone cluster (Part 1)
- Submitting a job to Spark on Kubernetes (Part 2)
- Publishing Spark UIs on Kubernetes (Part 3)
Introduction
Although Apache Spark has native support for running on Kubernetes, one may still decide against this option for various reasons:
- Spark submit creates pods by itself, which takes control of resources out of the hands of engineers and gives it to whoever submits the job
- Volumes, secrets and any other details have to be specified through spark-submit, which is confusing for administrators, who are usually used to defining the whole deployment
- Distribution of JAR files is tedious; preparing a correct shaded package and/or prepending the classpath is a pain
On the other hand, given full control over what is included in the Docker image and the Kubernetes manifest:
- we can avoid distributing JARs between the driver program and workers, since everything is hand-picked during image design (we can even avoid a fat JAR)
- we have full control of the resources in the deployment - how many workers/executors are deployed is controlled by the deployment manifest
Architecture
We want to deploy Spark in Standalone cluster mode on a fixed, predefined Kubernetes deployment. We will design a single Docker image for this purpose and run it with different startup commands, designed for the master, the workers and the driver program respectively. The master, worker and driver roles running on Kubernetes are depicted in the figure:
Deployment architecture
Docker image
We will use a Docker multi-stage build. In the build stage we download the Spark distribution, and in the final stage we cherry-pick only the needed binaries into the final image. We can also parametrize the build with args:
ARG SPARK_VERSION=3.0.1
ARG HADOOP_VERSION=2.7
ARG SCALA_VERSION=2.12
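These defaults can be overridden at build time with --build-arg, for example (the chosen Spark/Hadoop combination is assumed to be published on the Apache mirrors):
docker build --build-arg SPARK_VERSION=3.0.1 --build-arg HADOOP_VERSION=3.2 -t michalklempa/spark:3.0.1-hadoop3.2 .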
To download the Spark binary, we can re-use the ready-made curlimages/curl image as the build stage in the Dockerfile:
FROM curlimages/curl as build
ARG SPARK_VERSION
ARG HADOOP_VERSION
ENV UPSTREAM_FILE_NAME="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"
ENV LOCAL_FILE_NAME="/tmp/${UPSTREAM_FILE_NAME}"
RUN curl -# "$(curl -s https://www.apache.org/dyn/closer.cgi\?preferred\=true)spark/spark-${SPARK_VERSION}/${UPSTREAM_FILE_NAME}" --output "${LOCAL_FILE_NAME}"
ENV SPARK_TMP="/tmp/spark"
RUN mkdir ${SPARK_TMP}
RUN tar -xvzf "${LOCAL_FILE_NAME}" --strip-components 1 -C "${SPARK_TMP}"
RUN echo "${UPSTREAM_FILE_NAME}" > "${SPARK_TMP}/.spark-version"
Notice how we maintain precise control of file names on the local filesystem: the --output argument to curl forces the downloaded file to be saved as /tmp/spark-3.0.1-bin-hadoop2.7.tgz.
We then unpack the tgz into the directory /tmp/spark, dropping the versioned top-level directory from the archive (that is the --strip-components 1 argument to tar).
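To make the effect of --strip-components 1 explicit, the commented sketch below shows the expected layout (paths are illustrative, not verified output):
# Without --strip-components, tar would recreate the versioned top-level directory:
#   /tmp/spark/spark-3.0.1-bin-hadoop2.7/bin, /tmp/spark/spark-3.0.1-bin-hadoop2.7/jars, ...
# With --strip-components 1, that leading path component is dropped:
#   /tmp/spark/bin, /tmp/spark/jars, ...
tar -xvzf /tmp/spark-3.0.1-bin-hadoop2.7.tgz --strip-components 1 -C /tmp/spark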
Now we can build just the build stage of the image:
docker build -t michalklempa/spark-build-phase --target build .
Let's inspect what is inside:
> docker run -it --rm --entrypoint sh michalklempa/spark-build-phase
> pwd
/
> cd /tmp/spark
/tmp/spark > ls -alh
total 164K
drwxr-xr-x 1 curl_use curl_gro 4.0K Dec 28 19:10 .
drwxrwxrwt 1 root root 4.0K Dec 28 19:09 ..
-rw-r--r-- 1 curl_use curl_gro 30 Dec 28 19:10 .spark-version
-rw-r--r-- 1 curl_use curl_gro 22.8K Aug 28 08:10 LICENSE
-rw-r--r-- 1 curl_use curl_gro 56.3K Aug 28 08:10 NOTICE
drwxr-xr-x 3 curl_use curl_gro 4.0K Dec 28 19:09 R
-rw-r--r-- 1 curl_use curl_gro 4.4K Aug 28 08:10 README.md
-rw-r--r-- 1 curl_use curl_gro 183 Aug 28 08:10 RELEASE
drwxr-xr-x 2 curl_use curl_gro 4.0K Dec 28 19:09 bin
drwxr-xr-x 2 curl_use curl_gro 4.0K Dec 28 19:10 conf
drwxr-xr-x 5 curl_use curl_gro 4.0K Dec 28 19:09 data
drwxr-xr-x 4 curl_use curl_gro 4.0K Dec 28 19:09 examples
drwxr-xr-x 2 curl_use curl_gro 16.0K Dec 28 19:10 jars
drwxr-xr-x 4 curl_use curl_gro 4.0K Dec 28 19:09 kubernetes
drwxr-xr-x 2 curl_use curl_gro 4.0K Dec 28 19:09 licenses
drwxr-xr-x 9 curl_use curl_gro 4.0K Dec 28 19:10 python
drwxr-xr-x 2 curl_use curl_gro 4.0K Dec 28 19:09 sbin
drwxr-xr-x 2 curl_use curl_gro 4.0K Dec 28 19:09 yarn
With the contents confirmed, we can move on to designing the final image stage:
FROM azul/zulu-openjdk-debian:11 as final
ARG SCALA_VERSION
ARG SPARK_VERSION
ARG SPARK_TMP="/tmp/spark"
ENV SPARK_HOME="/opt/spark"
RUN mkdir ${SPARK_HOME}
Now we can cherry-pick only the needed Spark binaries using COPY with --from=build:
COPY --from=build ${SPARK_TMP}/bin ${SPARK_HOME}/bin
COPY --from=build ${SPARK_TMP}/conf ${SPARK_HOME}/conf
COPY --from=build ${SPARK_TMP}/jars ${SPARK_HOME}/jars
COPY --from=build ${SPARK_TMP}/LICENSE ${SPARK_HOME}/LICENSE
COPY --from=build ${SPARK_TMP}/NOTICE ${SPARK_HOME}/NOTICE
COPY --from=build ${SPARK_TMP}/README.md ${SPARK_HOME}/README.md
COPY --from=build ${SPARK_TMP}/RELEASE ${SPARK_HOME}/RELEASE
COPY --from=build ${SPARK_TMP}/sbin ${SPARK_HOME}/sbin
COPY --from=build ${SPARK_TMP}/.spark-version ${SPARK_HOME}/.spark-version
The .spark-version file is there for a future maintainer who will look inside the image without prior knowledge.
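Once the final image is built (the tag used later in this article), this file can be read without starting Spark at all; the output should be along these lines:
> docker run --rm --entrypoint cat michalklempa/spark:3.0.1-hadoop2.7 /opt/spark/.spark-version
spark-3.0.1-bin-hadoop2.7.tgz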
Startup scripts
We design 3 scripts for running:
- Spark Master (master.sh)
- Spark Worker (worker.sh)
- Driver program (submit.sh)
Spark Master is the easiest one:
#!/bin/bash
env
export SPARK_PUBLIC_DNS=$(hostname -i)
java ${JAVA_OPTS} \
-cp "${SPARK_HOME}/conf:${SPARK_HOME}/jars/*" \
org.apache.spark.deploy.master.Master \
--host ${SPARK_PUBLIC_DNS} --port 7077 --webui-port 8080
Notice the SPARK_PUBLIC_DNS variable. It is used by Spark internals to advertise to the other components the address on which this service is reachable.
Since we are in a Kubernetes cluster environment, the pod DNS name changes every time a pod is created.
We could use a ClusterIP service to make the master reachable from the workers, but if we did the same for the workers, each worker would need a separate service, which is hard to maintain.
Instead, we force Spark to use IP addresses, as seen above.
Once a worker pod connects to the Spark master through the ClusterIP service (with exactly one pod backing the master), the master responds by advertising its IP address for further communication.
The same holds true for workers, executors and the driver program.
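A minimal sketch of the ClusterIP service for the master (the service name spark-master matches the environment used later; the label selector is an assumption, the ports match master.sh above):
apiVersion: v1
kind: Service
metadata:
  name: spark-master
spec:
  type: ClusterIP
  selector:
    app: spark-master
  ports:
    - name: spark
      port: 7077
      targetPort: 7077
    - name: webui
      port: 8080
      targetPort: 8080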
The worker script is very similar:
#!/bin/bash
env
export SPARK_PUBLIC_DNS=$(hostname -i)
java ${JAVA_OPTS} \
-cp "${SPARK_HOME}/conf:${SPARK_HOME}/jars/*" \
org.apache.spark.deploy.worker.Worker \
--port 7078 --webui-port 8080 \
${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}
The driver script is a bit more interesting:
#!/bin/bash
env
export SPARK_PUBLIC_DNS=$(hostname -i)
java ${JAVA_OPTS} \
-cp "${JAR}:${SPARK_HOME}/conf:${SPARK_HOME}/jars/*" \
org.apache.spark.deploy.SparkSubmit \
--deploy-mode client \
--master spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT} \
--class ${MAIN_CLASS} \
${SUBMIT_OPTS} \
--conf 'spark.driver.host='$(hostname -i) \
local://${JAR}
Notice the variables SPARK_MASTER_HOST and SPARK_MASTER_PORT; these are to be set from the outside (either in docker-compose.yml or in the Kubernetes manifest.yml).
There are handy placeholders for anyone willing to just drop some more options on the command line:
- JAVA_OPTS, for example to specify a custom log4j.properties file
- SUBMIT_OPTS, if custom options to spark-submit are to be passed (e.g. --executor-cores etc.)
Interesting is the JAR variable. It should hold the location of our job JAR, which has to be present in every Docker image used (since the local:// scheme is used).
This file can be supplied to the running container via a Kubernetes volume or built into your own custom image.
The latter option is what we describe in the next article of the series.
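Putting these variables together, a hedged example of launching the driver script by hand (the JAR path, main class and executor-core count are made-up placeholders):
SPARK_MASTER_HOST=spark-master \
SPARK_MASTER_PORT=7077 \
JAR=/opt/job/our-job.jar \
MAIN_CLASS=com.example.OurJob \
SUBMIT_OPTS="--executor-cores 2" \
./submit.sh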
Build the Docker image:
docker build -t michalklempa/spark:3.0.1-hadoop2.7 .
Running with docker-compose
For now, let's test that we can get the Spark Standalone cluster up and running.
You will need Docker and Docker Compose.
To do that, we may use the prepared docker-compose.yml.
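The actual compose file ships with the repository; a minimal sketch of its shape (the service names and the /opt/spark location of the startup scripts are assumptions) could look like:
version: '3'
services:
  spark-master:
    image: michalklempa/spark:3.0.1-hadoop2.7
    command: /opt/spark/master.sh
    environment:
      SPARK_MASTER_HOST: spark-master
      SPARK_MASTER_PORT: "7077"
    ports:
      - "8080:8080"
  spark-worker:
    image: michalklempa/spark:3.0.1-hadoop2.7
    command: /opt/spark/worker.sh
    environment:
      SPARK_MASTER_HOST: spark-master
      SPARK_MASTER_PORT: "7077"
    depends_on:
      - spark-master
With the compose file in place, docker-compose up brings the cluster up: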
> docker-compose up
spark-master_1 | /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
spark-master_1 | HOSTNAME=15bdc9c80df8
spark-master_1 | LANGUAGE=en_US:en
spark-master_1 | JAVA_HOME=/usr/lib/jvm/zulu11-ca-amd64
spark-master_1 | SPARK_MASTER_HOST=spark-master
spark-master_1 | PWD=/opt/spark
spark-master_1 | HOME=/root
spark-master_1 | LANG=en_US.UTF-8
spark-master_1 | SPARK_MASTER_PORT=7077
spark-master_1 | SHLVL=1
spark-master_1 | SPARK_HOME=/opt/spark
spark-master_1 | LC_ALL=en_US.UTF-8
spark-master_1 | PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/spark/bin
spark-master_1 | SPARK_SCALA_VERSION=2.12
spark-master_1 | _=/usr/bin/env
spark-master_1 | Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
spark-master_1 | 20/12/29 19:54:41 INFO Master: Started daemon with process name: 9@15bdc9c80df8
...
The Spark master UI should be listening on localhost:8080, as seen in the next screenshot.
Spark master UI
Running with minikube
To test the setup on Kubernetes, install minikube.
There is a manifest.yml prepared for this purpose.
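The full manifest is in the repository; as a rough sketch (labels, script path and the exact structure are assumptions), the worker part looks along these lines, with the master Deployment and its Services defined in the same way:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spark-worker
  template:
    metadata:
      labels:
        app: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: michalklempa/spark:3.0.1-hadoop2.7
          imagePullPolicy: Never
          command: ["/opt/spark/worker.sh"]
          env:
            - name: SPARK_MASTER_HOST
              value: spark-master
            - name: SPARK_MASTER_PORT
              value: "7077"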
Just start your minikube cluster:
minikube start
The image pull policy is set to Never, so we should re-build the image inside the minikube Docker environment:
eval $(minikube docker-env)
docker build -t michalklempa/spark:3.0.1-hadoop2.7 .
And apply the manifest:
kubectl apply -f manifest.yml
The pod list should look like:
> kubectl get pod
NAME READY STATUS RESTARTS AGE
spark-master-deployment-5694fcd6fb-nkqd9 1/1 Running 1 22h
spark-worker-deployment-57b94d48f-b9cm6 1/1 Running 1 22h
spark-worker-deployment-57b94d48f-xbhbf 1/1 Running 1 22h
The driver pod fails, since there is no JAR file present. We can test the setup by getting the published service URL from minikube:
> minikube service list
|-------------|---------------------|--------------|-------------------------|
| NAMESPACE | NAME | TARGET PORT | URL |
|-------------|---------------------|--------------|-------------------------|
| default | kubernetes | No node port |
| default | spark-master | No node port |
| default | spark-master-expose | http/8080 | http://172.17.0.2:32473 |
| kube-system | kube-dns | No node port |
|-------------|---------------------|--------------|-------------------------|
And pointing our browser to http://172.17.0.2:32473.
Conclusion
All the project files and roles are available in the docker-spark GitHub repository. Pre-built Docker images are available at hub.docker.com/r/michalklempa/spark.