Apache Spark on Kubernetes - Publishing Spark UIs on Kubernetes (Part 3)

This is the Publishing Spark UIs on Kubernetes (Part 3) from article series (see Part 1 and Part 2).

In this article we go through the process of publishing Spark Master, workers and driver UIs in our Kubernetes setup.

Publishing Master UI

To publish Spark Master UI, we need to define a LoadBalancer type of service in Kubernetes manifest:

apiVersion: v1
kind: Service
metadata:
  name: spark-master-expose
  labels:
    app: spark
    role: master
spec:
  selector:
    app: spark
    role: master
  type: LoadBalancer
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: http

Since the name spark-master for service is already taken by our ClusterIP service, we had chosen a different name with -expose suffix.

After applying this change and refreshing the services, we should see an exposed port on our minikube cluster:

> minikube service list
|-------------|---------------------|--------------|-------------------------|
|  NAMESPACE  |        NAME         | TARGET PORT  |           URL           |
|-------------|---------------------|--------------|-------------------------|
| default     | kubernetes          | No node port |
| default     | spark-master        | No node port |
| default     | spark-master-expose | http/8080    | http://172.17.0.2:32473 |
| kube-system | kube-dns            | No node port |
|-------------|---------------------|--------------|-------------------------|

Exposing Workers and Driver

Exposing workers might be a little bit tricky. The number of them can change, some of them mail fail from time to time. Creating a separate Kubernetes service for each of them would be tedious (although with Helm, probably possible). Fortunately, there is a feature called spark.ui.reverseProxy. This has to be set on master, every worker and executor and also driver. When set, master UI becomes a HTTP proxy to other components. To put this setting into every component, we create conf/spark-defaults.conf with one line:

spark.ui.reverseProxy               true

This file is loaded by every Spark component, thanks to specifying the classpath explicitly in startup scripts, for example worker.sh:

#!/bin/bash
env

export SPARK_PUBLIC_DNS=$(hostname -i)

java ${JAVA_OPTS} \
    -cp "${SPARK_HOME}/conf:${SPARK_HOME}/jars/*" \ # <<< see the classpath
    org.apache.spark.deploy.worker.Worker \
    --port 7078 --webui-port 8080 \
    ${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}

You can use this file to put any Spark configuration needed by your setup.

After building and applying the manifest:

kubectl apply -f manifest.yml

We inspect published service URL from minikube:

> minikube service list
|-------------|---------------------|--------------|-------------------------|
|  NAMESPACE  |        NAME         | TARGET PORT  |           URL           |
|-------------|---------------------|--------------|-------------------------|
| default     | kubernetes          | No node port |
| default     | spark-master        | No node port |
| default     | spark-master-expose | http/8080    | http://172.17.0.2:32473 |
| kube-system | kube-dns            | No node port |
|-------------|---------------------|--------------|-------------------------|

And browse to http://172.17.0.2:32473.

Conclusion

All the project files and roles are available in docker-spark github repository. Pre-built docker images available at hub.docker.com/r/michalklempa/spark.