Docker Azcopy image

Although there is an open request for an official image, and a few AzCopy images already exist, they are either out of date or very bare, with a minimal toolset.

In this article we introduce the michalklempa/azcopy-all image, which we publish and maintain. It includes az-cli, azcopy and kubectl, all with bash completion.

Apache Spark on Kubernetes - Publishing Spark UIs on Kubernetes (Part 3)

This is Part 3, Publishing Spark UIs on Kubernetes, of the article series (see Part 1 and Part 2).

In this article we go through the process of publishing the Spark master, worker and driver UIs in our Kubernetes setup.

Apache Spark on Kubernetes - Submitting a job to Spark on Kubernetes (Part 2)

This is Part 2, Preparing Spark Docker image for submitting a job to Spark on Kubernetes, of the article series (see Part 1).

In this article we explain two options for submitting a job to the cluster:

  1. Create a custom Docker image, inherited from the original one, that contains the job jar and can submit it.
  2. Mount a volume with the job jar into the original image.

Apache Spark on Kubernetes - Docker image for Spark Standalone cluster (Part 1)

In this series of articles we create an Apache Spark deployment on Kubernetes. Spark will run in standalone cluster mode, without Spark's Kubernetes support, because we do not want a Spark submit to spin up new pods for us.

This is Part 1, Docker image for Spark Standalone cluster, where we create a custom Docker image with our Spark distribution and scripts to start up the Spark master and Spark workers.

The complete guide consists of 3 parts:

  • Docker image for Spark Standalone cluster (Part 1)
  • Submitting a job to Spark on Kubernetes (Part 2)
  • Publishing Spark UIs on Kubernetes (Part 3)

Running Ansible from inside Docker image for CI/CD pipeline

In this article we prepare a simple Docker image packed with our Ansible roles, ready to provision hosts just by running a container from the image.

We describe the process of encapsulating the ansible executable, Ansible roles, dependent Galaxy roles, SSH key material and group variables into a Docker image for CI/CD use. We also present a way to run the prepared image from the command line without installing Ansible.

tt: Simplest and fastest time tracking

When it comes to tracking time spent on activities precisely, one has to start dealing with time tracking applications. I was not able to find a simple solution: all the options were UI-based, feature-heavy and annoying to use. I then looked at command-line options, where the situation is much cleaner, but they still have too many features.

Finally I came up with tt: a simple script in 6 lines of code. In this article I list some other options and inspirations you may use, and present the script itself with installation steps.

Example:

tt hello
... some time
tt writing blog
... some time
tt lunch 
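
To give a feel for how little such a tracker needs, here is a minimal Python sketch. It is not the tt script from the article; it only assumes that each invocation appends a timestamped line to a log file. The file location ~/.tt.log and the function names format_entry and track are hypothetical.

```python
import sys
from datetime import datetime
from pathlib import Path

# Hypothetical log location; the real tt script may store entries elsewhere.
LOG_FILE = Path.home() / ".tt.log"


def format_entry(words, now):
    """Build one log line: ISO timestamp, a tab, then the activity text."""
    return f"{now.isoformat(timespec='seconds')}\t{' '.join(words)}\n"


def track(words, log_file=LOG_FILE, now=None):
    """Append an entry for the activity that starts now and return it."""
    entry = format_entry(words, now or datetime.now())
    with open(log_file, "a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry


# Only write an entry when an activity was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    track(sys.argv[1:])
```

Since every line marks the start of an activity, the time spent on one activity is the difference between its timestamp and the next line's timestamp.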

Composing Avro Schemas from Subtypes

While working with Avro schemas, one quickly reaches the point where schema definitions for multiple entities start to overlap and schema files grow in length. As with the object-oriented design of classes in your program, the same principle can be applied to the design of your Avro schema collection. Unfortunately, the Avro Schema Definition language does not have a native require or import syntax.

One possible solution is to rewrite all the schemas in the Avro Interface Definition Language, which has an import feature (see [1]).

If you do not want to rewrite all the schemas, or simply prefer the JSON schema definitions, this article introduces a mechanism to:

  • design small schema file units, containing Avro named types
  • programmatically compose the files into large Avro schemas, one file per type

The article is accompanied by a full usage example and the source code of Avro Compose, an automatic schema composition tool.
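
The composition idea can be sketched in a few lines of Python. This is an illustration only, not the Avro Compose tool: it assumes the small schema units are already loaded into a dictionary keyed by full type name, and it relies on the Avro rule that a named type is defined once and referenced by name afterwards. The function name compose and the com.example types are hypothetical, and the sketch does not register the top-level schema's own name.

```python
def compose(node, registry, defined=None):
    """Inline each referenced named type at its first use, keep later uses as names."""
    if defined is None:
        defined = set()
    if isinstance(node, str):
        if node in registry and node not in defined:
            defined.add(node)  # mark first, so recursive references stay as names
            return compose(registry[node], registry, defined)
        return node
    if isinstance(node, list):  # a union or a list of record fields
        return [compose(item, registry, defined) for item in node]
    if isinstance(node, dict):
        # Only these keys can hold type references; names, docs and defaults are copied.
        type_keys = ("type", "items", "values", "fields")
        return {
            key: compose(value, registry, defined) if key in type_keys else value
            for key, value in node.items()
        }
    return node


# Hypothetical schema units, as they might be loaded from small JSON files.
registry = {
    "com.example.Address": {
        "type": "record", "name": "Address", "namespace": "com.example",
        "fields": [{"name": "city", "type": "string"}],
    },
}
user = {
    "type": "record", "name": "User", "namespace": "com.example",
    "fields": [
        {"name": "home", "type": "com.example.Address"},
        {"name": "work", "type": ["null", "com.example.Address"]},
    ],
}
composed = compose(user, registry)
```

In the composed schema the home field carries the full Address definition, while the work field still refers to com.example.Address by name, because a second definition of the same named type would be invalid.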

Move Flink Savepoint to a different S3 location

Users of Apache Flink are familiar with creating a savepoint and restarting a job from a savepoint.

The difficulty with savepoints is moving one to a different location and still being able to start a Flink job from the new location. The problem lies in the _metadata file of the savepoint, which contains absolute URIs (see documentation on moving savepoint).

In this article, we go step by step through moving a Flink savepoint from one S3 bucket to another and safely altering the _metadata file in the destination (without corrupting it), so that the Flink job starts smoothly from the new savepoint location. The setup is tested with S3 and the filesystem state backend.
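
To see why a plain search-and-replace can corrupt a binary file like this, consider the following Python sketch. It is not the procedure from the article. It assumes that each path is stored as a length-prefixed string (a 2-byte big-endian length followed by the bytes, the layout written by Java's DataOutputStream.writeUTF, which matches plain UTF-8 for ASCII paths), so changing a path without updating its length prefix would break the file. The function name rewrite_prefixed_strings and the bucket names are hypothetical, and the scan is a heuristic that should be verified against the real file format before use.

```python
import struct


def rewrite_prefixed_strings(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` at the start of length-prefixed strings.

    Assumption: every affected string is stored as a 2-byte big-endian
    length followed by that many bytes, and starts with `old`.
    """
    out = bytearray()
    pos = 0
    while True:
        hit = data.find(old, pos)
        if hit == -1:
            out += data[pos:]
            return bytes(out)
        start = hit - 2  # where the length prefix would sit
        if start >= pos:
            (length,) = struct.unpack(">H", data[start:hit])
            end = hit + length
            if length >= len(old) and end <= len(data):
                new_str = new + data[hit + len(old):end]
                if len(new_str) > 0xFFFF:
                    raise ValueError("rewritten string exceeds the 2-byte length limit")
                out += data[pos:start]
                out += struct.pack(">H", len(new_str)) + new_str
                pos = end
                continue
        # Not a plausible length-prefixed string: copy one byte past the hit and rescan.
        out += data[pos:hit + 1]
        pos = hit + 1


def utf(s: bytes) -> bytes:
    """Encode a byte string with a 2-byte big-endian length prefix."""
    return struct.pack(">H", len(s)) + s


# Synthetic stand-in for a metadata file; bucket names are made up.
sample = b"\x00\x01" + utf(b"s3://old-bucket/sp/a") + b"\xff" + utf(b"s3://old-bucket/sp/bb")
moved = rewrite_prefixed_strings(sample, b"s3://old-bucket/", b"s3://new-bucket-longer/")
```

The new prefix is longer than the old one, so every rewritten string gets a new length prefix, which is exactly the part a naive byte replacement would miss.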