Kafka to SQL - easy & dirty, done

[Diagram: Kafka --(events)--> SQL Database]

The Kafka messaging system is commonly used for communication between services.

Oftentimes we need to get messages from Kafka into some materialized form, e.g. a SQL database, for analytical purposes. In this article, we discuss a way to achieve this simply and quickly, without deploying new frameworks.

If your use case:

  • handles a low volume of data, let's say at most a hundred messages per second
  • does not require the analytical SQL database to hold the data in near real time (a minute of delay is not an issue)

you may consider our approach a valid solution, avoiding coding and deploying:

  • Kafka Consumer/Kafka Connect
  • Beam/Flink/Kafka Streams/Spark Structured Streaming

thereby saving a lot of effort on maintaining these deployments.

Curious how? Continue reading.
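
As a taste of what such a pipeline can look like, here is a minimal sketch (our illustration, not necessarily the article's exact recipe; kcat, the events topic, and the events_raw table are assumed names), meant to be run from cron, e.g. once per minute:

#!/usr/bin/env bash
# Sketch only: consume any new messages using kcat's consumer-group mode
# (offsets are stored in Kafka, so each run resumes where the last one
# stopped), then bulk-load them into PostgreSQL.
# Assumes one message per line, with no embedded tabs or backslashes.
kcat -b kafka:9092 -G sql-loader -e events \
  | psql "$DATABASE_URL" -c "\copy events_raw(payload) FROM stdin"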

Combining Docker Images - a way to compose multiple images into one

From time to time, we need to combine or compose multiple tools into a single Docker image. Instead of running a tedious apt-get/yum install and waiting for your image to build, try combining the upstream Docker images directly into one, as sketched below.
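
A minimal sketch of the technique (the image names and binary paths here are assumptions, not a tested recipe; verify a path with docker run --rm <image> which <tool>):

# Base image of your choice.
FROM ubuntu:22.04

# Copy ready-made binaries straight out of the upstream images,
# no apt-get/yum install needed.
COPY --from=mikefarah/yq /usr/bin/yq /usr/local/bin/yq
COPY --from=hashicorp/terraform /bin/terraform /usr/local/bin/terraform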

Docker Azcopy image

Although there is a request for an official image, and there are a few AzCopy images out there, they are either not updated or very bare, with a minimal toolset.

In this article we introduce our michalklempa/azcopy-all image, which we publish and maintain; it includes az-cli, azcopy, and kubectl, all with bash completion.
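
Assuming the image puts the tools on the PATH (a reasonable expectation, but check the image's repository for authoritative usage), trying it out could look like:

# Run a single tool, or drop into a shell with az, azcopy and kubectl available.
docker run --rm michalklempa/azcopy-all azcopy --version
docker run --rm -it michalklempa/azcopy-all bash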

Apache Spark on Kubernetes - Publishing Spark UIs on Kubernetes (Part 3)

This is Publishing Spark UIs on Kubernetes (Part 3) of the article series (see Part 1 and Part 2).

In this article we go through the process of publishing the Spark master, worker, and driver UIs in our Kubernetes setup.
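
For a flavour of what this involves, here is a sketch of one common pattern (not necessarily the manifests used in the article; all names and the host are illustrative): a Service selecting the master pod, published through an Ingress.

# Service in front of the Spark master's web UI (default port 8080).
apiVersion: v1
kind: Service
metadata:
  name: spark-master-ui
spec:
  selector:
    app: spark-master
  ports:
    - port: 8080
      targetPort: 8080
---
# Ingress publishing the UI on an external host name.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spark-master-ui
spec:
  rules:
    - host: spark.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spark-master-ui
                port:
                  number: 8080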

Apache Spark on Kubernetes - Submitting a job to Spark on Kubernetes (Part 2)

This is Preparing Spark Docker image for submitting a job to Spark on Kubernetes (Part 2) of the article series (see Part 1).

In this article we explain two options for submitting a job to the cluster:

  1. Build a custom Docker image, inherited from the original, containing the job jar and submit capability.
  2. Mount a volume with the job jar into the original image (see the sketch after this list).
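
As a sketch of option 2 (illustrative names, not the article's exact invocation; locally this is a docker run -v, and on Kubernetes the same idea becomes a volume mount in the pod spec):

# Mount the directory holding the job jar into the stock image and submit.
docker run --rm \
  -v "$PWD/target:/opt/job" \
  our-spark-image:latest \
  /opt/spark/bin/spark-submit \
    --master spark://spark-master:7077 \
    --class com.example.MyJob \
    /opt/job/my-job.jar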

Apache Spark on Kubernetes - Docker image for Spark Standalone cluster (Part 1)

In this series of articles we create an Apache Spark on Kubernetes deployment. Spark will run in standalone cluster mode, not using Spark's built-in Kubernetes support, as we do not want spark-submit to spin up new pods for us.

This is the Docker image for Spark Standalone cluster (Part 1), where we create a custom Docker image with our Spark distribution and scripts to start up the Spark master and Spark workers.
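
For orientation, these are the standalone-mode entry points that such start-up scripts typically wrap (a sketch; the paths, ports, and master host name are assumptions, not the article's exact scripts):

# Master: cluster port 7077, web UI on 8080.
"$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.master.Master \
  --host 0.0.0.0 --port 7077 --webui-port 8080

# Worker: registers with the master, web UI on 8081.
"$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker \
  spark://spark-master:7077 --webui-port 8081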

The complete guide comprises 3 parts:

  • Docker image for Spark Standalone cluster (Part 1)
  • Submitting a job to Spark on Kubernetes (Part 2)
  • Publishing Spark UIs on Kubernetes (Part 3)

Running Ansible from inside Docker image for CI/CD pipeline

In this article we prepare a simple Docker image packed with our Ansible roles, ready to provision hosts just by running a container from this image.

We describe the process of encapsulating the ansible executable, Ansible roles, dependent Galaxy roles, SSH key material, and group variables into a Docker image for CI/CD use. We also present a way to run the prepared image from the command line without installing Ansible.
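
A sketch of what running such an image could look like (the image name, key path, inventory layout, and playbook are illustrative, not the article's exact invocation):

# Provision hosts without Ansible on the machine: mount the SSH key and
# inventory read-only, then run the playbook baked into the image.
docker run --rm \
  -v "$HOME/.ssh/id_rsa:/root/.ssh/id_rsa:ro" \
  -v "$PWD/inventory:/ansible/inventory:ro" \
  our-ansible-image:latest \
  ansible-playbook -i /ansible/inventory site.yml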

tt: Simplest and fastest time tracking

When it comes to tracking time on activities precisely, one has to cope with time tracking applications. I was not able to find a simple solution: all the options were UI-based, feature-packed, and annoying to use. Then I started searching for command-line options, where the situation is much cleaner, but there were still too many features.

Finally I came up with tt: a simple script of 6 lines of code. In this article I name some other options and inspirations you may use, and present the script itself with installation steps.
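
A script in the same spirit could be as small as this (a sketch, not necessarily the article's exact code; ~/.tt.log is an assumed location): each invocation appends a timestamped line, and durations fall out of the differences between consecutive lines.

#!/usr/bin/env bash
# tt: append "timestamp activity" to a log file.
echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$HOME/.tt.log"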

Example:

tt hello
... some time
tt writing blog
... some time
tt lunch