Flagger – Monitor your Canary deployments with Grafana

Fabian Piau | Sunday June 28th, 2020 - 06:07 PM

Update
October 17th, 2020: Updated to use newer versions (Helm 3, Kubernetes 1.18, Istio 1.7, Flagger 1.2).


This is the third article in our series dedicated to Flagger. In a nutshell, Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.

Make sure you have a local Kubernetes cluster running with the service mesh Istio. If you don’t, read the first article: Flagger – Get Started with Istio and Kubernetes. You also need to be familiar with Flagger and MHS (Mirror HTTP Server); you will find all the details in the second article: Flagger – Canary deployments on Kubernetes.

In this third guide, we will focus on the installation of Grafana for Flagger and how you can monitor your Canary deployments without having to use the Kubernetes tooling (i.e. the Kube dashboard or the kubectl command line tool).

Note
This is a hands-on guide that can be followed step by step on macOS; it will require some adjustments if you are using a Windows or Linux PC. Note that this article only skims the concepts and technologies involved, so if you are not familiar with Docker, Kubernetes, Helm or Istio, I strongly advise you to read some documentation on them before continuing.


Installing Grafana

Flagger provides a Grafana dashboard out of the box to monitor all the canary deployments inside your cluster.

Let’s install Grafana in the namespace istio-system with a single command:

helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus:9090 \
--set user=admin \
--set password=changeme

Reference: Flagger monitoring
Flagger depends on Istio telemetry and Prometheus (I’ve assumed Istio is installed in the istio-system namespace).

After a few seconds, you should get a message confirming that Grafana for Flagger has been installed. From the Kubernetes dashboard, verify that the Flagger Grafana pod is running in istio-system.

Grafana for Flagger is deployed in your cluster
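
You can also do the same check from the terminal; the pod name derives from the flagger-grafana release we just installed:

kubectl -n istio-system get pods | grep flagger-grafana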

To expose Grafana, run:

kubectl -n istio-system port-forward svc/flagger-grafana 3000:80

You can then access it in your browser at http://localhost:3000/d/flagger-istio/istio-canary, using the login and password you specified before (admin / changeme).
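
While the port-forward is running, you can also quickly verify that Grafana is up from the terminal by hitting its standard health endpoint:

curl -s http://localhost:3000/api/health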

Note that since we use Istio, we use the Istio Canary dashboard; Flagger is compatible with other service meshes, and matching dashboards are available for them.


Running the canary deployments

If you have followed the previous article, make sure you select application for the Namespace, mhs-primary for Primary and mhs for Canary.

Then try again the different experiments we did in the previous article and monitor the Grafana dashboard at the same time, especially:

  • Experiment 1 – MHS v1.1.2 successful deployment
  • Experiment 2 – MHS v1.1.3 faulty deployment


Results

Look at this screenshot I took during Experiment 1, when the canary release was successful and the new version was rolled out. I annotated it with explanations to make the graphs easier to understand.

A successful Canary deployment

And this is a similar screenshot for Experiment 2 when the canary release did not succeed and the new version was not rolled out.

A failed Canary deployment

Congratulations, you’ve come to the end of this third tutorial!


Cleaning up resources

You can delete the MHS application and its namespace. You can also remove Istio and Flagger, as this is the last article of the series.

helm delete mhs --namespace application
kubectl delete namespaces application
helm delete flagger --namespace flagger-system
kubectl delete namespaces flagger-system
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.7/samples/addons/prometheus.yaml
istioctl manifest generate --set profile=demo | kubectl delete -f -
kubectl delete namespaces istio-system

You can also stop the Kubernetes cluster by unchecking the Kubernetes option in the Docker Desktop preferences and restarting Docker Desktop.


Flagger – Canary deployments on Kubernetes

Fabian Piau | Tuesday May 19th, 2020 - 07:56 PM

Update
October 17th, 2020: Updated to use newer versions (Helm 3, Kubernetes 1.18, Istio 1.7, Flagger 1.2).


This is the second article in the series dedicated to Flagger. In a nutshell, Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.

Make sure you have a local Kubernetes cluster running with the service mesh Istio. If you don’t, read the first article: Flagger – Get Started with Istio and Kubernetes.

In this second guide, we will focus on the installation of Flagger and run multiple canary deployments of the application Mirror HTTP Server (MHS). Remember that this dummy application can simulate valid and invalid responses based on the request. This is exactly what we need to test the capabilities of Flagger. We will cover both happy (rollout) and unhappy (rollback) scenarios.

Note
This is a hands-on guide that can be followed step by step on macOS; it will require some adjustments if you are using a Windows or Linux PC. Note that this article only skims the concepts and technologies involved, so if you are not familiar with Docker, Kubernetes, Helm or Istio, I strongly advise you to read some documentation on them before continuing.


Installing Flagger

Let’s install Flagger by running these commands.

kubectl create ns flagger-system

We install Flagger in its own namespace flagger-system.

helm repo add flagger https://flagger.app

kubectl apply -f https://raw.githubusercontent.com/weaveworks/flagger/master/artifacts/flagger/crd.yaml

helm upgrade -i flagger flagger/flagger \
--namespace=flagger-system \
--set crd.create=false \
--set meshProvider=istio \
--set metricsServer=http://prometheus.istio-system:9090

Reference: Flagger Install on Kubernetes
Flagger depends on Istio telemetry and Prometheus (in that case, we assume Istio is installed in the istio-system namespace).
All parameters are available on the Flagger readme file on GitHub.
We don’t specify a version for Flagger, which means it will use the latest available in the repo (1.2.0 at the time of writing).

After a few seconds, you should get a message confirming that Flagger has been installed. From the Kube dashboard, verify that the new flagger-system namespace has been created and that the Flagger pod is running.

Flagger is deployed in your cluster
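
You can do the same verification from the terminal:

kubectl -n flagger-system get pods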


Experiment 0 – Initialize Flagger with MHS v1.1.1

Mirror HTTP Server has multiple versions available. To play with Flagger’s canary deployment feature, we will switch between versions 1.1.1, 1.1.2 and 1.1.3 of MHS (the latest version at the time of writing).

Before deploying MHS, let’s create a new namespace called application, as we don’t want to use the default one at the root of the cluster (creating dedicated namespaces is good practice). The name is too generic but sufficient for this tutorial; in general, you would use the name of the team or of a group of features.

kubectl create ns application

Do not forget to activate Istio on this new namespace:

kubectl label namespace application istio-injection=enabled
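
You can verify the label has been applied:

kubectl get namespace application --show-labels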

To deploy MHS via Flagger, I created a Helm chart.

This “canary flavored” chart is based on the previous chart without Flagger, which itself was created with the helm create mhs-chart command and then adapted. In this “canary flavored” chart, I made some extra adaptations: using 2 replicas instead of 1 to make it more realistic, pinning the version to 1.1.1, and adding the canary resource where the magic happens.

Clone the chart repo:

git clone https://github.com/fabianpiau/mhs-canary-chart.git

And install MHS:

cd mhs-canary-chart
helm install mhs --namespace application ./mhs

After a few moments, if you look at the dashboard, you should see 2 replicas of MHS in the namespace application.

MHS 1.1.1 is deployed in your cluster

It is important to note that no canary analysis has been performed and the version was automatically promoted. It was not a “real” canary release.
Why? Because Flagger needs to initialize itself the first time we do a canary deployment of an application. So make sure the first version you deploy with Flagger is fully tested and works well!
You might guess this auto-promotion happened because there was no initial version of the application in the cluster. Although that is obviously a good reason, note that even if a previous version had been deployed before (e.g. 1.1.0), the canary version 1.1.1 would still have been automatically promoted without analysis.

You can still check the canary events with:

kubectl -n application describe canary/mhs

You should see a similar output, without any canary analysis:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Synced 2m29s flagger mhs-primary.application not ready: waiting for rollout to finish: observed deployment generation less then desired generation
Normal Synced 92s (x2 over 2m30s) flagger all the metrics providers are available!
Normal Synced 92s flagger Initialization done! mhs.application

Alternatively, you can check the logs of the Flagger pod directly:

export FLAGGER_POD_NAME=$(kubectl get pods --namespace flagger-system -l "app.kubernetes.io/name=flagger,app.kubernetes.io/instance=flagger" -o jsonpath="{.items[0].metadata.name}")

kubectl -n flagger-system logs $FLAGGER_POD_NAME

If you take a closer look at the Kube dashboard, you should see some mhs and mhs-primary resources:

  • mhs-primary are the primary instances (= the non-canary ones). Flagger automatically adds the -primary suffix to differentiate them from the canary instances.
  • mhs are the canary instances. They exist only during the canary deployment and will disappear once the canary deployment ends. That’s why, in the screenshot above, you don’t see any mhs canary pods (i.e. 0/0 pods).
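
You can observe this from the terminal too; outside of a canary deployment, the mhs deployment is scaled down to 0 replicas while mhs-primary serves the traffic:

kubectl -n application get deployments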

Why this naming convention? I asked the Flagger team directly: there is a technical constraint behind it.

Flagger is now initialized properly and MHS is deployed to your cluster. You can use the terminal to confirm MHS is accessible (thanks to the Istio Gateway):

curl -I -H Host:mhs.example.com 'http://localhost'

You should receive an HTTP 200 OK response:

HTTP/1.1 200 OK
x-powered-by: Express
date: Sun, 17 May 2020 16:47:33 GMT
x-envoy-upstream-service-time: 10
server: istio-envoy
transfer-encoding: chunked

And:

curl -I -H Host:mhs.example.com -H X-Mirror-Code:500 'http://localhost'

should return an HTTP 500 response:

HTTP/1.1 500 Internal Server Error
x-powered-by: Express
date: Sun, 17 May 2020 16:48:09 GMT
x-envoy-upstream-service-time: 12
server: istio-envoy
transfer-encoding: chunked


Experiment 1 – MHS v1.1.2 canary deployment

We are going to install the newer version 1.1.2. You need to manually edit the file mhs-canary-chart/mhs/values.yaml and replace tag: 1.1.1 with tag: 1.1.2.
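
If you prefer the command line to a text editor, a one-liner can do the replacement (a sketch using the BSD sed shipped with macOS, hence the empty string after -i; GNU sed on Linux does not need it):

sed -i '' 's/tag: 1.1.1/tag: 1.1.2/' mhs-canary-chart/mhs/values.yaml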

Then:

cd mhs-canary-chart
helm upgrade mhs --namespace application ./mhs

While the canary deployment is in progress, it’s very important to generate some traffic to MHS. Without traffic, Flagger will consider that something went wrong with the new version and will automatically roll back to the previous one. Obviously, you don’t need this extra step in a production environment that continuously receives real traffic.

Run this loop command in another terminal to generate artificial traffic:

while (true); do curl -I -H Host:mhs.example.com 'http://localhost' ; sleep 0.5 ; done
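
In yet another terminal, you can follow the analysis from the canary resource itself; the output reports the current phase and canary weight as the analysis advances:

kubectl -n application get canaries --watch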

Check the Kube dashboard; at some point, you should see the canary pod with the new version 1.1.2:

Canary deployment of MHS 1.1.2 in progress in your cluster

Check the canary events with the same command as before:

kubectl -n application describe canary/mhs

After a while (about 6 minutes), you should see a similar event output:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Synced 30m flagger mhs-primary.application not ready: waiting for rollout to finish: observed deployment generation less then desired generation
Normal Synced 29m (x2 over 30m) flagger all the metrics providers are available!
Normal Synced 29m flagger Initialization done! mhs.application
Normal Synced 10m flagger New revision detected! Scaling up mhs.application
Normal Synced 9m16s flagger Starting canary analysis for mhs.application
Normal Synced 9m16s flagger Advance mhs.application canary weight 10
Normal Synced 8m16s flagger Advance mhs.application canary weight 20
Normal Synced 7m16s flagger Advance mhs.application canary weight 30
Normal Synced 6m16s flagger Advance mhs.application canary weight 40
Normal Synced 5m16s flagger Advance mhs.application canary weight 50
Normal Synced 4m16s flagger Copying mhs.application template spec to mhs-primary.application
Normal Synced 3m16s flagger Routing all traffic to primary
Normal Synced 2m16s flagger (combined from similar events): Promotion completed! Scaling down mhs.application

The canary release completed successfully. Version 1.1.2 is now installed on all the primary pods, and the canary pod has been removed.

MHS 1.1.2 is deployed in your cluster

Why did this deployment take about 6 minutes? Because it includes a 5-minute canary analysis. During this analysis, the traffic was routed progressively to the canary pod: the canary weight increased in steps of 10% every minute until it reached 50% of the global traffic, i.e. 5 steps of 1 minute, plus the extra time to promote the new version and scale down the canary. The analysis is configurable and defined in the canary.yaml file that was added to the chart.

Below is the analysis configuration:

  analysis:
    # stepper schedule interval
    interval: 1m
    # max traffic percentage routed to canary - percentage (0-100)
    maxWeight: 50
    # canary increment step - percentage (0-100)
    stepWeight: 10
    # max number of failed metric checks before rollback (global to all metrics)
    threshold: 5
    metrics:
      - name: request-success-rate
        # percentage before the request success rate metric is considered as failed (0-100)
        thresholdRange:
          min: 99
        # interval for the request success rate metric check
        interval: 30s
      - name: request-duration
        # maximum req duration P99 in milliseconds before the request duration metric is considered as failed
        thresholdRange:
          max: 500
        # interval for the request duration metric check
        interval: 30s

The canary analysis here relies on the 2 basic metrics provided out of the box by Istio / Prometheus (request success rate and request duration). It is also possible to define your own custom metrics: your application will then need to expose a Prometheus endpoint that includes them, and you can update the Flagger analysis configuration to use them with your own PromQL query. Note this goes beyond the scope of this hands-on guide, which uses only the built-in metrics.
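
For illustration, here is a minimal sketch of such a custom metric using Flagger’s MetricTemplate resource. The metric name and the PromQL query below are hypothetical (a 404 error rate based on the standard Istio telemetry); adapt them to the metrics your application actually exposes:

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate-404
  namespace: application
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  # hypothetical query: percentage of requests ending up in an HTTP 404
  query: |
    100 * sum(rate(istio_requests_total{destination_workload_namespace="{{ namespace }}",destination_workload="{{ target }}",response_code="404"}[{{ interval }}]))
    / sum(rate(istio_requests_total{destination_workload_namespace="{{ namespace }}",destination_workload="{{ target }}"}[{{ interval }}]))

The analysis section of the canary resource could then reference it with a templateRef:

    metrics:
      - name: error-rate-404
        templateRef:
          name: error-rate-404
        thresholdRange:
          max: 5
        interval: 1m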


Experiment 2 – MHS v1.1.3 faulty deployment

Again, you need to manually edit the file mhs-canary-chart/mhs/values.yaml and replace tag: 1.1.2 with tag: 1.1.3.

Then:

cd mhs-canary-chart
helm upgrade mhs --namespace application ./mhs

We generate some artificial traffic:

while (true); do curl -I -H Host:mhs.example.com 'http://localhost' ; curl -I -H Host:mhs.example.com -H X-Mirror-Code:500 'http://localhost' ; sleep 0.5 ; done

This time, we also generate invalid traffic to make sure the request success rate goes down!

Check the canary events with the same command as before:

kubectl -n application describe canary/mhs

After a while (about 6 minutes), you should see a similar event output:

Normal Synced 8m23s (x2 over 20m) flagger New revision detected! Scaling up mhs.application
Normal Synced 7m23s (x2 over 19m) flagger Advance mhs.application canary weight 10
Normal Synced 7m23s (x2 over 19m) flagger Starting canary analysis for mhs.application
Warning Synced 6m23s flagger Halt mhs.application advancement success rate 57.14% < 99%
Warning Synced 5m24s flagger Halt mhs.application advancement success rate 0.00% < 99%
Warning Synced 3m24s flagger Halt mhs.application advancement success rate 71.43% < 99%
Warning Synced 2m24s flagger Halt mhs.application advancement success rate 50.00% < 99%
Warning Synced 84s flagger Halt mhs.application advancement success rate 63.64% < 99%
Warning Synced 24s flagger Rolling back mhs.application failed checks threshold reached 5
Warning Synced 24s flagger Canary failed! Scaling down mhs.application

And you are still on version 1.1.2.

Flagger decided not to go ahead and promote version 1.1.3, as the analysis could not succeed and the failure threshold of 5 checks was reached (indeed, each time, about 50% of the requests ended up in an HTTP 500 response). Flagger simply redirected all traffic back to the primary instances and removed the canary pod.
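
You can confirm the rollback from the canary resource itself; after a failed analysis, the status column should report the Failed phase:

kubectl -n application get canary mhs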

Congratulations, you’ve come to the end of this second tutorial!


Observations

Before we clean up the resources we’ve created, let’s wrap up with a list of observations:

  • Deleting a deployment deletes all pods (canary and primary); we don’t end up with orphan resources.
  • Prometheus is required. Without it, the canary analysis won’t work.
  • It is not possible to re-trigger a canary deployment of the same version that has just failed. It forces you to bump the version (even if the problem was a configuration issue rather than a code issue).
  • The Flagger off-boarding process is not as simple as removing the canary resource from the chart and deploying a new version. If you delete the canary resource, Flagger won’t trigger the canary process: it will change the version in mhs and remove mhs-primary, but mhs has 0 pods, so your service will become unavailable! You need to be careful and adopt a proper manual off-boarding process. Recently, the Flagger team added a revertOnDeletion property that you can enable to avoid this issue (see the sketch after this list); you can read the documentation to learn more about this canary finalizer.
  • After multiple deployments, some events can appear to be missing: the Kubernetes describe command aggregates them (x<int> over <int>m), the order is sometimes not preserved, and/or some events do not show up. You can look at the phase status instead (the terminal statuses are Initialized, Succeeded and Failed). Best is to look directly at the logs of the Flagger pod, as they are always accurate and complete.
  • The canary analysis should be configured to run for a short period of time (i.e. no more than 30 minutes) to leverage continuous deployment and avoid releasing a new version while a canary deployment for the previous one is still in progress. If you want to perform canary releases over longer periods, Flagger may not be the best tool.
  • Finally, it’s important to remember that the first time you deploy an application with Flagger (as in Experiment 0 above), the tool needs to initialize itself (Initialized status) and will not perform any analysis.
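
For reference, here is a hedged sketch of what enabling this finalizer could look like in the canary resource (check the Flagger documentation for the exact semantics):

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: mhs
  namespace: application
spec:
  # when the canary resource is deleted, revert the target
  # deployment to its original state instead of leaving it scaled to 0
  revertOnDeletion: true
  # ... rest of the spec (targetRef, service, analysis) unchanged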


Cleaning up resources

Now that the tutorial is complete, you can remove the MHS application and its namespace.

helm delete mhs --namespace application

kubectl delete namespaces application

We recommend that you leave Flagger and Istio in place to save time in the next tutorial. If, however, you’d like to remove everything now, you can run the following commands.

Remove Flagger:

helm delete flagger --namespace flagger-system

kubectl delete namespaces flagger-system

Remove Istio and Prometheus:

kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.7/samples/addons/prometheus.yaml

istioctl manifest generate --set profile=demo | kubectl delete -f -

kubectl delete namespaces istio-system


What’s next?

The next article will focus on the Grafana dashboard provided out of the box with Flagger, a nice addition that saves you from manually running kubectl commands to check the result of your canary deployments. Stay tuned! In the meantime, you can stop the Kubernetes cluster by unchecking the Kubernetes option in the Docker Desktop preferences and restarting Docker Desktop. Your computer deserves another break.
