Bot Attacks: You are not alone…

Fabian Piau | Tuesday April 20th, 2021 - 04:42 PM


Note
I originally published this article on the Medium Expedia Group technology blog: Bot Attacks: You are not alone…

In a recent article, Security Magazine stated that 1.3 billion bot attacks were detected in Q3 2020. It’s not a surprise. We are not alone. You are not alone.

Last year, the Opex Excellence team at Expedia Group reviewed the production incidents that impacted the Hotels.com website over a 12-month period, and bot attacks were one of the most significant threats we identified.

Anything that can go wrong will go wrong — Murphy’s law

In this article, I will define and explain what bots and bot attacks are. I hope it will help you build up some knowledge and a better understanding of the threats your website could be facing, so you can come up with a set of strategies and solutions to prevent and mitigate future bot attacks.

Know the enemy and know yourself, in a hundred battles you will never be in peril — Sun Tzu, The Art of War

Bots - Photo by Eric Krull on Unsplash

What are bot attacks?

The first type of bot attack that usually comes to mind is the Denial of Service attack (DoS), or DDoS when it is distributed. A DDoS attack usually impacts the whole website, with failures cascading until it becomes unavailable. The attacker sends a large number of requests (e.g. multiple GETs/POSTs on random URLs) to overload the underlying services so much that they cannot handle the traffic anymore. Services start to respond with timeouts or errors. This obviously also affects the requests originating from legitimate customers — customers trying to book their hotel stay, in our case.

Another type of attack, slightly more targeted, is the Scraping attack. A crawler bot looks into various pages to extract specific pieces of information; this is also known as Data scraping. These bots usually target your inventory data, e.g. all the property details and contact information, or pricing for specific dates. Sometimes the target is even the whole content of some pages of your website, to replicate them and prepare a phishing attack.

ℹ️ Ideally, the attacker does not want to be seen while scraping a website and will try to spread the attack over multiple days, avoiding a surge in traffic that may trigger an alert and an investigation on your side. Unless the attack is sophisticated and coming from different machines, you can usually figure out that the website is being scraped when you see an abnormal number of requests coming from the same IP address (or range of IPs) at the Edge level. I will give more details about the Edge level in the next part.


If DoS and Scraping attacks are quite general in scope, in the sense that they target various pages across the website, we also see attacks against specific pages.

Specialized attacks are much narrower in scope and usually target a specific page or feature of your website. Below are a few real cases we have faced:

  • An attack on the Booking page, where the attacker tries to break the coupon field by generating and trying multiple codes, hoping there is an exploitable pattern. This uses brute force.
  • An attack on the Sign-in page, where the attacker tries multiple login and password combinations. The idea is to gain access to user accounts and confidential data, also known as Account Take Over (ATO). This also uses brute force.
  • An attack on the Mobile App page, where the attacker sends large numbers of SMS to random or fake phone numbers. It is quite common nowadays to prompt users to download an app by sending an SMS with a link to the app on the store. The targeted website ends up paying, as the messaging service is usually handled by a third-party vendor and each SMS sent has a cost.

General or specific, an attack can be basic or sophisticated. An example of a basic attack is someone desperately trying to find a discount code by firing dozens of requests at a coupon service. Basic attacks are difficult to spot but easy to block: you can block the IP address with a WAF (Web Application Firewall) rule. They also have a low risk profile.
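For instance, if your entry point is nginx rather than a commercial WAF (an assumption made for the sake of the example), blocking an offending IP address is a one-line rule; the addresses below come from the documentation range:

# Block a single abusive IP address (example address)
deny 203.0.113.42;

# Or an entire range
deny 203.0.113.0/24;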

Sophisticated attacks, on the other hand, can be distributed across thousands of machines located in different countries and use advanced scripting technologies, like headless browsers. The level of impact and risk is much higher, but they are obviously easier to spot when associated with a traffic surge.

Bot attack risk matrix

Should we block every bot?

Definitely not! Not all bots are evil; some are even beneficial to us.

  • There are the Spider bots (or Crawlers) from popular search engines like Google Search or Microsoft Bing. If we block those, the indexing of the website will suffer, legitimate traffic will degrade over time, and the website's popularity will go down.
  • There are also the Commercial bots (e.g. the Google AdSense bot) that provide personalized ads to users, including ads for our own website.
  • There are the Data bots, like content aggregators and feeds, and there is the Archive bot building the Internet Archive.
  • There are the Copyright bots that look for plagiarism or intellectual property theft. You may have faced one if you've tried to upload a video to YouTube with your favourite music in the background; you probably quickly realized that Google took it down, politely reminding you that you cannot use protected material.
  • There are also the Monitoring bots that make sure your website is healthy and raise an alert when it's not. For example, Akamai, Datadog, and others use bots to check your site is responding properly.

ℹ️ You should not block any of these bots: they are not malicious, they are usually not aggressive, and they contribute to the Internet. If you feel your legitimate traffic is suffering from them, then instead of blocking, it's best to have a tarpitting or rate-limiting strategy in place to mitigate their impact. More details about this in the next part about the Edge level.

While all these bots are third party, you can also have your own. In our case, an internal bot regularly checks our landing pages to make sure there are no dead links or unnecessary redirections. We definitely don't want to block it!

What can we do?

We can prevent attacks at the Edge level and mitigate them at the Application level.

Edge and Application levels

Use of robots.txt

The first thing that comes to mind when dealing with bots is the robots.txt file. Almost every website has one, and it has been around for ages. It is a text file, publicly accessible at the root of your website, that specifies the rules for any bot accessing your site: which pages bots can and can't crawl, and which links they should and shouldn't follow.

Good bots will follow these rules. For instance, if a website owner doesn’t want a certain page on their site to show up in Google search results, they can write a rule for it, and Google web crawler bots won’t index that page. Although the robots.txt file cannot actually enforce these rules, good bots are programmed to look for that file and follow the rules before they do anything else. It’s based on a code of honor.
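To make this concrete, here is a minimal robots.txt; the paths are made up for the example:

# robots.txt, served at the root of the website
User-agent: *             # the rules below apply to every bot
Disallow: /checkout/      # keep crawlers out of the booking funnel
Disallow: /account/       # and out of user account pages
Crawl-delay: 5            # honoured by some bots (e.g. Bingbot), ignored by Googlebot
Disallow: /crawler-trap/  # a path like this can double as a honeypot (see the note below)

User-agent: BadBotExample # a hypothetical bot you want to exclude entirely
Disallow: /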

Malicious bots will obviously not follow any of your rules. On the contrary, they will often read the file to learn which content a website is trying to keep off-limits, then access that content. Managing bots therefore requires a more active approach than simply defining rules in the robots.txt file. This is what we are going to see in the next part.

ℹ️ The robots.txt file can also be used to set up a ‘honeypot’. A honeypot is a fake target for bad actors that, when accessed, exposes them as malicious. In the case of a bot, a honeypot could be a page that the robots.txt file forbids to bots. Good bots will read the robots.txt file and avoid that page; some bad bots will crawl it. By tracking the bots that access the honeypot, bad bots can be identified and blocked. Source: Cloudflare


Advanced Bot Management

The main shield against bad bots is Bot Management at the Edge level. This is much more advanced than the robots.txt file. I took this list from Cloudflare, but you will find a similar set of features in any other Edge tool:

  • Identify bots vs. human visitors (using behavioral analysis and potentially machine learning)
  • Identify bot reputation
  • Identify bot origin IP addresses and block based on IP reputation
  • Analyze bot behavior
  • Add good bots to allowlists
  • Add bad bots to blocklists
  • Challenge potential bots via a CAPTCHA test, JavaScript injection, or other methods
  • Rate limit any potential bot over-using a service
  • Tarpit recognized bot requests (see definition below)
  • Deny access to certain content or resources for bad bots
  • Serve alternative/cached content to bots

ℹ️ ‘Tarpitting’ is an interesting feature: it means adding an artificial delay to the request. It is usually much better than blocking because the bot won't know it has been discovered, yet the attack will slow down significantly, as fewer requests will reach the Application level (they may time out at the Edge level). Rate limiting is another good strategy you may want to look at.
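If you don't have a commercial Edge tool in front, nginx can approximate both strategies. Here is a sketch (the upstream name is illustrative); without the nodelay option, requests over the rate are delayed rather than rejected, which is a mild form of tarpitting:

# In the http context: allow 5 requests/second on average per client IP
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=5r/s;

server {
    location / {
        # Delay up to 20 excess requests instead of serving them immediately;
        # beyond the burst, respond with 429 Too Many Requests
        limit_req zone=per_ip burst=20;
        limit_req_status 429;
        proxy_pass http://backend;  # assumes an upstream named 'backend'
    }
}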

When choosing an advanced bot manager, you can go with a popular third-party provider like Akamai or Cloudflare.

Pros

  • No impact on the application code.
  • A bot rule is relatively quick to deploy with immediate effect.

Cons

  • Most of the cons stem from the fact that they are third party.
  • There is a license cost.
  • Bot rules can only be defined against generic, non-business parameters like user agent, IP, endpoint, etc.
  • They sometimes involve a heavy approval process, e.g. adding a new rule may require people outside of the company plus internal people with special authorization.
  • A new rule can have side effects: unless it blocks a single IP address, it's very hard to be sure it won't keep some legitimate traffic out.
  • The maintenance of the rules can become cumbersome over time.
  • Application teams only get partial access and visibility.

Most of these disadvantages can be mitigated if you use your own in-house Edge tool, which is particularly interesting in addition to a third-party one. It's obviously not something every company can invest in, but it gives you much more flexibility:

  • Ability to set rules related to your business.
  • Ability to add a one-off temporary rule to mitigate an attack, which you can delete shortly after the attack has passed.
  • Ability to centralize the Edge monitoring and make the information available to every team.


Traffic prioritization

The idea here is not to replace your main Edge tool but to add some bot logic after it. In a nutshell, such a tool acts as a prioritization queue: low-value bot requests are deprioritized in favor of real user requests with higher business value, e.g. requests that could end up as a booking, in our case.

User requests > Internal bot requests > External good bot requests

ℹ️ Netflix applies similar concepts, which you can read about in Keeping Netflix Reliable Using Prioritized Load Shedding.
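To illustrate the idea, here is a minimal Java sketch. The user-agent classification is deliberately naive and the names are made up; a real implementation would rely on the richer signals coming from your Edge tool:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class TrafficPrioritizer {

    // Ordinal doubles as the priority: user requests always come first
    enum Priority { USER, INTERNAL_BOT, EXTERNAL_GOOD_BOT }

    record PendingRequest(Priority priority, String path) {}

    private final PriorityBlockingQueue<PendingRequest> queue =
            new PriorityBlockingQueue<>(1024, Comparator.comparing(PendingRequest::priority));

    public void accept(String userAgent, String path) {
        queue.offer(new PendingRequest(classify(userAgent), path));
    }

    // Naive classification based on the User-Agent header (illustrative only)
    private Priority classify(String userAgent) {
        if (userAgent.contains("internal-linkchecker")) return Priority.INTERNAL_BOT;
        if (userAgent.contains("Googlebot") || userAgent.contains("bingbot")) return Priority.EXTERNAL_GOOD_BOT;
        return Priority.USER;
    }

    // Workers pull the highest-value request first
    public PendingRequest next() throws InterruptedException {
        return queue.take();
    }
}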


Caching

We talked about Bot Management, but there is something else the Edge layer can provide for you: the ability to cache content, usually referred to as a Content Delivery Network (CDN). The idea is to serve cached pages to good bots rather than generate fresh ones.
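As a sketch with nginx (assuming nginx sits in front of your services, with an illustrative bot list), you could serve cached pages to known good bots while humans keep hitting the origin:

# Good bots get cached pages; humans and unknown agents bypass the cache
map $http_user_agent $bypass_cache {
    default               1;
    ~*(googlebot|bingbot) 0;
}

proxy_cache_path /var/cache/nginx keys_zone=pages:50m inactive=10m;

server {
    location / {
        proxy_cache        pages;
        proxy_cache_valid  200 10m;        # keep successful pages for 10 minutes
        proxy_cache_bypass $bypass_cache;  # humans skip the cache...
        proxy_no_cache     $bypass_cache;  # ...and their responses are not stored
        proxy_pass         http://backend; # assumes an upstream named 'backend'
    }
}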

We did a Proof of Concept last year and it significantly decreased the traffic to our Landing services without affecting the SEO of the website. This year, we are looking at generalizing this approach.


Mitigating at the Application level

Managing bots at the Application level means that a bot attack managed to get through the higher level of protection at the Edge.

The solutions we have at this level are mitigation only: they help us stay proactive and reduce the burden of an attack.

On a sad note, we know it's going to happen, probably multiple times, so we need to prepare for it. On a brighter note, we know that bot attacks do not last forever. A sophisticated attack costs the attacker a significant amount of resources, and resources cost money. As long as we make the attacker spend more than they gain, we can be confident the attack will eventually be stopped as a bad investment.

There are different actions we can take, and most of them are simply good practices.

  • First, when coding the application logic, avoid high-complexity code, blocking threads, and so on. A good idea is to separate the application and management ports: if you use the same port, during a bot attack the service can be so overloaded that it won't be able to respond to health checks, and your infrastructure platform will flag it as unhealthy. Even if it does not solve all your issues, a separate management port can mitigate this (see the first sketch after this list).
  • Having a chaos mindset is important. For critical services, make sure you have load and stress testing in your pipeline, to ensure your services are resilient enough and that you have identified potential memory leaks in your code as well as bottlenecks reaching downstream services or data sources. If something goes wrong, you still want to serve a degraded response to mitigate the impact on the customer. You could also have a caching mechanism in place.
  • Make sure you leverage your infrastructure. If you use Kubernetes, take a look at auto-scaling, but be vigilant when enabling it: ensure the configuration is well thought out and in line with your dependencies (see the second sketch after this list). Setting a high number of pods and considering the job done would be a mistake, as you also share the load with your downstream dependencies; if they are not prepared for it, you basically shift the bottleneck deeper in the stack without solving it. It may also cost you more money if your infrastructure is hosted on a Cloud provider like AWS. Also make sure your pods are ready to take traffic once they are exposed to the attack; a warm-up routine like Mittens will help, especially for applications that are slow to start up.
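For the separate management port mentioned in the first point, if your services run on Spring Boot for instance, it is a one-line configuration change:

# application.properties
# Serve user traffic on one port...
server.port=8080
# ...and the actuator endpoints (health checks, metrics) on another
management.server.port=8081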
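And for auto-scaling, here is a minimal Kubernetes HorizontalPodAutoscaler sketch; the deployment name and thresholds are illustrative, tune them with your downstream dependencies in mind:

apiVersion: autoscaling/v2beta2   # autoscaling/v2 on recent clusters
kind: HorizontalPodAutoscaler
metadata:
  name: landing-service           # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: landing-service
  minReplicas: 3
  maxReplicas: 10                 # cap both the bill and the load pushed downstream
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU crosses 70%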

There are also other strategies at the application level that are not related to configuration and infrastructure. Some mimic a bot management solution at the Edge level:

  • Captcha mechanism. A common approach is to display a captcha after too many attempts, or to protect pages targeted by account-related attacks.
  • Authentication mechanism. If your APIs are public, you may want to add some authentication, from ‘Basic Auth’ to ‘CSRF Token’. But be aware it will add complexity to your system, and you have to balance it against the information your API provides, e.g. ask yourself whether the content exposed is sensitive enough.
  • Caching, Blocking, and Rate-limiting mechanisms. These can be complex to implement and maintain, especially in a microservice architecture, but I mention them because they can be a solution if you don't have any Edge tool or if you are working on a monolithic app (a minimal rate-limiter sketch follows this list).
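As a sketch of that last point, here is a naive in-memory, per-client token bucket in Java. It is fine for a monolith or a single instance; in a microservice architecture you would need a shared store (e.g. Redis) to keep the counters consistent across instances:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {

    private static final double MAX_TOKENS = 10;          // burst size per client
    private static final long REFILL_INTERVAL_MS = 1_000; // one token per second

    private static final class Bucket {
        double tokens = MAX_TOKENS;
        long lastRefill = System.currentTimeMillis();
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    // Returns true if the request identified by clientKey (e.g. an IP) may proceed
    public boolean tryAcquire(String clientKey) {
        Bucket bucket = buckets.computeIfAbsent(clientKey, k -> new Bucket());
        synchronized (bucket) {
            long now = System.currentTimeMillis();
            // Refill proportionally to the time elapsed since the last request
            bucket.tokens = Math.min(MAX_TOKENS,
                    bucket.tokens + (now - bucket.lastRefill) / (double) REFILL_INTERVAL_MS);
            bucket.lastRefill = now;
            if (bucket.tokens >= 1) {
                bucket.tokens -= 1;
                return true;  // under the limit, let the request through
            }
            return false;     // over the limit, block (or tarpit) the request
        }
    }
}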


Observability, Monitoring and Alerting

Last but not least, it's very important to have proper observability in place. Any bot attack that starts to get heavy and puts high pressure on your system should automatically trigger a set of alerts.

At the Edge level:

  • Alert on the bot rules: when a recently created rule is blocking too much traffic or, conversely, when a rule has not blocked any traffic for the last ‘n’ months and can be reviewed for potential deletion.
  • Alert when the bot traffic (good or bad) is much higher than usual.

At the Application and Infrastructure level:

  • Alert on auto-scaling and the number of instances, e.g. if Kubernetes spun up 5 new pods in the last 5 minutes and it's not Black Friday, there is probably something fishy…
  • Alert on response time and status, when the service starts to respond slowly and/or with errors (an example alerting rule follows this list). I recommend reading Creating Monitoring Dashboards, which covers all you need to know about monitoring.
  • You can also set up alerts at the log level. This is useful if you are missing some metrics in your application and you are using an advanced log management tool like Splunk.
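To illustrate the response time and status point, here is what an alerting rule could look like in Prometheus; the metric name assumes Spring Boot with Micrometer, adapt it to your own stack:

groups:
  - name: bot-attack-symptoms
    rules:
      - alert: HighServerErrorRate
        # Fire when more than 5% of requests returned a 5xx over the last 5 minutes
        expr: |
          sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: More than 5% of requests are failing, possible bot attack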

The last word

I hope this article was useful and that you now have a better knowledge of bots and bot attacks.

I did not discuss any silver bullet tool here, because there is no such thing as a single, perfect anti-bot tool for everyone. But there is always room for improvement to prevent and mitigate bot attacks.

Ah… And good luck for the next one! 👋 🤖


Flagger – Monitor your Canary deployments with Grafana

Fabian Piau | Sunday June 28th, 2020 - 06:07 PM


Update
October 17th, 2020: updated to use newer versions (Helm 3, Kubernetes 1.18, Istio 1.7, Flagger 1.2).

Flagger logo

This is the third article in our series dedicated to Flagger. In a nutshell, Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.

Make sure you have a local Kubernetes cluster running with the Istio service mesh. If you don't, read the first article: Flagger – Get Started with Istio and Kubernetes. You also need to be familiar with Flagger and MHS; you will find all the details in the second article: Flagger – Canary deployments on Kubernetes.

In this third guide, we will focus on installing Grafana for Flagger and how you can monitor your Canary deployments without having to use the Kubernetes tooling (i.e. the Kube dashboard or the kubectl command line tool).

Note
This is a hands-on guide and can be followed step by step on macOS; it will require some adjustments on a Windows or Linux PC. Note that this article does not go into detail and only skims the concepts & technologies, so if you are not familiar with Docker, Kubernetes, Helm or Istio, I strongly advise you to check some documentation before reading on.


Installing Grafana

Flagger provides a Grafana dashboard out of the box to monitor all the canary deployments inside your cluster.

Let’s install Grafana in the namespace istio-system with a single command:

helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus:9090 \
--set user=admin \
--set password=changeme

Reference: Flagger monitoring
Flagger depends on Istio telemetry and Prometheus (I’ve assumed Istio is installed in the istio-system namespace).

After a few seconds, you should get a message confirming that Grafana for Flagger has been installed. From the Kubernetes dashboard, verify that the Flagger Grafana pod is running in istio-system.

Grafana for Flagger is deployed in your cluster
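If you prefer the command line to the dashboard, you can also check the pod status with kubectl:

kubectl -n istio-system get pods | grep grafana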

To expose Grafana, run:

kubectl -n istio-system port-forward svc/flagger-grafana 3000:80

You can then access it in your browser at http://localhost:3000/d/flagger-istio/istio-canary, using the login and password you specified before (admin / changeme).

Note that since we use Istio, we use the Istio Canary dashboard. Flagger is compatible with other service meshes, and other dashboards are available.


Running the canary deployments

If you have followed the previous article, make sure you select application for the Namespace, and associate mhs-primary with Primary and mhs with Canary.

Then rerun the different experiments from the previous article while monitoring the Grafana dashboard at the same time, especially:

  • Experiment 1 – MHS v1.1.2 successful deployment
  • Experiment 2 – MHS v1.1.3 faulty deployment


Results

Look at this screenshot taken during Experiment 1, when the canary release was successful and the new version was rolled out. I annotated it and added explanations, so the graphs are easier to understand.

A successful Canary deployment

And here is a similar screenshot for Experiment 2, when the canary release did not succeed and the new version was not rolled out.

A failed Canary deployment

Congratulations, you’ve come to the end of this third tutorial!


Cleaning up resources

You can delete the MHS application and its namespace. You can also remove Istio and Flagger, since this is the last article of the series.

# Remove the MHS application and its namespace
helm delete mhs --namespace application
kubectl delete namespaces application

# Remove Flagger and its namespace
helm delete flagger --namespace flagger-system
kubectl delete namespaces flagger-system

# Remove Prometheus, then Istio and its namespace (Grafana goes with it)
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.7/samples/addons/prometheus.yaml
istioctl manifest generate --set profile=demo | kubectl delete -f -
kubectl delete namespaces istio-system

You can also stop the Kubernetes cluster by unchecking the Kubernetes box in the Docker Desktop settings and restarting Docker Desktop.

Related posts

  • Flagger – Canary deployments on Kubernetes
  • Flagger – Get Started with Istio and Kubernetes