Bot Attacks: You are not alone…

Fabian Piau | Tuesday April 20th, 2021 - 04:42 PM


Note
I originally published this article on the Medium Expedia Group technology blog: Bot Attacks: You are not alone…

In a recent article, Security Magazine stated that 1.3 billion bot attacks were detected in Q3 2020. It’s not a surprise. We are not alone. You are not alone.

Last year, the Opex Excellence team at Expedia Group reviewed the production incidents that impacted the Hotels.com website over a 12-month period, and bot attacks were one of the most significant threats we identified.

Anything that can go wrong will go wrong — Murphy’s law

In this article, I will define and explain what bots and bot attacks are. I hope it will help you build up some knowledge and a better understanding of the threats your website could be facing, so you can come up with a set of strategies and solutions to prevent and mitigate future bot attacks.

Know the enemy and know yourself, in a hundred battles you will never be in peril — Sun Tzu, The Art of War

Bots - Photo by Eric Krull on Unsplash

What are bot attacks?

The first type of bot attack that usually comes to mind is the Denial of Service attack (DoS), or DDoS when it is distributed. A DDoS attack usually impacts the whole website, with failures cascading until it becomes unavailable. The attacker sends a large number of requests (e.g. multiple GETs/POSTs on random URLs) to overload the underlying services so much that they cannot handle the traffic anymore. Services start to respond with timeouts or errors. This obviously also affects the requests originating from legitimate customers — customers trying to book their hotel stay, in our case.

Another type of attack, slightly more targeted, is the Scraping attack. A crawler bot looks into various pages to extract specific pieces of information; this is also known as Data scraping. These bots usually target your inventory data, e.g. all the property details and contact information, or pricing for specific dates. Sometimes the target is even the whole content of some pages of your website, to replicate them and prepare a phishing attack.

ℹ️ Ideally, the attacker does not want to be seen while scraping a website and will try to spread the attack over multiple days, avoiding a surge in traffic that may trigger an alert and an investigation on your side. Unless the attack is sophisticated and coming from different machines, you can usually figure out that the website is being scraped when you see an abnormal number of requests coming from the same IP address (or range of IPs) at the Edge level. I will give more details about the Edge level in the next part.


If DoS and Scraping attacks are quite general in scope, in the sense that they target various pages across the website, we also see attacks against specific pages.

Specialized attacks are much narrower in scope and usually target a specific page or feature of your website. Below are a few real cases we have faced:

  • An attack on the Booking page, where the attacker tries to break the coupon field by generating and trying multiple codes, hoping there is an exploitable pattern. This uses brute force.
  • An attack on the Sign-in page, where the attacker tries multiple login and password combinations. The idea is to gain access to user accounts and confidential data, also known as Account Take Over (ATO). This also uses brute force.
  • An attack on the Mobile App page, where the attacker sends large numbers of SMS to random or fake phone numbers. It is quite common nowadays to prompt users to download an app by sending an SMS with a link to the app on the store. The targeted website ends up paying, as the messaging service is usually handled by a third-party vendor and each SMS sent has a cost.

General or specific, an attack can be basic or sophisticated. An example of a basic attack is someone desperately trying to find a discount code by firing dozens of requests at a coupon service. Basic attacks are difficult to spot but easy to block: you can block the IP address with a WAF (Web Application Firewall) rule. They also have a low risk profile.
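For instance, if your entry point is nginx rather than a commercial WAF (an assumption made for the sake of the example), blocking an offending IP address is a one-line rule; the addresses below come from the documentation range:

# Block a single abusive IP address (example address)
deny 203.0.113.42;

# Or an entire range
deny 203.0.113.0/24;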

Sophisticated attacks, on the other hand, can be distributed across thousands of machines located in different countries and use advanced scripting technologies, like headless browsers. The level of impact and risk is much higher, but they are obviously easier to spot when associated with a traffic surge.

Bot attack risk matrix

Should we block every bot?

Definitely not! Not all bots are evil; some are even beneficial to us.

  • There are the Spider bots (or Crawlers) from popular search engines like Google Search or Microsoft Bing. If we block those, the indexing of the website will suffer, legitimate traffic will degrade over time, and the website's popularity will go down.
  • There are also the Commercial bots (e.g. the Google AdSense bot) that provide personalized ads to users, including ads for our own website.
  • There are the Data bots, like content aggregators and feeds, and there is the Archive bot building the Internet Archive.
  • There are the Copyright bots that look for plagiarism or intellectual property theft. You may have faced one if you've tried to upload a video to YouTube with your favourite music in the background; you probably quickly realized that Google took it down, politely reminding you that you cannot use protected material.
  • There are also the Monitoring bots that make sure your website is healthy and raise an alert when it's not. For example, Akamai, Datadog, and others use bots to check your site is responding properly.

ℹ️ You should not block any of these bots: they are not malicious, they are usually not aggressive, and they contribute to the Internet. If you feel your legitimate traffic is suffering from them, then instead of blocking, it's best to have a tarpitting or rate-limiting strategy in place to mitigate their impact. More details about this in the next part about the Edge level.

While all these bots are third party, you can also have your own. In our case, an internal bot regularly checks our landing pages to make sure there are no dead links or unnecessary redirections. We definitely don't want to block it!

What can we do?

We can prevent attacks at the Edge level and mitigate them at the Application level.

Edge and Application levels

Use of robots.txt

The first thing that comes to mind when dealing with bots is the robots.txt file. Almost every website has one, and it has been around for ages. It is a text file, publicly accessible at the root of your website, that specifies the rules for any bot accessing your site: which pages bots can and can't crawl, and which links they should and shouldn't follow.

Good bots will follow these rules. For instance, if a website owner doesn’t want a certain page on their site to show up in Google search results, they can write a rule for it, and Google web crawler bots won’t index that page. Although the robots.txt file cannot actually enforce these rules, good bots are programmed to look for that file and follow the rules before they do anything else. It’s based on a code of honor.
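To make this concrete, here is a minimal robots.txt; the paths are made up for the example:

# robots.txt, served at the root of the website
User-agent: *             # the rules below apply to every bot
Disallow: /checkout/      # keep crawlers out of the booking funnel
Disallow: /account/       # and out of user account pages
Crawl-delay: 5            # honoured by some bots (e.g. Bingbot), ignored by Googlebot
Disallow: /crawler-trap/  # a path like this can double as a honeypot (see the note below)

User-agent: BadBotExample # a hypothetical bot you want to exclude entirely
Disallow: /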

Malicious bots will obviously not follow any of your rules. On the contrary, they will often read the file to learn which content a website is trying to keep off-limits, then access that content. Managing bots therefore requires a more active approach than simply defining rules in the robots.txt file. This is what we are going to see in the next part.

ℹ️ The robots.txt file can also be used to set up a ‘honeypot’. A honeypot is a fake target for bad actors that, when accessed, exposes them as malicious. In the case of a bot, a honeypot could be a page that the robots.txt file forbids to bots. Good bots will read the robots.txt file and avoid that page; some bad bots will crawl it. By tracking the bots that access the honeypot, bad bots can be identified and blocked. Source: Cloudflare


Advanced Bot Management

The main shield against bad bots is Bot Management at the Edge level. This is much more advanced than the robots.txt file. I took this list from Cloudflare, but you will find a similar set of features in any other Edge tool:

  • Identify bots vs. human visitors (using behavioral analysis and potentially machine learning)
  • Identify bot reputation
  • Identify bot origin IP addresses and block based on IP reputation
  • Analyze bot behavior
  • Add good bots to allowlists
  • Add bad bots to blocklists
  • Challenge potential bots via a CAPTCHA test, JavaScript injection, or other methods
  • Rate limit any potential bot over-using a service
  • Tarpit recognized bot requests (see definition below)
  • Deny access to certain content or resources for bad bots
  • Serve alternative/cached content to bots

ℹ️ ‘Tarpitting’ is an interesting feature: it means adding an artificial delay to the request. It is usually much better than blocking because the bot won't know it has been discovered, yet the attack will slow down significantly, as fewer requests will reach the Application level (they may time out at the Edge level). Rate limiting is another good strategy you may want to look at.
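If you don't have a commercial Edge tool in front, nginx can approximate both strategies. Here is a sketch (the upstream name is illustrative); without the nodelay option, requests over the rate are delayed rather than rejected, which is a mild form of tarpitting:

# In the http context: allow 5 requests/second on average per client IP
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=5r/s;

server {
    location / {
        # Delay up to 20 excess requests instead of serving them immediately;
        # beyond the burst, respond with 429 Too Many Requests
        limit_req zone=per_ip burst=20;
        limit_req_status 429;
        proxy_pass http://backend;  # assumes an upstream named 'backend'
    }
}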

When choosing an advanced bot manager, you can go with a popular third-party provider like Akamai or Cloudflare.

Pros

  • No impact on the application code.
  • A bot rule is relatively quick to deploy with immediate effect.

Cons

  • Most of the cons stem from the fact that they are third party.
  • There is a license cost.
  • Bot rules can only be defined against generic, non-business parameters like user agent, IP, endpoint, etc.
  • They sometimes involve a heavy approval process, e.g. adding a new rule may require people outside of the company plus internal people with special authorization.
  • A new rule can have side effects: unless it blocks a single IP address, it's very hard to be sure it won't keep some legitimate traffic out.
  • The maintenance of the rules can become cumbersome over time.
  • Application teams only get partial access and visibility.

Most of these disadvantages can be mitigated if you use your own in-house Edge tool, which is particularly interesting in addition to a third-party one. It's obviously not something every company can invest in, but it gives you much more flexibility:

  • Ability to set rules related to your business.
  • Ability to add a one-off temporary rule to mitigate an attack, which you can delete shortly after the attack has passed.
  • Ability to centralize the Edge monitoring and make the information available to every team.


Traffic prioritization

The idea here is not to replace your main Edge tool but to add some bot logic after it. In a nutshell, such a tool acts as a prioritization queue: low-value bot requests are deprioritized in favor of real user requests with higher business value, e.g. requests that could end up as a booking, in our case.

User requests > Internal bot requests > External good bot requests

ℹ️ Netflix applies similar concepts, which you can read about in Keeping Netflix Reliable Using Prioritized Load Shedding.
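To illustrate the idea, here is a minimal Java sketch. The user-agent classification is deliberately naive and the names are made up; a real implementation would rely on the richer signals coming from your Edge tool:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class TrafficPrioritizer {

    // Ordinal doubles as the priority: user requests always come first
    enum Priority { USER, INTERNAL_BOT, EXTERNAL_GOOD_BOT }

    record PendingRequest(Priority priority, String path) {}

    private final PriorityBlockingQueue<PendingRequest> queue =
            new PriorityBlockingQueue<>(1024, Comparator.comparing(PendingRequest::priority));

    public void accept(String userAgent, String path) {
        queue.offer(new PendingRequest(classify(userAgent), path));
    }

    // Naive classification based on the User-Agent header (illustrative only)
    private Priority classify(String userAgent) {
        if (userAgent.contains("internal-linkchecker")) return Priority.INTERNAL_BOT;
        if (userAgent.contains("Googlebot") || userAgent.contains("bingbot")) return Priority.EXTERNAL_GOOD_BOT;
        return Priority.USER;
    }

    // Workers pull the highest-value request first
    public PendingRequest next() throws InterruptedException {
        return queue.take();
    }
}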


Caching

We talked about Bot Management, but there is something else the Edge layer can provide for you: the ability to cache content, usually referred to as a Content Delivery Network (CDN). The idea is to serve cached pages to good bots rather than generate fresh ones.
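As a sketch with nginx (assuming nginx sits in front of your services, with an illustrative bot list), you could serve cached pages to known good bots while humans keep hitting the origin:

# Good bots get cached pages; humans and unknown agents bypass the cache
map $http_user_agent $bypass_cache {
    default               1;
    ~*(googlebot|bingbot) 0;
}

proxy_cache_path /var/cache/nginx keys_zone=pages:50m inactive=10m;

server {
    location / {
        proxy_cache        pages;
        proxy_cache_valid  200 10m;        # keep successful pages for 10 minutes
        proxy_cache_bypass $bypass_cache;  # humans skip the cache...
        proxy_no_cache     $bypass_cache;  # ...and their responses are not stored
        proxy_pass         http://backend; # assumes an upstream named 'backend'
    }
}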

We did a Proof of Concept last year and it significantly decreased the traffic to our Landing services without affecting the SEO of the website. This year, we are looking at generalizing this approach.


Mitigating at the Application level

Managing bots at the Application level means that a bot attack managed to get through the higher level of protection at the Edge.

The solutions we have at this level are mitigation only: they help us stay proactive and reduce the burden of an attack.

On a sad note, we know it's going to happen, probably multiple times, so we need to prepare for it. On a brighter note, we know that bot attacks do not last forever. A sophisticated attack costs the attacker a significant amount of resources, and resources cost money. As long as we make the attacker spend more than they gain, we can be confident the attack will eventually be stopped as a bad investment.

There are different actions we can take, and most of them are simply good practices.

  • First, when coding the application logic, avoid high-complexity code, blocking threads, and so on. A good idea is to separate the application and management ports: if you use the same port, during a bot attack the service can be so overloaded that it won't be able to respond to health checks, and your infrastructure platform will flag it as unhealthy. Even if it does not solve all your issues, a separate management port can mitigate this (see the first sketch after this list).
  • Having a chaos mindset is important. For critical services, make sure you have load and stress testing in your pipeline, to ensure your services are resilient enough and that you have identified potential memory leaks in your code as well as bottlenecks reaching downstream services or data sources. If something goes wrong, you still want to serve a degraded response to mitigate the impact on the customer. You could also have a caching mechanism in place.
  • Make sure you leverage your infrastructure. If you use Kubernetes, take a look at auto-scaling, but be vigilant when enabling it: ensure the configuration is well thought out and in line with your dependencies (see the second sketch after this list). Setting a high number of pods and considering the job done would be a mistake, as you also share the load with your downstream dependencies; if they are not prepared for it, you basically shift the bottleneck deeper in the stack without solving it. It may also cost you more money if your infrastructure is hosted on a Cloud provider like AWS. Also make sure your pods are ready to take traffic once they are exposed to the attack; a warm-up routine like Mittens will help, especially for applications that are slow to start up.
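For the separate management port mentioned in the first point, if your services run on Spring Boot for instance, it is a one-line configuration change:

# application.properties
# Serve user traffic on one port...
server.port=8080
# ...and the actuator endpoints (health checks, metrics) on another
management.server.port=8081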
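And for auto-scaling, here is a minimal Kubernetes HorizontalPodAutoscaler sketch; the deployment name and thresholds are illustrative, tune them with your downstream dependencies in mind:

apiVersion: autoscaling/v2beta2   # autoscaling/v2 on recent clusters
kind: HorizontalPodAutoscaler
metadata:
  name: landing-service           # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: landing-service
  minReplicas: 3
  maxReplicas: 10                 # cap both the bill and the load pushed downstream
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU crosses 70%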

There are also other strategies at the application level that are not related to configuration and infrastructure. Some mimic a bot management solution at the Edge level:

  • Captcha mechanism. A common approach is to display a captcha after too many attempts, or to protect pages targeted by account-related attacks.
  • Authentication mechanism. If your APIs are public, you may want to add some authentication, from ‘Basic Auth’ to ‘CSRF Token’. But be aware it will add complexity to your system, and you have to balance it against the information your API provides, e.g. ask yourself whether the content exposed is sensitive enough.
  • Caching, Blocking, and Rate-limiting mechanisms. These can be complex to implement and maintain, especially in a microservice architecture, but I mention them because they can be a solution if you don't have any Edge tool or if you are working on a monolithic app (a minimal rate-limiter sketch follows this list).
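As a sketch of that last point, here is a naive in-memory, per-client token bucket in Java. It is fine for a monolith or a single instance; in a microservice architecture you would need a shared store (e.g. Redis) to keep the counters consistent across instances:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {

    private static final double MAX_TOKENS = 10;          // burst size per client
    private static final long REFILL_INTERVAL_MS = 1_000; // one token per second

    private static final class Bucket {
        double tokens = MAX_TOKENS;
        long lastRefill = System.currentTimeMillis();
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    // Returns true if the request identified by clientKey (e.g. an IP) may proceed
    public boolean tryAcquire(String clientKey) {
        Bucket bucket = buckets.computeIfAbsent(clientKey, k -> new Bucket());
        synchronized (bucket) {
            long now = System.currentTimeMillis();
            // Refill proportionally to the time elapsed since the last request
            bucket.tokens = Math.min(MAX_TOKENS,
                    bucket.tokens + (now - bucket.lastRefill) / (double) REFILL_INTERVAL_MS);
            bucket.lastRefill = now;
            if (bucket.tokens >= 1) {
                bucket.tokens -= 1;
                return true;  // under the limit, let the request through
            }
            return false;     // over the limit, block (or tarpit) the request
        }
    }
}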


Observability, Monitoring and Alerting

Last but not least, it's very important to have proper observability in place. Any bot attack that starts to get heavy and puts high pressure on your system should automatically trigger a set of alerts.

At the Edge level:

  • Alert on the bot rules: when a recently created rule is blocking too much traffic or, conversely, when a rule has not blocked any traffic for the last ‘n’ months and can be reviewed for potential deletion.
  • Alert when the bot traffic (good or bad) is much higher than usual.

At the Application and Infrastructure level:

  • Alert on auto-scaling and the number of instances, e.g. if Kubernetes spun up 5 new pods in the last 5 minutes and it's not Black Friday, there is probably something fishy…
  • Alert on response time and status, when the service starts to respond slowly and/or with errors (an example alerting rule follows this list). I recommend reading Creating Monitoring Dashboards, which covers all you need to know about monitoring.
  • You can also set up alerts at the log level. This is useful if you are missing some metrics in your application and you are using an advanced log management tool like Splunk.
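To illustrate the response time and status point, here is what an alerting rule could look like in Prometheus; the metric name assumes Spring Boot with Micrometer, adapt it to your own stack:

groups:
  - name: bot-attack-symptoms
    rules:
      - alert: HighServerErrorRate
        # Fire when more than 5% of requests returned a 5xx over the last 5 minutes
        expr: |
          sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: More than 5% of requests are failing, possible bot attack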

The last word

I hope this article was useful and that you now have a better knowledge of bots and bot attacks.

I did not discuss any silver bullet tool here, because there is no such thing as a single, perfect anti-bot tool for everyone. But there is always room for improvement to prevent and mitigate bot attacks.

Ah… And good luck for the next one! 👋 🤖


Flagger – Monitor your Canary deployments with Grafana

Fabian Piau | Sunday June 28th, 2020 - 06:07 PM


Update
October 17th, 2020: updated to use newer versions (Helm 3, Kubernetes 1.18, Istio 1.7, Flagger 1.2).

Flagger logo

This is the third article in our series dedicated to Flagger. In a nutshell, Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.

Make sure you have a local Kubernetes cluster running with the Istio service mesh. If you don't, read the first article: Flagger – Get Started with Istio and Kubernetes. You also need to be familiar with Flagger and MHS; you will find all the details in the second article: Flagger – Canary deployments on Kubernetes.

In this third guide, we will focus on installing Grafana for Flagger and how you can monitor your Canary deployments without having to use the Kubernetes tooling (i.e. the Kube dashboard or the kubectl command line tool).

Note
This is a hands-on guide and can be followed step by step on macOS; it will require some adjustments on a Windows or Linux PC. Note that this article does not go into detail and only skims the concepts & technologies, so if you are not familiar with Docker, Kubernetes, Helm or Istio, I strongly advise you to check some documentation before reading on.


Installing Grafana

Flagger provides a Grafana dashboard out of the box to monitor all the canary deployments inside your cluster.

Let’s install Grafana in the namespace istio-system with a single command:

helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus:9090 \
--set user=admin \
--set password=changeme

Reference: Flagger monitoring
Flagger depends on Istio telemetry and Prometheus (I’ve assumed Istio is installed in the istio-system namespace).

After a few seconds, you should get a message confirming that Grafana for Flagger has been installed. From the Kubernetes dashboard, verify that the Flagger Grafana pod is running in istio-system.

Grafana for Flagger is deployed in your cluster
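If you prefer the command line to the dashboard, you can also check the pod status with kubectl:

kubectl -n istio-system get pods | grep grafana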

To expose Grafana, run:

kubectl -n istio-system port-forward svc/flagger-grafana 3000:80

You can then access it in your browser at http://localhost:3000/d/flagger-istio/istio-canary, using the login and password you specified before (admin / changeme).

Note that since we use Istio, we use the Istio Canary dashboard. Flagger is compatible with other service meshes, and other dashboards are available.


Running the canary deployments

If you have followed the previous article, make sure you select application for the Namespace, and associate mhs-primary with Primary and mhs with Canary.

Then rerun the different experiments from the previous article while monitoring the Grafana dashboard at the same time, especially:

  • Experiment 1 – MHS v1.1.2 successful deployment
  • Experiment 2 – MHS v1.1.3 faulty deployment


Results

Look at this screenshot taken during Experiment 1, when the canary release was successful and the new version was rolled out. I annotated it and added explanations, so the graphs are easier to understand.

A successful Canary deployment

And here is a similar screenshot for Experiment 2, when the canary release did not succeed and the new version was not rolled out.

A failed Canary deployment

Congratulations, you’ve come to the end of this third tutorial!


Cleaning up resources

You can delete the MHS application and its namespace. You can also remove Istio and Flagger, since this is the last article of the series.

# Remove the MHS application and its namespace
helm delete mhs --namespace application
kubectl delete namespaces application

# Remove Flagger and its namespace
helm delete flagger --namespace flagger-system
kubectl delete namespaces flagger-system

# Remove Prometheus, then Istio and its namespace (Grafana goes with it)
kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.7/samples/addons/prometheus.yaml
istioctl manifest generate --set profile=demo | kubectl delete -f -
kubectl delete namespaces istio-system

You can also stop the Kubernetes cluster by unchecking the Kubernetes box in the Docker Desktop settings and restarting Docker Desktop.

Related posts

  • Flagger – Canary deployments on Kubernetes
  • Flagger – Get Started with Istio and Kubernetes