I have been working on a microservices architecture for quite a long time now and have attended many conferences on the topic. In this article I want to gather that experience and give you some feedback and advice. Each title is something you shouldn’t do, followed by a description of what you should do instead. Even if you are already doing microservices on your project, it can still be a nice read (I hope) and refresh your memory. Also, don’t hesitate to comment and share your thoughts!
You can call it SOA (or call it whatever you want), but I prefer to use the appropriate buzzword for the SEO of my blog. ;)
Adopting a microservices architecture is not that easy. Writing microservices is not only about code; it is also about the teams and the structure of the organization.
Let’s extract this complex piece of code, it will be simpler in a microservice!
Usually, it is a bit easier when starting a new project from scratch; if that is your case, you are lucky! If you want to break down a monolithic application into a set of microservices, don’t do everything in one go: the big bang is the best way to blow everything up. Try to group the related functionality that belongs in the same microservice. Make sure that all the features you want to extract are tested before refactoring anything. If the coverage is low, you have to start by writing tests. By doing that, you will improve the code in the monolith, and this code will be reused later in your microservice. Keep in mind that you are preparing the extraction, so this work won’t be wasted.
If you don’t have enough tests, you will be working blind and you won’t be able to make sure you are not breaking anything; a regression is the last thing you want. To recap, write integration tests around the code you want to extract, then start extracting a feature to create a microservice. Your integration tests should still pass, but the plumbing behind them will have changed.
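Here is a minimal sketch of that recap: the checks are written once against a plain callable and run both before and after the extraction. Everything in it is hypothetical (`monolith_quote`, `ExtractedQuoteService` and the discount rule are invented for illustration), and the "service" is an in-process stand-in for what would really be a network client.

```python
from typing import Callable


# Hypothetical feature living in the monolith: price a basket with a bulk discount.
def monolith_quote(unit_price_cents: int, quantity: int) -> int:
    total = unit_price_cents * quantity
    if quantity >= 10:
        total -= total // 10  # 10% off for 10 items or more
    return total


class ExtractedQuoteService:
    """In-process stand-in for the new microservice (a real one sits behind an API)."""

    def quote(self, unit_price_cents: int, quantity: int) -> int:
        discount_percent = 10 if quantity >= 10 else 0
        total = unit_price_cents * quantity
        return total - total * discount_percent // 100


def check_quote_behaviour(quote: Callable[[int, int], int]) -> None:
    """Integration-style checks that do not care which implementation answers."""
    assert quote(250, 1) == 250
    assert quote(250, 9) == 2250
    assert quote(250, 10) == 2250  # 2500 minus 10%
    assert quote(999, 0) == 0


# Run the checks before the extraction...
check_quote_behaviour(monolith_quote)
# ...and again once the plumbing behind them has changed.
check_quote_behaviour(ExtractedQuoteService().quote)
```

The point of passing the implementation in as a parameter is that the test body never changes; only what is plugged into it does.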
Start small and easy at first, and don’t be afraid of duplicating some code. Once your microservice is working fine, you can remove the duplicated code from the monolith. A/B testing can help: you can gradually redirect the traffic to the new microservice and see how the system reacts.
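One possible way to do that gradual redirect is a percentage-based switch in front of the two implementations. The sketch below is only an illustration under my own assumptions: the function names, the `user-N` identifiers and the choice of hashing the user id are not taken from any particular tool.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) so they always take the same path."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, rollout_percent: int) -> str:
    """Return which implementation should serve this user."""
    return "microservice" if bucket(user_id) < rollout_percent else "monolith"


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 10, 50, 100):
    served = sum(route(user, percent) == "microservice" for user in users)
    print(f"rollout {percent:>3}% -> {served} of {len(users)} users on the new service")
```

Hashing the user id (rather than picking at random on each request) keeps a given user on the same side while the percentage grows, which makes odd behaviour much easier to reproduce.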
The ‘micro-monolith-service’ trap
Keep the microservice quite small and not too complex. There is no magical answer to the question “how many lines of code should a service have to be considered a microservice?” It is more about feeling and good design. If a new joiner needs a full day to understand the code and what your microservice is doing, then you probably got it wrong. A few hours should be more than enough.
You need to parse an XML file, let me write a microservice for that!
At the other extreme, don’t write a microservice when a library is enough; don’t over-engineer things. You don’t want the call depth (the number of nested requests) to be too big. If you have more than 3 nested calls, you are probably doing nanoservices. Every call involves network latency and a possible failure. A microservice has to be deployed with an API, whereas a library is directly embedded without ops overhead. It is a balance between DevOps overhead and monolithic complexity.
Sorry, I can’t help you, I have never worked on this microservice…
It is not possible for all the teams to own all the microservices. You will have to deal with synchronization and communication between your teams. I have very often heard that a team should not own a microservice, and that everyone should be able to switch and work on any microservice. Well, in reality that is not quite true, but you can adopt some good practices to make it easier for someone to work on a microservice developed by another team.
In my opinion, a team owns a microservice, which means it is responsible for building and running it. When something goes wrong in production, the team should be the main point of contact. You will have to find the person within the team who will accept a call at 5am… Just kidding ;)
Spring, Dropwizard, Finagle, Clojure, let’s try them all and mix them!
Try to stick to the same technology stack across all your microservices. I know that a nice advantage of microservices is that you can build them with any language you want. Passionate developers tend to reach for the latest trendy framework. In the long term, and with staff turnover, maintaining your microservices will become painful and turn into a nightmare. Moving microservices between teams will become very difficult or even impossible. I think it is important to limit yourself to a certain number of technologies.
Many technologies are available for building microservices; you have to use something reliable, well maintained, and so on. Which serialization, text or binary? REST, Thrift, SOAP? An in-house solution or an open source framework? There is no best answer. Compare the different technologies and their pros and cons, and use the one most suitable for you.
At the same time, this limitation shouldn’t prevent you from innovating. Don’t stick to what you already use and master if you think there is a better solution on the market. Try it on a new microservice (one that is not critical to the system) as a POC; then, if it is a success, you can propagate it to the whole system.
What about the data stores?
You can be more flexible about the data store. This is an external dependency, and it can be adapted depending on the needs (relational, NoSQL, in-memory database, read-only access…).
Keep your data stores independent: each microservice is the keeper of its own data. You should never bypass it by connecting a microservice to the data store of another microservice. That would be an extremely bad design. If you are limited to one database for some reason, that’s fine, because you can define different schemas and limit the access of each microservice.
If one microservice fails, my whole system is down…
It is important to design a good architecture. Think about scalability, circuit breakers and service discovery from the start, not one month before going to production with thousands of users. You should keep this in mind from day 0. Some people may argue with that point, and sometimes you may have to convince a product owner who doesn’t see the real business value. But trust me, the longer you wait, the more work it will be, and you won’t be confident in your architecture until all these technical tasks are complete.
You are dealing with many requests over the network, which is usually not very reliable, so you have to design for failure. This is a mindset the team has to adopt. You have to think about retries (e.g., in the case of failure, you retry on a second instance), idempotent calls and so on. Retrying is not as easy as it sounds, especially when you cannot make the call idempotent. At some point, you will have to fine-tune the retry configuration. For example, avoid retries on the inner requests if there are already retries at the outer level, make the timeout at the outer level longer than the timeout at the inner level, etc. All your microservices should expose a set of health check URLs and other useful metrics for a monitoring tool to use (on either a pull or a push model).
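To make the retry and idempotency points a bit more concrete, here is a minimal sketch. It is a toy under stated assumptions, not a recommended client: `call_with_failover`, `PaymentServer`, `InstanceDown` and `flaky_send` are all hypothetical names, the "network" is a plain function call, and using a client-generated key to deduplicate requests is one common way (not the only one) to make a call safe to retry.

```python
import uuid
from typing import Callable, Dict, Sequence


class InstanceDown(Exception):
    """Raised by the hypothetical transport when an instance cannot be reached."""


def call_with_failover(instances: Sequence[str],
                       send: Callable[[str, str], str],
                       max_attempts: int = 3) -> str:
    """Try the call on successive instances, reusing one idempotency key."""
    key = str(uuid.uuid4())  # same key on every attempt, so the server can deduplicate
    last_error = None
    for attempt in range(max_attempts):
        instance = instances[attempt % len(instances)]
        try:
            return send(instance, key)
        except InstanceDown as error:
            last_error = error  # move on to the next instance
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


class PaymentServer:
    """Toy server side: remembers keys so that a retried request is applied once."""

    def __init__(self) -> None:
        self.processed: Dict[str, str] = {}
        self.charges = 0

    def handle(self, key: str) -> str:
        if key not in self.processed:
            self.charges += 1
            self.processed[key] = f"charge-{self.charges}"
        return self.processed[key]


server = PaymentServer()


def flaky_send(instance: str, key: str) -> str:
    # Simulate the nasty case: instance "a" applies the request but the reply is lost.
    result = server.handle(key)
    if instance == "a":
        raise InstanceDown(instance)
    return result


print(call_with_failover(["a", "b"], flaky_send))  # the retry on "b" succeeds
print(server.charges)                              # the customer was charged only once
```

A real client would also give each attempt its own timeout, kept shorter than the timeout of whoever is calling it, in line with the outer/inner rule above.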
It is also important to test the infrastructure. What will happen if this instance of a microservice goes down? You should do some monkey testing (i.e. switch off some microservices randomly); the system should cope and still be able to serve the requests (in a degraded mode).
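The idea behind monkey testing can be shown with a very small simulation. This is only a sketch: `Instance`, `serve` and `unleash_monkey` are hypothetical names, and a real test would switch off real processes or containers rather than flip a flag in memory.

```python
import random
from typing import List


class Instance:
    """Toy stand-in for one running copy of a microservice."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True

    def handle(self, request: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def serve(instances: List[Instance], request: str) -> str:
    """Very naive client-side failover: use the first instance that answers."""
    for instance in instances:
        try:
            return instance.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("no instance available")


def unleash_monkey(instances: List[Instance], rng: random.Random, survivors: int = 1) -> None:
    """Switch off random instances, leaving `survivors` of them running."""
    for victim in rng.sample(instances, len(instances) - survivors):
        victim.up = False


rng = random.Random()  # pass a seed here to replay a failing run
for _ in range(100):
    cluster = [Instance(f"orders-{n}") for n in range(3)]
    unleash_monkey(cluster, rng)
    assert "handled" in serve(cluster, "GET /orders/1")  # degraded, but still serving
```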
All right, let me ‘grep’ the log to see what happened yesterday… Wait… There are 50 log files here!
Having many microservices involves lots of interactions, and so lots of logging. It will be difficult to debug when something goes wrong in production. Where was it? Which log file? Which environment? Ops can easily get lost… Try to log accurately, and keep a correlation identifier so you can trace the complete call chain across the different log files.
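As a rough sketch of the correlation identifier idea, the snippet below uses Python's standard `logging` and `contextvars` modules. The service names, the `X-Correlation-Id` header name and the two in-memory "log files" are assumptions made for the example; the direct function call stands in for an HTTP request between two services.

```python
import contextvars
import io
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copy the current correlation id onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def build_logger(service: str, stream: io.StringIO) -> logging.Logger:
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s [%(correlation_id)s] %(message)s"))
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    return logger


HEADER = "X-Correlation-Id"  # the header name is an assumption: pick one and stick to it

orders_log, billing_log = io.StringIO(), io.StringIO()  # stand-ins for two log files
orders = build_logger("orders", orders_log)
billing = build_logger("billing", billing_log)


def billing_endpoint(headers: dict) -> None:
    # Reuse the caller's id instead of creating a new one.
    correlation_id.set(headers.get(HEADER, str(uuid.uuid4())))
    billing.info("invoice created")


def orders_endpoint(headers: dict) -> None:
    correlation_id.set(headers.get(HEADER, str(uuid.uuid4())))
    orders.info("order received")
    billing_endpoint({HEADER: correlation_id.get()})  # stands in for an HTTP call


orders_endpoint({})
print(orders_log.getvalue() + billing_log.getvalue(), end="")
```

Both lines carry the same identifier, so searching for it in the log monitoring tool brings back the whole request, whichever service wrote the line.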
Be proactive and don’t wait for the customer to complain. Track any stack trace that occurs, raise alerts and keep fixing problems until your logs are “clean” and contain only useful and accurate information. A log monitoring tool (e.g. Splunk, Kibana…) makes sense in a microservices architecture.
We cannot test, the microservice we depend on is still not ready
When writing integration tests, mock all the external dependencies of the microservice, especially if a team is still developing a microservice you depend on. You don’t want to wait for them to finish. You can take a look at WireMock, for example. For the end-to-end tests, on the other hand, try to provide a good and fast tool to run them locally; using embedded servers is a good way to achieve that.
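To show what stubbing a dependency looks like without committing to a specific tool, here is a small sketch built only on the Python standard library (it is not WireMock, just the same idea in miniature). The inventory service, its `/stock/<sku>` URL and the `can_order` function are all made up for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubInventory(BaseHTTPRequestHandler):
    """Canned replies standing in for a service another team is still building."""

    def do_GET(self) -> None:
        if self.path == "/stock/42":
            body = json.dumps({"sku": "42", "available": 3}).encode("utf-8")
            self.send_response(200)
        else:
            body = b'{"error": "unknown sku"}'
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the test output quiet


def can_order(base_url: str, sku: str, quantity: int) -> bool:
    """Code under test: asks the inventory service whether enough stock is left."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxy
    with opener.open(f"{base_url}/stock/{sku}", timeout=2) as response:
        return json.load(response)["available"] >= quantity


server = HTTPServer(("127.0.0.1", 0), StubInventory)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

try:
    assert can_order(base_url, "42", 2) is True
    assert can_order(base_url, "42", 5) is False
finally:
    server.shutdown()
    server.server_close()
```

Because the code under test only receives a base URL, the same function can later be pointed at the real service without any change.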
I have already mentioned it: deploy often to production, especially at the beginning, and don’t wait 6 months before deploying your microservices. Even if they are not useful (yet) from a business point of view, you will be more confident when working on them, and you can then implement the business logic without worrying too much about the architecture. The earlier you test in production, the earlier you will see bugs and be able to fix them.
Let’s design the API like that for now, something simple, we will change it later
It is not easy to be agile when designing an API, especially a public one. Once a version is released and starts to be used, any big change will involve a change to the API contract. You will have to think about versioning your API and ultimately maintaining different versions. It is important to spend some time at the beginning to define a good API contract that can evolve without breaking changes (ideally). Does that sound like waterfall to you? Well, it probably is…
Always keep backward compatibility in mind (even if the calls are internal) and use API versioning. For example, don’t just rename a field; follow a longer and safer process: add the new field, keep the old one as deprecated until all the clients have migrated, and only then delete it.
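The transition phase of such a rename could look like the sketch below. The field names (`name` becoming `full_name`) and both functions are hypothetical; the only point is that the server emits both fields for a while and the client copes with either.

```python
from typing import Any, Dict


def serialize_user(user_id: int, full_name: str) -> Dict[str, Any]:
    """Server side, transition phase: emit both the new and the deprecated field."""
    return {
        "id": user_id,
        "full_name": full_name,
        "name": full_name,  # deprecated: remove once every client reads "full_name"
    }


def read_full_name(payload: Dict[str, Any]) -> str:
    """Client side: prefer the new field, fall back to the old one."""
    if "full_name" in payload:
        return payload["full_name"]
    return payload["name"]


new_payload = serialize_user(7, "Ada Lovelace")
old_payload = {"id": 7, "name": "Ada Lovelace"}  # what a not-yet-upgraded server sends

assert read_full_name(new_payload) == read_full_name(old_payload) == "Ada Lovelace"
```

Old clients keep reading `name`, new clients read `full_name`, and nobody breaks on the day of the release.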
The ops guys need one day to deploy a couple of microservices, what is wrong with them?!
The ops team has to change its mindset: instead of deploying and monitoring one big application, they will have to manage many small applications and make sure there are no communication failures. From a couple of manual steps, they end up with many tedious ones. They will quickly be overwhelmed if they keep the same habits and spirit, especially as the number of microservices grows over time.
They will need to automate their work and avoid any manual, error-prone step. Adopting a microservices architecture forces you to do “real” DevOps. The developer has to take part in the deployment; it is no longer about just handing over a file and counting on ops to do the job. Ideally, an ops guy will be dedicated to the team to make sure the network configuration is correct and the health checks exist and work fine, to take part in stand-ups and even to sit with the dev team. You will need to start thinking about using a deployment tool such as Ansible.
We have spent 6 months on this architecture, it is working very well… It is just that we have 50 users!
One last piece of advice: don’t try to do everything at once. You already have so many problems to tackle. You can put aside things like self-healing, service discovery, circuit breakers, auto scaling, etc. Retries and multiple instances will be sufficient at the beginning. But keep in mind that you will have to implement them at some point, especially once your system is successful and used by thousands of users.