Editor’s Note: The following is an article written for and published in DZone’s 2021 Application Performance Management Trend Report
Performance is key to a successful user experience in any cloud-distributed application. Thus, having a deep understanding of how to measure performance and which metrics to use is quite important. In this article, we will cover common performance anti-patterns, a checklist to consider when building applications under heavy load, load-testing examples, and open-source tools for performance analysis.
When improving the performance of your application or infrastructure architecture, use the following checklist:
The most important item on the performance checklist is identifying anti-patterns: knowing the weak spots in advance saves valuable time.
Imagine you have a microservice deployed as a Docker container that eats more CPU and memory than the other containers. That can lead to outages, since other services might not receive enough resources; if you use Kubernetes, for example, it may kill other containers to release resources. You can easily prevent this by setting up CPU and memory limits at the very beginning of the design and implementation phases.
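As a minimal sketch of such limits (assuming a Kubernetes deployment; the pod name, image, and values below are illustrative, not prescriptive):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-microservice            # illustrative name
spec:
  containers:
    - name: my-microservice
      image: my-registry/my-microservice:1.0   # illustrative image
      resources:
        requests:                  # guaranteed minimum used by the scheduler
          cpu: "250m"
          memory: "256Mi"
        limits:                    # hard cap enforced at runtime
          cpu: "500m"
          memory: "512Mi"
```

With limits in place, one greedy container can no longer starve its neighbors, and an out-of-control memory consumer is terminated instead of taking healthy services down with it.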
Some applications that work under high load do not contain any caching mechanism. This may lead to fetching the same data over and over and overusing the main database. You can fix this by introducing a caching layer in your application, based on a Redis cache or just an in-memory cache module. Of course, you don’t need to use caching everywhere, since that may lead to data inconsistency. Sometimes, you can improve your performance by simply adding output caching to your code. For example:
public class HomeController : Controller
{
    [OutputCache(Duration = 60)]
    public ActionResult Index()
    {
        return View();
    }
}
Above, I’ve added the OutputCache attribute to an action of the MVC application. It will cache the static content for 60 seconds.
This issue is often found in modern microservices architectures where all services are deployed as containers in a Kubernetes cluster, yet they all share a single database instance. You can fix this problem by identifying the data scope of each microservice and splitting the one database into several. You can also use a database pooling mechanism; for example, Azure provides the Azure SQL elastic pool service.
Retry storms and the issues they cause usually occur in microservice or cloud-distributed applications: when a component or service is offline, other services keep trying to reach it, which often results in a never-ending retry loop. This can be fixed by using the circuit breaker pattern. The idea comes from electrical engineering, where a circuit breaker is an automatic switch: when the circuit runs into an issue (like a short circuit), the switch turns the circuit off. In software, the breaker trips after repeated failures and rejects further calls until a cooldown period has passed.
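The pattern can be sketched in a few lines. The sketch below is illustrative Python, not a production implementation; in practice, you would typically reach for an existing library such as Polly (.NET) or resilience4j (Java):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after a failure threshold,
    rejects calls while open, and allows a retry after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Circuit is open: fail fast instead of hammering the target.
                raise RuntimeError("circuit is open; call rejected")
            # Cooldown elapsed: half-open, let one trial call through.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Instead of every caller retrying an offline service forever, the breaker fails fast once the threshold is reached, giving the downstream service room to recover.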
To check the performance of your architecture, you need to run load tests against your application. Below, you can see an architectural example that is based on Azure App Services and Azure Functions. The architecture contains the following components:
Here, you can find an example of how to set up a JMeter load test in Azure DevOps.
As you can see in the diagram, everything looks good at first glance. Where do you think there could be an issue?
You can see the fixed architecture in Figure 2.
First of all, you should never run load tests against the Production stage, as an excessive load test may (and often will) cause downtime. Instead, run your tests against the Test or Staging environment.
You can also create a replica of your production environment specifically for load-testing purposes. In this case, you should not use real production data, since that may result in sending emails to real customers!
Next, let’s look at an application that should be architected under a high load.
Imagine that we have a popular e-Commerce application that needs to survive Black Friday.
The application architecture should be based around the following components:
In the figure above, you can see that the AKS cluster contains nodes that are virtual machines under the hood. The autoscaler is the main component here: it scales the node count up when the cluster is under high load and back down to the standard size when the load returns to normal.
Istio provides the data plane and control plane, assisting with the following:
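As one illustration of Istio's traffic-management capabilities (the service and host names below are assumptions for the e-commerce example), a VirtualService can add timeouts and retries without touching application code:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout                   # illustrative service name
spec:
  hosts:
    - checkout.shop.svc.cluster.local
  http:
    - route:
        - destination:
            host: checkout.shop.svc.cluster.local
      timeout: 5s                  # fail fast instead of hanging under load
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
```

Bounded, mesh-level retries like these also help avoid the retry storms discussed earlier, since the retry budget is enforced in one place rather than scattered across services.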
Please note that the architecture includes the Dev, Test, Staging, and Production stages, of course. The formula for a highly available Kubernetes setup is a separate cluster per stage. However, for Dev and Test, you can use a single cluster separated by namespaces to reduce infrastructure costs.
To reinforce logging, we used Azure Log Analytics agents and the Azure portal to create a dashboard. Istio ships with many built-in metrics, including performance metrics, along with options to customize them; you can then integrate them with a Grafana dashboard.
Lastly, you can also set up a load test using Istio. Here is a good example.
Let’s start off by listing some of the most popular open-source application performance tools:
There are also tools like Datadog that have community licenses or trials. And if you are using Azure, you can enable Azure Application Insights at low or no cost, depending on data volume.
In this article, we’ve dug into common performance mistakes and anti-patterns in distributed, cloud-based architectures and introduced a checklist to consider when building applications that will face heavy loads. We also explored open-source tools that can help with performance analysis, especially for projects on limited budgets. And, of course, we’ve covered a few examples of highly performant applications, and one application with potential performance issues, so that you, dear reader, can avoid these common performance mistakes in your own cloud architectures.
Originally published at https://dzone.com.
Boris Zaikin, Senior Software and Cloud Architect, IBM, Nordcloud (@boriszn)