Vendor lock-in refers to a situation where the cost of switching to a different vendor is so high that the customer is essentially stuck with the original vendor.
‘Vendor lock-in’ has been around since the beginning of the hardware and software industries, when companies had to choose their mainframe provider for the next decade, when consumers had to choose their external storage (remember the Zip drive?), or when users had to use iTunes to buy and transfer music to their iPhones.
Now, in the Cloud era, this ‘vendor lock-in’ dilemma exists for every SaaS product we integrate with: either we embrace it or we fight it. Many software architects have discussed the subject in general, but hardly any have addressed vendor lock-in for monitoring systems specifically. I would like to discuss it by comparing managed monitoring systems with the leading unmanaged open-source stack, Prometheus and Grafana.
Ease of onboarding
Whether you are a growing startup or a mature company, you will usually try to use managed services so you can focus on developing your core business, which is why outsourcing your monitoring system makes sense. The growth in market value of the leaders in this domain shows that the need exists.
When your worldwide deployment footprint grows across several IaaS projects and Kubernetes clusters, metrics pushed to or scraped by the managed service give you global visibility of your deployments: you can create generic dashboards filtered by deployment, or generic alerting rules.
Managed services continuously release additional features on top of core metrics management that help improve our observability, such as network performance monitoring or distributed tracing.
The business model of these systems is usage-based billing, and costs can easily get out of control when your service topology grows (e.g. creating or scaling out a deployment) or when developers change metric data (e.g. creating a new metric or adding a new label to an existing metric). Both increase cardinality, because a separate time series is created and stored for each metric with a specific set of label values.
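A back-of-the-envelope sketch in Python shows why cardinality grows so quickly: for a single metric, the number of time series is bounded by the product of the number of distinct values of each label. The `http_requests_total` metric and its label values below are hypothetical, and the function only gives an upper bound, since only label combinations that are actually emitted end up stored.

```python
from math import prod
from typing import Dict, List

def max_series(label_values: Dict[str, List[str]]) -> int:
    """Upper bound on the number of time series for one metric: one series
    per combination of distinct label values."""
    return prod(len(set(values)) for values in label_values.values())

# Hypothetical labels on a hypothetical `http_requests_total` metric.
labels = {
    "service": ["checkout", "cart", "search"],
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
}
before = max_series(labels)  # 3 * 2 * 3 = 18

# A developer adds a `pod` label; scaling out to 10 pods gives 10 values.
labels["pod"] = [f"pod-{i}" for i in range(10)]
after = max_series(labels)  # 18 * 10 = 180

print(before, after)  # prints: 18 180
```

Each new label multiplies the series count rather than adding to it, which is why billing based on series or custom metrics can react so strongly to seemingly small changes.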
Managed services in general will do their best to get you “stuck” on their products by delivering new features that could make your job better or faster. See below how Datadog has created a “one-stop shop” for monitoring that replaces several managed or unmanaged services, making it very difficult to switch to another managed service or to a set of managed and unmanaged services.
Things to do if you (still) want to use a managed monitoring system
If you have decided to embrace vendor lock-in, there are a few things you can do to keep a certain level of independence, so you can more easily switch to other managed or unmanaged services in the future.
Choose a managed monitoring system that fits modern needs and has been reviewed as one: for example, look at the leaders and visionaries in the latest Gartner Magic Quadrant for Application Performance Monitoring. This helps ensure your observability does not fall behind as technologies and standards improve.
Monitor as code
Manage your monitoring by storing settings and configuration in source code, and by deploying them to the managed service using API calls or Terraform plans. This gives you change management and lets you build pipelines for automatic deployment.
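As a minimal sketch of monitor as code, assuming your alerting rules are kept in the repository as data, the Python below stores a rule group using the field names of a Prometheus rule file and validates it in a CI step before deployment. The rule itself, the required `severity` and `team` labels, and the validation checks are hypothetical conventions; the actual push to the vendor's API or through Terraform is vendor-specific and left out.

```python
import json
from typing import Any, Dict, List

# Hypothetical rule group kept in the repository, using the field names of a
# Prometheus rule file (alert, expr, for, labels, annotations).
RULE_GROUP: Dict[str, Any] = {
    "name": "checkout",
    "rules": [
        {
            "alert": "HighErrorRate",
            "expr": 'sum(rate(http_requests_total{status="500"}[5m])) > 1',
            "for": "10m",
            "labels": {"severity": "page", "team": "payments"},
            "annotations": {"summary": "Checkout is returning 5xx errors"},
        },
    ],
}

REQUIRED_KEYS = {"alert", "expr"}
REQUIRED_LABELS = {"severity", "team"}  # hypothetical team convention

def validate(group: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the group can be deployed."""
    errors: List[str] = []
    seen = set()
    for i, rule in enumerate(group.get("rules", [])):
        missing = REQUIRED_KEYS - set(rule)
        if missing:
            errors.append(f"rule {i}: missing {sorted(missing)}")
        missing_labels = REQUIRED_LABELS - set(rule.get("labels", {}))
        if missing_labels:
            errors.append(f"rule {i}: missing labels {sorted(missing_labels)}")
        name = rule.get("alert")
        if name is not None and name in seen:
            errors.append(f"rule {i}: duplicate alert name {name!r}")
        seen.add(name)
    return errors

if __name__ == "__main__":
    problems = validate(RULE_GROUP)
    if problems:
        # Fail the CI job so the change never reaches the managed service.
        raise SystemExit("\n".join(problems))
    # Deploying the payload (vendor API call or Terraform) is vendor-specific
    # and deliberately not shown here.
    print(json.dumps(RULE_GROUP, indent=2))
```

Keeping the rules in a vendor-neutral shape like this also means a future migration is mostly a matter of changing the deployment step, not rewriting every rule.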
Try to avoid managed monitoring systems that use a proprietary query language (e.g. GCP MQL) and choose one that is compatible with, or close enough to, PromQL (see the PromLabs vendor compatibility results). This gives you access to more documentation and more people on your team who can maintain it.
The Prometheus stack makes extensive use of labels, which let you “tag” alerting rules and then route and silence the resulting alerts based on those labels, so choose a managed system that can mimic this. This allows you to decouple evaluation logic from notification logic.
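The decoupling can be sketched in Python: the alerting rule only attaches labels, and a separate routing step, loosely modeled on Prometheus Alertmanager, decides where each alert goes or whether it is silenced. The routes, receivers and silences are hypothetical, and matching is simplified to exact label equality with first match wins; the real Alertmanager uses a routing tree with richer matchers and options.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Alert:
    """An alert as produced by rule evaluation: a name plus labels."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)

def matches(labels: Dict[str, str], matcher: Dict[str, str]) -> bool:
    """True if every label in the matcher has the same value on the alert.
    An empty matcher matches everything."""
    return all(labels.get(key) == value for key, value in matcher.items())

# Hypothetical routing table, checked in order; the empty matcher is the default.
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"severity": "page", "team": "payments"}, "payments-pager"),
    ({"severity": "page"}, "oncall-pager"),
    ({}, "team-chat"),
]

# Hypothetical silences, expressed with the same label matchers.
SILENCES: List[Dict[str, str]] = [
    {"env": "staging"},
]

def route(alert: Alert) -> Optional[str]:
    """Return the receiver for an alert, or None if a silence matches it."""
    if any(matches(alert.labels, silence) for silence in SILENCES):
        return None
    for matcher, receiver in ROUTES:
        if matches(alert.labels, matcher):
            return receiver
    return None  # unreachable while ROUTES ends with an empty matcher

if __name__ == "__main__":
    for alert in [
        Alert("HighErrorRate", {"severity": "page", "team": "payments"}),
        Alert("DiskFull", {"severity": "page", "team": "infra"}),
        Alert("SlowQueries", {"severity": "ticket", "env": "staging"}),
    ]:
        print(alert.name, "->", route(alert))
```

Because the rules only emit labels, moving to another notification backend means rewriting this routing table rather than every alerting rule.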
The alternative: use a managed Prometheus service
Amazon now offers Amazon Managed Service for Prometheus, and Google will release Google Cloud Managed Service for Prometheus in 2022. Both let you manage them with plain Prometheus configuration files, usually along with free IaaS-related metrics (e.g. GCE and GKE metrics). This lowers your level of vendor lock-in, provided you are willing to give up the additional features offered by other managed services.