@eon01 ・ Apr 10, 2023 ・ 9 min read ・ 1787 views
Claudie is a platform for managing multi-cloud Kubernetes clusters, with each nodepool in a different cloud provider. It supports cloud bursting, service interconnect, managed Kubernetes for providers that do not offer it, and cost savings.
Today, we are talking with Bernard Halas, the Lead Engineer of Claudie. Read on to learn more about the human and the code!
Back in my university days, my friends all agreed that I would end up as a software engineer by profession, while they wanted to pursue other subdomains within computer science. As it turned out, I’m closer to the IT infrastructure field nowadays and my friends do a lot of software engineering 🙂
The path had been set when I started an ISP company as a hobby during my studies, for a very pragmatic reason. In my hometown, there was a lack of broadband connectivity in the early aughts, so we created a community network that we connected by a fast uplink. With good traffic control, we enjoyed a higher throughput-to-cost ratio on our Internet connectivity. This started to grow and suddenly, we were a small ISP. Linux (and BSD) became my best friend. Vyatta (nowadays VyOS) helped us with running the network well.
With this background, combined with a bit of luck, I ended up enjoying work where software engineering and infrastructure engineering meet, before it was called DevOps. When Kubernetes was born, we could see it solved many problems that were troubling companies running distributed and containerized systems. Nowadays, with my team, I’m pushing the boundaries of Kubernetes adoption, especially in multi-cloud scenarios.
Simply put, Kubernetes has been designed in a cloud-agnostic way. It’s important to understand what exactly that means. It means you can build and provision Kubernetes clusters in virtually any cloud or on-premises data center. Kubernetes doesn’t have a preference for, or a stickiness to, a particular infrastructure stack. However, what might not be clear to many people is that it becomes really difficult to migrate such clusters across cloud providers or into other physical on-premises data centers, should the need arise. Berops is a consultancy company, and we keep coming across customers who are interested in overcoming this provider lock-in barrier. That’s how the concept of Claudie was born.
In many (if not most) cases nowadays, Kubernetes clusters sit at the heart of modern infrastructure architectures. This has turned into a common phrase: Kubernetes is the operating system of the cloud. The driving force behind Claudie is freedom. Claudie enables people to build clusters in which the individual nodepools of the very same Kubernetes cluster are located at different cloud providers. The system works through dynamic state reconciliation. Adding capacity from a new cloud provider or abandoning a provider altogether works with no downtime. As a result, people can migrate Kubernetes clusters between cloud providers under full production traffic load. Currently, we’re working on support for on-premises nodepools. This enables cloud-bursting, cost optimization, and Kubernetes-on-Edge scenarios.
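To make the idea of state reconciliation across providers more concrete, here is a minimal sketch in Python. It is not Claudie's implementation or schema: the `NodePool` type, the `reconcile` function, and the nodepool names are hypothetical, invented only for illustration. It assumes the desired and the current state are both simple maps from nodepool name to provider and node count, and it orders the resulting actions so that new capacity is added before old capacity is removed, which is one plausible way to move between providers without downtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """Hypothetical nodepool description; not Claudie's actual schema."""
    provider: str
    count: int


def reconcile(desired, current):
    """Return the ordered actions that move `current` towards `desired`.

    Additions are listed before removals, so workloads have somewhere
    to go before nodes at the old provider are taken away.
    """
    additions, removals = [], []
    for name in sorted(desired.keys() | current.keys()):
        want = desired[name].count if name in desired else 0
        have = current[name].count if name in current else 0
        # Take the provider from whichever state knows this nodepool.
        provider = (desired.get(name) or current[name]).provider
        if want > have:
            additions.append(("add", name, provider, want - have))
        elif want < have:
            removals.append(("remove", name, provider, have - want))
    return additions + removals


# Example: move a three-node pool from GCP to OCI.
current = {"compute-gcp": NodePool("gcp", 3)}
desired = {"compute-oci": NodePool("oci", 3)}
actions = reconcile(desired, current)
print(actions)
```

Running `reconcile` again once the actions have been applied would return an empty list, which is the property that lets such a loop run continuously: it only acts when the observed state drifts from the declared one.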
We’ve released Claudie as open source to gain traction and adoption in an already consolidated market of Kubernetes installers. I assume many of you have seen the CNCF landscape and the number of amazing, high-quality projects in it. That has led us to the model we plan to monetize:
In the current phase, it’s commercial support for self-hosted deployments. From there, we want to build a PaaS for cloud-hosted scenarios. This means we need to cover areas like billing, invoicing, IAM and account management, multi-tenancy, etc. Still a lot of work ahead of us :)
The multi-cloud implementation was fairly straightforward. There is just one GCP, one Azure, and one OCI provider. The interesting architectural challenges are in the implementation of the on-premises nodepools. One of the reasons is that the on-premises world is very heterogeneous, and infrastructure is managed in many different ways in private or rented data centers. We still have several difficult decisions ahead of us. One of them is what level of control we need to assume over the on-premises machines. Another is which technique to use for establishing control over the on-premises nodes. Shall we use a reverse shell? How do we resume the process after network connectivity is interrupted? Will customers need to run custom OS images? How do we ensure that the process of on-premises nodes joining the cluster is secure?
The difficult thing is defining the scope of what Claudie should be capable of and which scenarios we want to cover. By now, Kubernetes clusters are deployed in a large variety of configurations and environments. Supporting all of the possible scenarios would take a tremendous amount of work, so we had to limit the scope to the most popular ones. That means, for example, that we don’t support IPv6-based deployments yet.
Claudie is targeting multi-cloud and hybrid-cloud scenarios. We don’t expect it to become the most popular Kubernetes management platform. It just solves problems some users have. The mid-term vision is to build a Claudie PaaS and to integrate with other vendors in the cloud-native ecosystem. With the hybrid-cloud mode, Claudie covers several Kubernetes-on-Edge strategies and we’d like to improve the feature set in this regard as well.
We welcome contributions from the community. Please have a look at our contributions guide. There’s an ongoing collaboration with the academic community as well. If you’re looking for an academic project, we have a few ideas which could be explored and prototyped.
Currently, we’re exploring the technical usability boundaries of Claudie and of multi-cloud Kubernetes scenarios in general. For updates on these topics, please follow us on LinkedIn or Twitter, or stop by our blog every once in a while.
I started Berops when I had around 15 years of professional experience. And as such, it’s a reflection of the observations I made and the lessons I learned.
One of the early lessons I learned was probably the result of my own human relationship experiment. When I was a newcomer in one of my earliest jobs, there was a person who disliked our whole team because she perceived our delivery as poor. As a newcomer, I had little to no responsibility for that, but instead of blaming her for her childish behavior towards me, I took it as a personal challenge and tried to speak to her constructively on several occasions to understand where her perception was coming from. I could eventually see she had a valid point. This allowed us to work together on several significant improvements in our deliveries. The interesting aspect was that more people were, to some extent, negatively affected by the poor delivery of our team, but they were just too nice: they avoided conflict and simply accepted the status quo.
The significance of this experience is twofold. First, getting out of our comfort zone every once in a while allows us to take different perspectives on projects, work, and relationships. This often leads to revisits of our priorities and results in positive outcomes all across the board, be it the work itself, the environment, or the relationships.
The second aspect of the experiment is learning to approach other human beings with respect, especially those who are the least popular. They, too, might have a message worth telling and an opinion worth listening to.
I have Linux at heart and I love the Unix principle of every tool doing one job and doing it well. I've been a long-time KDE and openSUSE user, and NixOS has been on my radar for a while now, but it hasn’t become my desktop OS yet. Go is my choice for any backend development and for infrastructure automation glue. I often find myself writing shell inner loops in one-liner fashion though :)
I have tested ytt as a YAML templating and overlay solution. I like it a lot. However, I don’t want to use it in projects that have the potential to involve a larger contributor base, as the adoption of ytt is low and the tool has a steep learning curve, partially due to unusual design decisions. In production, I typically resort to kustomize.
When it comes to new projects, I like to host them on GCP. I simply love their UX and service quality. That’s why we at Berops chose to become GCP’s delivery and implementation partner.
I wouldn’t label myself as a developer per se, but most of the development and scripting I do is in neovim. Fish is my interactive shell, asdf makes for easy management of tool binaries and their versions, and kubie handles kubeconfig management.
Running a company and building new products is difficult. I’m learning from others in this respect, for example, Eric Ries (via his book Lean Startup), Eric Schmidt (Work Rules), or Christopher Voss (Never Split the Difference).
To touch on the technical side of your question, I still find YAML management to be an unsolved problem, and therefore I’d like to spend more time learning kpt soon.
And I’m learning to play the alto saxophone :)
Think twice before you code, but don’t overcomplicate things.
This means trying to conservatively foresee how your solution will be consumed and what the likely directions of its further evolution are. Account for them when designing the data model.
But it’s simple and readable code that’s easy and fast to work with, and that most often saves you from drowning in maintenance. That’s the KISS (Keep It Simple, Stupid) principle.
Gergely Orosz. He gives great insights into meta-engineering topics.
For the SRE and infrastructure automation engineers, I strongly recommend the SRE Book by Google engineers. It helps to focus on important aspects of the software in this domain.
You asked for one book, but allow me to mention a non-engineering one as well: Marcus Chown – The Universe Next Door. It won’t make you a better programmer. It’s about exploring ideas, which is a major part of every software engineer’s world. You will enjoy it.
Will AI ever reach superintelligence? Probably at some point, but not in the near future. Think of how long ago Tesla deployed the first versions of Autopilot, and we’re still just not there yet. What we are observing now with the blooming of AI tools for developers is a comparable process, in the more difficult domain of software engineering. Nonetheless, we’re already seeing interesting results from the work of AI engineers. I’d let Nick Bostrom answer that question in depth :)
Developers are living in exciting times, enabled by the connectivity of the Internet and the seemingly unlimited computing power of the cloud at our fingertips. I’m looking forward to what’s ahead of us.
Aymen, thank you for having me.