Kubernetes Security Blog | RAD Security

Why Identity and Access Management in Kubernetes Is So Important to Get Right

Written by Jimmy Mesta | Nov 19, 2024 8:34:48 PM

Identity is one of the cornerstones of cyber security. Without a concept of identity, it is impossible to make decisions about access, permissions, or privileges in a system. But identities, and the “rights” attached to them, require management. Doing that within the context of Kubernetes is one of the recurring themes we hear in our conversations with customers. This article pulls together several themes we’ve written about before into a high-level guide on identity and access management within Kubernetes: what is it, what are its challenges, and why does it matter?

There are four critical pieces to any identity and access management scheme:

  1. Authentication: how you know who a purported user is
  2. Authorization: how you decide what a user should be allowed to do
  3. Auditability: how you validate that a user is doing what they should be doing
  4. Non-Repudiation: how you connect observed actions to a particular user

We’re going to consider each of these in turn.

Authentication to a Kubernetes Cluster

What’s Easy:

Kubernetes comes out of the box with several methods for authenticating users, including using X.509 certificates and integrating with OIDC providers. Additionally, managed Kubernetes offerings from cloud services may integrate with existing cloud provider authentication systems. This means there are several options to choose from, and those options are, by design, meant to accommodate your existing authentication workflows.

What’s Hard:

Kubernetes doesn’t actually have a concept of a “human user” as an object within the cluster/API. That is because Kubernetes is expecting your authentication to be handled by an external system (see above), but it means that there isn’t a native “source of truth” about who the users in the cluster are. Instead, that job also falls to whatever external authentication system you are using. That means you need to control which users can access a cluster through that external system, not the cluster itself.

Furthermore, Kubernetes doesn’t have its own concept of a “session”: there is no built-in expiration time on user authentication. Instead, it relies on the authentication material provided by the external authentication system. For example, if that is an X.509 certificate, the user will be able to access the cluster as long as that certificate is valid. That makes a strong argument for using token-based authentication systems (such as OIDC) which have relatively short session lifetimes.
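To make the point about session lifetimes concrete, here is a minimal Python sketch that reads the `exp` (expiry) claim from a JWT-style token, which is the format OIDC ID tokens use. The helper names and the demo token are made up for illustration, and the sketch deliberately does not verify the signature, so treat it as a way to inspect a token rather than to trust one.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Return how long the token stays valid, based on its `exp` claim."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now


def make_demo_token(claims: dict) -> str:
    """Build an unsigned, illustration-only token (hypothetical helper)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}."


# Hypothetical user and timestamps: the token expires one hour after "now".
demo = make_demo_token({"sub": "bob@example.com", "exp": 1_700_003_600})
remaining = seconds_until_expiry(demo, now=1_700_000_000)
print(f"token for {jwt_claims(demo)['sub']} expires in {remaining:.0f}s")
```

A client certificate has an expiry too, but it is typically measured in months or years rather than minutes or hours, which is the difference the paragraph above is getting at.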

Why It Matters:

Authentication is the front door to your cluster. It’s what enables users to get in. While our next section (authorization) provides a “defense in depth” factor, if your authentication system isn’t properly configured you are leaving yourself vulnerable right out of the gate.

Authorization in Kubernetes

What’s Easy:

Kubernetes comes with a robust and very popular authorization system known as “Role-Based Access Control” or RBAC. RBAC enables cluster administrators to define roles (each encompassing a set of permissions at either the cluster or namespace level) and assign them to particular users via role bindings (also scoped to either the cluster or the namespace). These roles explicitly define the actions a user can take in the cluster, ranging from read-only on a small set of resources to full cluster admin. Since this is a native Kubernetes concept, it’s relatively easy to define these roles (yaml for the win!) and implement a robust authorization system.
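As a rough illustration of what a role and a role binding look like, the Python sketch below builds a namespaced read-only Role and a RoleBinding as plain dictionaries and prints them as JSON (which kubectl accepts alongside YAML). The role name, namespace, and user are hypothetical, and the helper functions are ours, not part of any Kubernetes client library.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def make_role(name: str, namespace: str, rules: list) -> dict:
    """Build a namespace-scoped Role manifest."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": rules,
    }


def make_role_binding(name: str, namespace: str, role: str, user: str) -> dict:
    """Build a RoleBinding that grants `role` to a single external user."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        # The username is whatever the external authenticator asserts.
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role, "apiGroup": RBAC_GROUP},
    }


# Hypothetical example: read-only access to pods in the "dev" namespace.
pod_reader = make_role(
    "pod-reader",
    "dev",
    [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}],
)
binding = make_role_binding("read-pods", "dev", "pod-reader", "bob@example.com")

manifest = {"apiVersion": "v1", "kind": "List", "items": [pod_reader, binding]}
print(json.dumps(manifest, indent=2))
```

Note that the binding refers to the user only by name: nothing in the cluster checks that “bob@example.com” exists, which ties back to the authentication caveats above.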

What’s Hard:

The hard part about using RBAC is defining what you want your roles to be. It’s very easy to either over-provision or under-provision roles. This isn’t a technical challenge; it’s a policy and people challenge.

Why It Matters:

There are two critical security concepts that RBAC encompasses: the Principle of Least Privilege and the Separation of Duties. The first says that you should only grant users the permissions they need to complete their normally assigned tasks. It also says that users should generally use the least provisioned role possible to complete their tasks and only elevate their privileges if/when necessary. The principle of Separation of Duties argues that users should have clearly defined roles with limited scopes of responsibility so that you don’t need users to have broad and powerful permissions.

These two principles are critical because together they provide the foundation for defense in depth when it comes to users in your environment. Having clearly defined, well-scoped roles and corresponding least-privilege permissions makes it very difficult for any one user’s account to become the turning point in an attack, whether from an insider or an external attacker.

Auditability of Roles and Access in Kubernetes

What’s Easy:

There are two parts to auditing permissions: first, evaluating whether the defined permissions a user has are appropriate to their role; second, analyzing how a user has used their permissions to determine if they genuinely need the permissions they have.

Building on what we just said about RBAC: the first part of auditing user permissions is fairly straightforward in Kubernetes because those permissions are defined in the Configuration-as-Code language of the cluster. The second part is also easy to initially enable: Kubernetes has a built-in audit logging mechanism that can be activated in the cluster. Those logs are what you will use to review what a user has been doing and evaluate whether they truly need the permissions they have.

What’s Hard:

The hard part of auditing user permissions is very similar to the hard part of defining them in the first place: you need to make policy and people judgments about what permissions your users require. But there’s one more challenge to auditing Kubernetes user permissions, and it has to do with getting those logs somewhere useful. While it’s easy enough to activate logging in Kubernetes, that just turns on a stream of text. That format isn’t exactly going to be easy to use for performing a user activity audit. For that, you’re going to need to transport those logs into some sort of data analysis system that will allow you to search, filter, and make sense of what users have been up to.
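To give a feel for what “making sense” of those logs involves, here is a small Python sketch that aggregates audit events into per-user activity counts. It assumes the log is JSON lines and uses a handful of fields from the Kubernetes audit event format (`user.username`, `verb`, `objectRef`); the sample events are invented, and real events carry many more fields.

```python
import json
from collections import Counter

# Invented sample events, trimmed to the fields this sketch reads.
SAMPLE_EVENTS = [
    {"verb": "list", "user": {"username": "bob@example.com"},
     "objectRef": {"resource": "pods", "namespace": "dev"}},
    {"verb": "list", "user": {"username": "bob@example.com"},
     "objectRef": {"resource": "pods", "namespace": "dev"}},
    {"verb": "delete", "user": {"username": "bob@example.com"},
     "objectRef": {"resource": "secrets", "namespace": "dev"}},
    {"verb": "get", "user": {"username": "alice@example.com"},
     "objectRef": {"resource": "deployments", "namespace": "prod"}},
]
SAMPLE_LOG = "\n".join(json.dumps(event) for event in SAMPLE_EVENTS)


def summarize(log_text: str) -> Counter:
    """Count requests per (user, verb, resource, namespace)."""
    usage = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Non-resource requests (e.g. /healthz) have no objectRef.
        ref = event.get("objectRef") or {}
        key = (
            event["user"]["username"],
            event["verb"],
            ref.get("resource", "<non-resource>"),
            ref.get("namespace", ""),
        )
        usage[key] += 1
    return usage


usage = summarize(SAMPLE_LOG)
for (user, verb, resource, namespace), count in sorted(usage.items()):
    print(f"{user:20} {verb:8} {resource:12} {namespace:6} x{count}")
```

In practice you would run this kind of aggregation inside whatever log analysis system you ship the audit logs to, not in a standalone script.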

Why It Matters:

We talked about the importance of the principle of least privilege. The reality is that Role Based Access Control never truly hits the mark: roles are meant to be generalized at various levels of abstraction such as team assignment or relative permission levels (read-only vs. admin, for example). Those levels of abstraction are not always granular enough, and they almost never fit like a glove for every single user within them. Auditing is how you keep checking whether your role definitions make sense or whether you need to create new ones that better achieve the Principle of Least Privilege.
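The core of that check is a comparison between what a role grants and what its users actually do. Here is a simplified Python sketch of that comparison; it only handles literal verbs and resources (no wildcards, API groups, or resource names), and the granted rules and observed activity are hypothetical.

```python
def expand(rules: list) -> set:
    """Flatten RBAC-style rules into a set of (verb, resource) pairs.

    Simplification: wildcards ("*"), apiGroups, and resourceNames are ignored.
    """
    return {
        (verb, resource)
        for rule in rules
        for resource in rule["resources"]
        for verb in rule["verbs"]
    }


# Hypothetical role: read access to pods and secrets.
granted = expand([
    {"resources": ["pods"], "verbs": ["get", "list", "watch"]},
    {"resources": ["secrets"], "verbs": ["get", "list"]},
])

# Hypothetical activity, e.g. derived from audit logs over a review period.
observed = {("get", "pods"), ("list", "pods")}

# Permissions that were granted but never exercised are candidates for removal.
unused = granted - observed
for verb, resource in sorted(unused):
    print(f"unused: {verb} {resource}")
```

An unused permission isn’t automatically wrong (some are needed only in emergencies), so the output is input to a judgment call, not a verdict.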

Non-Repudiation of Activity in a Kubernetes Cluster

What’s Easy:

Auditing is, in part, about examining a user’s actions against their defined permissions and assessing whether they are a match. Non-repudiation looks at the same data and asks “how do I know that was that user?” In most identity schemas, this depends on the strength of your authentication system. That helps, but for reasons we’ll see in a moment it’s not as definitive in Kubernetes as it might be in other contexts. As a result, not very much about this is easy in Kubernetes. About the only thing we can say here is what we said above about activating logs: turning them on is fairly simple. Then comes the hard stuff.

What’s Hard:

As we said above, you’re going to need to transport your logs somewhere useful, which isn’t the easiest of tasks. Beyond that, however, you need to overcome two challenges. The first is tying the identity of an actor in the cluster to an external identity. Since Kubernetes relies exclusively on external identity providers, and trusts whatever materials those providers give it, this will depend entirely on how robust your external authentication system is. External authentication tied to an OIDC provider with MFA and Zero Trust protections, for example, will give you a pretty good idea that the user claiming to be “Bob” in your cluster is in fact Bob. Authentication tied to a long-lasting certificate that might not be stored in the most secure of locations will, conversely, not give you a lot of confidence in those identity claims. And if your cluster uses any sort of “shared” identities, non-repudiation might just go out the window.

A second major challenge is that while Kubernetes doesn’t have a concept of a “human user” as an API object, it does have service accounts, which can (a) have the same kinds of RBAC permissions as human users and (b) default to using bearer tokens (signed JWTs) as their “proof” of identity. What that means is that an actor who manages to steal the token of one of the service accounts in your cluster will inherit all of its permissions, and you may have a very difficult time differentiating their activity from the legitimate activity of the service account. Solving for this requires locking down the permissions and authentication material for your service accounts.
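One modest way to start separating legitimate service account activity from a stolen identity is to look at where requests come from. The Python sketch below relies on two things Kubernetes does provide: service account usernames of the form `system:serviceaccount:<namespace>:<name>`, and the `sourceIPs` field on audit events. The expected network range and the sample events are assumptions for illustration; your cluster’s ranges will differ, and this heuristic alone will not catch an attacker operating from inside the cluster.

```python
import ipaddress

SA_PREFIX = "system:serviceaccount:"
# Assumption: service accounts should only call the API from this range.
EXPECTED_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def parse_actor(username: str) -> dict:
    """Classify an audit-log username as a service account or a user."""
    if username.startswith(SA_PREFIX):
        namespace, _, name = username[len(SA_PREFIX):].partition(":")
        return {"type": "serviceaccount", "namespace": namespace, "name": name}
    return {"type": "user", "name": username}


def is_suspicious(event: dict) -> bool:
    """Flag service account requests from outside the expected network."""
    actor = parse_actor(event["user"]["username"])
    if actor["type"] != "serviceaccount":
        return False
    return any(
        ipaddress.ip_address(ip) not in EXPECTED_NETWORK
        for ip in event.get("sourceIPs", [])
    )


# Invented sample events.
events = [
    {"user": {"username": "system:serviceaccount:dev:builder"},
     "sourceIPs": ["10.2.3.4"]},
    {"user": {"username": "system:serviceaccount:dev:builder"},
     "sourceIPs": ["203.0.113.50"]},
    {"user": {"username": "bob@example.com"},
     "sourceIPs": ["203.0.113.50"]},
]
for event in events:
    print(event["user"]["username"], event["sourceIPs"], is_suspicious(event))
```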

Why It Matters:

Non-repudiation becomes very important when investigating a (potential) security incident. Knowing who did what in your cluster becomes critical to establishing a timeline and determining the proper courses of action to protect the cluster from further attack. That may look different, for example, if the threat is a malicious insider than if it is a stolen account. In very serious incidents, especially involving insiders, this also becomes important for potential legal actions that may follow the initial incident response.

Conclusion: How RAD Security Can Help

One of the core components of RAD’s offering is RBAC analysis. To do this, we solve the problem of how to usefully transport and analyze your cluster logs, and we automate the process of comparing existing role permissions with actual usage and activity. You still need to make judgment calls about which permissions users should have, but RAD can enable you to make those decisions in a data-driven way.

Additionally, RAD’s misconfiguration detections and behavioral alerts can help you identify potential weaknesses in your identity architecture (such as unprotected service accounts) and spot anomalous behavior that might indicate abuse or an attack.