Introduction
Security for Kubernetes has historically been approached in the same way as scanning images in a repository or Cloud Security Posture Management (CSPM), without taking the dynamic Kubernetes lifecycle into account. Here we explain why this approach has created an impossible situation for practitioners, and how real-time security can finally make Kubernetes security actionable.
Teams are behind in securing Kubernetes
For companies taking advantage of cloud-native technologies to ship applications faster, Kubernetes commands a large share of attention. But when it comes to adoption of security, Kubernetes is still lagging behind.
We often see the following scenario:
“The security and development teams are up and running with container security and the security team is adopting CSPM. When it comes to Kubernetes, there is a scanning tool (Kubernetes Security Posture Management (KSPM)) available. . .but we don’t use it.” - Lead of the Cloud Security Team, Global 50 finance institution
If Kubernetes is such a large part of teams’ cloud-native environments, why aren’t they securing it?
The answer to this question starts with the Kubernetes lifecycle.
The Kubernetes lifecycle
Kubernetes is a dynamic orchestrator of container deployments, with containers constantly spinning up and down as engineers push deployments live and make continuous, iterative changes to applications. This process is dynamic, and many use the term ‘ephemeral’ to describe it. Indeed, container workloads last less than 5 minutes on average. There is even evidence that orchestrated containers churn at least 5 times faster than non-orchestrated containers.
Except in the case of RBAC (which has its own lifecycle and is tied to the Kubernetes API), the Kubernetes lifecycle is the lifecycle of the Kubernetes workload, consisting of the phase of the Pod as well as the state of the containers inside. The pod phases refer, in general, to the scheduling of a deployment, the attachment to a node and whether the containers inside are running or not. The states of the containers themselves are simple: each container is either waiting, running or terminated.
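To make the two levels of the lifecycle concrete, here is a minimal sketch in Python. It assumes a pod status shaped like the `status` field returned by the Kubernetes API (`phase`, plus `containerStatuses` entries that each carry a `name` and a `state`). The helper names `summarize` and `is_live` are our own, introduced only for illustration.

```python
from enum import Enum


class PodPhase(Enum):
    """The five pod phases reported by Kubernetes."""
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    UNKNOWN = "Unknown"


# A container is in exactly one of these three states.
CONTAINER_STATES = ("waiting", "running", "terminated")


def summarize(pod_status: dict) -> tuple:
    """Return (phase, {container name: state}) for a pod status dict."""
    phase = PodPhase(pod_status.get("phase", "Unknown"))
    states = {}
    for cs in pod_status.get("containerStatuses", []):
        state_field = cs.get("state", {})
        # Pick whichever of the three state keys is present.
        states[cs["name"]] = next(
            (s for s in CONTAINER_STATES if s in state_field), "unknown"
        )
    return phase, states


def is_live(pod_status: dict) -> bool:
    """Hypothetical helper: True if the pod is Running with a running container."""
    phase, states = summarize(pod_status)
    return phase is PodPhase.RUNNING and "running" in states.values()
```

A workload-level finding such as ‘container running as root’ is only relevant while `is_live` would return True for the pod in question.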
An average day for a busy Kubernetes environment can involve tens of thousands of pod and container deployments.
The Kubernetes lifecycle determines what is relevant in security
To understand how the lifecycle pertains to Kubernetes security, we have to break down how the lifecycle impacts the main elements of security in Kubernetes: workload manifest configurations, role-based access control (RBAC) and admission control.
Manifest misconfigurations (such as an open Kubernetes API) are tied to the workload, so they are only exploitable while the workload is running. For example, ‘container running as root’ only needs to be fixed while the container in question is running. Similarly, if a workload is deleted, the accompanying misconfiguration is no longer relevant.
RBAC is a bit different. It is not directly tied to a workload per se; it is a control at the API layer. While it can end up allowing actions such as ‘delete pods in namespace x’, it generally controls sensitive API actions such as `list secrets`, `impersonate` or `get configmaps`. RBAC can also have its own lifecycle, in a sense, one that moves very fast and is not tied directly to a running workload. For example, an attacker could come in, create a RoleBinding or ClusterRoleBinding, carry out the exploit, and then delete the binding. So real-time visibility matters here as well, applied to a different type of activity.
With admission control you can set a policy to stop a workload from being deployed in the first place. As a result, you might think that the lifecycle matters less in operating admission controllers. However, few practitioners would ever turn on admission control without first understanding the impact of blocking certain workloads. To understand this impact, you have to see what would happen, in real-time, if you were to switch those policies to enforce mode. This is only possible in real-time, so real-time visibility is actually crucial to the implementation and usage of admission controllers.
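The create-then-delete pattern described above can be sketched as a small check over a stream of API events. This is an illustration only: the event format (dicts with `verb`, `resource`, `namespace`, `name` and a `ts` in seconds) is a simplified assumption of ours, not the actual Kubernetes audit log schema, and the 300-second threshold is arbitrary.

```python
BINDING_RESOURCES = {"rolebindings", "clusterrolebindings"}


def short_lived_bindings(events, max_lifetime_s=300):
    """Find bindings that were created and then deleted within max_lifetime_s.

    `events` is a list of dicts in a hypothetical, simplified format:
    {"verb", "resource", "namespace", "name", "ts"} with ts in seconds.
    Returns a list of ((resource, namespace, name), lifetime_s) tuples.
    """
    created = {}
    findings = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["resource"] not in BINDING_RESOURCES:
            continue
        key = (ev["resource"], ev["namespace"], ev["name"])
        if ev["verb"] == "create":
            created[key] = ev["ts"]
        elif ev["verb"] == "delete" and key in created:
            lifetime = ev["ts"] - created.pop(key)
            if lifetime <= max_lifetime_s:
                findings.append((key, lifetime))
    return findings
```

A scanner that only looks at the bindings present at scan time would see none of these, because each binding is gone again before the next snapshot.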
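The difference between observing what a policy would block and actually enforcing it can be sketched as follows. This is a simplified model, not the API of any particular admission controller: the `audit` and `enforce` mode names, the `registry.example.com/` allow list and the two checks are all assumptions made for illustration.

```python
# Hypothetical allow list of image registry prefixes.
ALLOWED_REGISTRIES = ("registry.example.com/",)


def violations(pod_spec: dict) -> list:
    """Return human-readable policy violations for a pod spec dict."""
    found = []
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        # Simplification: only an explicit runAsUser of 0 counts as root here.
        if sc.get("runAsUser") == 0:
            found.append(f"{c['name']}: runs as root")
        if not c.get("image", "").startswith(ALLOWED_REGISTRIES):
            found.append(f"{c['name']}: image not from an allowed registry")
    return found


def admit(pod_spec: dict, mode: str = "audit") -> dict:
    """Evaluate a pod spec; only 'enforce' mode actually rejects it."""
    found = violations(pod_spec)
    return {
        "allowed": not found or mode != "enforce",
        "would_block": bool(found),
        "reasons": found,
    }
```

Running in `audit` mode first shows which live workloads would have been rejected, and why, before any policy is switched to `enforce`.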
The Kubernetes lifecycle is the determining factor in where and when you have exploitable misconfigurations, which can come and go in a matter of seconds, and it gives you a clue as to what you should be blocking with admission control. RBAC has its own lifecycle that is not necessarily tied to the lifecycle of the workload.
Point in time Kubernetes scanning is impossible to operationalize
Now we have enough background to start answering the question, ‘Why are teams behind in securing Kubernetes?’ The answer lies in the fact that today’s Kubernetes Security Posture Management (KSPM) scanners don’t actually work in real-time. Instead, they take an approach that is more appropriate to Cloud Security Posture Management (CSPM) or image scanning in the CI/CD pipeline, using polling intervals to create a snapshot of misconfigurations at one point in time.
We learned above that the key elements of Kubernetes security (manifest configurations, RBAC and admission control) are only relevant when viewed in the context of an ephemeral workload that may not live for more than 5 minutes. This means that the KSPM tools teams have been relying on are creating alerts and findings that are, by definition, not actionable.
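A small simulation shows why snapshots miss short-lived workloads. The workload names, lifetimes and the hourly interval below are made-up numbers for illustration, and the model assumes each scan is an instantaneous snapshot taken at a fixed interval.

```python
def seen_by_scanner(workloads, scan_interval_s, horizon_s):
    """Return the names of workloads alive during at least one scan.

    `workloads` is a list of (name, start_s, end_s) tuples. Scans are
    modeled as instantaneous snapshots at t = 0, interval, 2 * interval, ...
    up to and including horizon_s.
    """
    scan_times = range(0, horizon_s + 1, scan_interval_s)
    seen = set()
    for name, start, end in workloads:
        if any(start <= t < end for t in scan_times):
            seen.add(name)
    return seen


workloads = [
    ("long-running", 0, 7200),    # alive for the whole window
    ("short-lived", 100, 220),    # two minutes, between scans
    ("lucky-catch", 3590, 3610),  # 20 seconds, happens to span a scan
]
visible = seen_by_scanner(workloads, scan_interval_s=3600, horizon_s=7200)
```

With an hourly scan, `visible` contains `long-running` and `lucky-catch` but not `short-lived`: whether a brief workload is ever reported depends on timing rather than on risk.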
A cloud security engineer recently described the situation in his own words:
“We are in the 24 hour scan issue for the clusters, because containers spin up and down so fast, it’s hard to create findings for teams to remediate.”
Imagine seeing a misconfiguration pop up and then trying to find the relevant cluster and workload. . .and it’s gone. That is the situation for cloud security teams who have a KSPM tool today. Going back to our original question, we actually need to refine it to reflect the reality that teams do want to secure Kubernetes. We should instead be asking, ‘Why aren’t teams using the Kubernetes security tools they have?’ The answer is that the findings are not actionable, because the results do not reflect the actual real-time Kubernetes environment.
And, believe it or not, the worst issue for teams is not necessarily that the findings are not actionable. The worst issue is that, in the case of a breach, you would be completely blind to whatever happened between point-in-time scans, with no historical record of that activity. If information were leaked from your K8s clusters, you wouldn’t know how it happened, and apart from knowing which cluster the data was stored in, you wouldn’t have the data to figure it out.
Real-time Kubernetes security in action
Let’s walk through a very simple, hypothetical scenario to see real-time security in action.
Through a web vulnerability, an attacker has set up a reverse shell listener in somebody’s pod and is now root, as you can see below.
The attacker has gone ahead and extracted secrets and data, which takes a minute or so. Now the attacker deletes the pod and gets rid of the root access. You can see the pod is now deleted:
And the root access is also gone:
An interval scanner would never see or surface this information. It’s like it never happened.
Within KSOC, in real-time, you can see this issue that came up but was then fixed:
When you go to look at the event in more detail, you see the timestamp and how quickly this happened, which was just under two minutes (though this could just as easily have happened in two seconds).
Here is where the container established a reverse shell:
If you were to go into your manifest right now, after this attack is over, you wouldn’t see the information in the manifest anymore because it no longer exists. In KSOC, you can see the manifest details for the attack:
To figure out what happened in your environment, you need this data. And to understand which policies you should be setting, you need this information. From a policy and admission control perspective, in KSOC, we give you an idea of what you should block with the ‘would block’ setting.
And we say why we would have blocked it; in this case the policies triggered were indicating the establishment of a reverse shell, a container running as root and an image that was not from an allowed registry list:
If you only had an interval scanner, you wouldn’t have seen this at all.
Common misconceptions: container security, GitOps, admission control & runtime
There are many misconceptions when it comes to Kubernetes security. Let’s go through each one in turn to understand why there is no substitute for real-time Kubernetes security.
1. Couldn’t we stop a malicious image/container before deployment?
Scanning and blocking in the CI/CD pipeline, or earlier, is an important part of your overall cloud native security practices. It is also very time-consuming, and it can be hard to figure out which repositories and images to prioritize in that process. When you tackle security at the level of Kubernetes, you are picking an area where it’s clear what will actually be deployed and what could be an actual risk. And no container can be completely clean of vulnerabilities.
GitOps is another important process prior to deployment, but developers (as well as attackers) will inevitably make changes in Kubernetes outside of the GitOps process. GitOps also takes time to implement, and in that time many attackers could have come and gone unchecked.
2. Couldn’t we just use an admission controller to block issues as they come up?
An admission controller can block non-compliant workloads from being deployed in the first place. But what team gets their admission controller policies all in perfect shape ahead of time and then enforces them in one fell swoop? What about those times when it’s not exactly clear whether you should enforce admission control or not? Not every policy or misconfiguration should be set to a blanket ‘block’ mode; production would suffer. Simply blocking is not a substitute for real-time visibility in Kubernetes.
3. Wouldn’t you get real-time information from a runtime or CWPP tool also?
You can get real-time information in runtime, but only after a successful attack has already taken place, and that information will be limited to the container workload. The Kubernetes context will have to be reverse-engineered. This means you won’t be able to prevent anything in the first place, and you won’t be doing anything to contain the blast radius of the attack once the attacker escapes your container and moves laterally across the cluster.
Attackers almost always need to escalate privileges and move laterally to reach their targets; rarely do they stop at the initial exploitation of the original vulnerability. Only the most advanced teams are actually blocking anything with their runtime tools. It is safer and easier to work at the Kubernetes level when containing the blast radius of any vulnerable workloads.
In the end, these tools and attack vectors all need to be addressed to achieve a solid security posture. They are all complementary, but the point is that you cannot replace real-time Kubernetes protection with any of the options above.
Actionable Kubernetes security. . .finally
Thus far we have seen that securing Kubernetes in real-time means you can remediate with actionable data and understand your Kubernetes risk accurately. To our customers, real-time protection for Kubernetes means that they can finally operationalize Kubernetes security and reduce the risk of the giant attack vector that is Kubernetes. In doing so, they can also finally support the enormous business initiatives being built on Kubernetes in their businesses today.