Kubernetes Security Blog | RAD Security

Automated Risk Triage for Kubernetes Vulnerability Management | KSOC

Written by Story Tweedie-Yates | Jul 11, 2023 11:11:20 AM

Intro

The engineering and security teams that build, manage and secure Kubernetes environments face a silent crisis: inefficient cloud native security tools fail to address their practical, day-to-day challenges. These teams are understaffed and hold different priorities when it comes to Kubernetes, and even the most advanced among them are overwhelmed by noise and by inoperable tooling that divorces the Kubernetes orchestrator and lifecycle from other Kubernetes components like runtime, container vulnerabilities, the network and the cloud. With the introduction of KSOC’s Automated Risk Triage, both engineering and security teams can finally automate the process of prioritizing and highlighting real risk across real-time, ephemeral Kubernetes environments, improving efficiency across the board.

 

The crisis of inefficiency

Today, if you run your applications in Kubernetes, you certainly have no shortage of options for security. You can: 

You can do this with a point solution or from one company that does all of them. And from your perspective it’s all relevant, because it’s all a part of the dynamic environment directed by Kubernetes. Unfortunately, a painful reality awaits, making cloud native security so inefficient it’s impractical to implement. 

 

Reality #1: Kubernetes shops are short-staffed now, more than ever

The teams involved with managing Kubernetes take shape with a mix of security and application engineering teams, all of which are short-staffed. Their combined responsibilities include:

  1. Migrate to Kubernetes
  2. Build everything on Kubernetes as fast as possible . . . and do it securely 
  3. Deal with security incidents  

These teams have the least room to spare when it comes to the skill-sets required for these tasks. A recent survey found that practitioners cited lack of skills and in-house expertise as the main thing holding back migration to Kubernetes. 

What we often see is teams that rely heavily on ‘that Kubernetes person’: somebody with the skill set and experience to both develop on and secure Kubernetes. Unfortunately, ‘that Kubernetes person’ is pulled in many directions to help other teams, creating constant demands on their time.

 

Reality #2: Kubernetes is caught between teams' different goals

Different teams across security and engineering care about different parts of the application development lifecycle (see diagram below). 

The focus on Kubernetes has been mostly on the engineering side of the house, while security teams focus more on AppSec (shift left) or cloud security (shift right). This makes for a hugely inefficient dynamic because, generally, Kubernetes can’t be secured properly without engineering’s help, yet engineering doesn’t always care about security for Kubernetes.

This dynamic creates delays and lots of time going back and forth trying to understand shared priorities; engineering wants to rework as little code as possible while security wants to reduce the number of security issues they see.

 

Reality #3: Noise + inoperable security findings

The third major issue for Kubernetes Security that is not addressed today, even by those tools that touch all the different Kubernetes components, is inoperable noise created by a false separation between the components of Kubernetes, the Kubernetes lifecycle and the orchestrator itself (e.g. through the API). 

A tool might scan for a Kubernetes misconfiguration but can’t tell you whether the workload with that configuration is still running (see Example 1 below) or whether that same workload has a vulnerability in the container image or not (see Example 2 below). 

Since Kubernetes is the director of all the major components in a Kubernetes environment (network, cloud services, etc.), divorcing Kubernetes and its associated lifecycle from the security of its component parts removes critical context, creating noise that is next to impossible to operationalize.

Example 1: KSPM that uses polling intervals

KSPM that operates on polling intervals rather than in real time ignores the reality of an environment that is constantly changing. By the time you go to fix a highlighted misconfiguration in the Kubernetes manifest (e.g. a container running as root or with elevated privileges), the finding may no longer be there because the workload is no longer running. And there is no historical context to understand what happened between the scanning intervals.
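The gap described above can be illustrated with a small simulation. This is a minimal sketch and not a description of any particular KSPM product: the `Workload` type, the workload names, and the 300-second polling interval are all assumptions made for illustration. It compares what a scanner sampling at fixed intervals sees with what a scanner that records every workload start would see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical record of a workload's lifetime, in seconds since t=0.
    name: str
    start: int
    end: int
    runs_as_root: bool  # the misconfiguration being looked for


def polled_findings(workloads, interval, horizon):
    """Misconfigured workloads seen by a scanner that samples every `interval` seconds."""
    seen = set()
    for t in range(0, horizon + 1, interval):
        for w in workloads:
            if w.runs_as_root and w.start <= t < w.end:
                seen.add(w.name)
    return seen


def event_findings(workloads):
    """Misconfigured workloads seen when every workload start is recorded."""
    return {w.name for w in workloads if w.runs_as_root}


workloads = [
    Workload("batch-job", start=10, end=40, runs_as_root=True),         # lives 30 seconds
    Workload("payments-api", start=0, end=10_000, runs_as_root=True),   # long-lived
    Workload("frontend", start=5, end=10_000, runs_as_root=False),      # no finding
]

polled = polled_findings(workloads, interval=300, horizon=900)
streamed = event_findings(workloads)
missed = streamed - polled

print(sorted(missed))  # ['batch-job']
```

The short-lived `batch-job` starts and stops between two polls, so the interval-based scan never records it, while the event-based view keeps a record that it ran as root.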

Example 2: Container risk without Kubernetes context

Let’s say that you have a case like Log4j and need to root out where the vulnerability is located across all of your clusters at any given time. You can first try to upgrade and fix the issue upstream everywhere you use it. But the development team is going to want to know what to prioritize first. That is where you need to know exactly where it is running across your clusters.

You could get a view of the Kubernetes workload lifecycle at the container level only. Today there are lots of ways to use runtime data to help prioritize vulnerabilities.

But what about ‘that Kubernetes person’? How do they know which cluster it is in or the overall impact to the environment of taking down any affected clusters? If you have thin connections to the overall context of the Kubernetes environment, you’re going to be spending more time than needed understanding your real problems.
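To make the question concrete, here is a minimal sketch of joining a list of vulnerable images against an inventory of running pods, grouped by cluster. Everything in it is hypothetical: the `RunningPod` type, the cluster names, and the image references are invented for illustration, and a real inventory would come from each cluster's API rather than a hard-coded list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningPod:
    # Hypothetical inventory record for one running pod.
    cluster: str
    namespace: str
    name: str
    image: str


def locate_vulnerable(pods, vulnerable_images):
    """Group running pods that use a vulnerable image by cluster."""
    by_cluster = defaultdict(list)
    for pod in pods:
        if pod.image in vulnerable_images:
            by_cluster[pod.cluster].append(f"{pod.namespace}/{pod.name}")
    return dict(by_cluster)


def prioritize(by_cluster):
    """Order clusters by number of affected pods, most affected first."""
    return sorted(by_cluster, key=lambda c: (-len(by_cluster[c]), c))


# Illustrative data only.
vulnerable_images = {"registry.example.com/search:1.4"}
pods = [
    RunningPod("prod-east", "shop", "search-0", "registry.example.com/search:1.4"),
    RunningPod("prod-east", "shop", "search-1", "registry.example.com/search:1.4"),
    RunningPod("prod-east", "shop", "cart-0", "registry.example.com/cart:2.0"),
    RunningPod("staging", "shop", "search-0", "registry.example.com/search:1.4"),
    RunningPod("dev", "shop", "cart-0", "registry.example.com/cart:2.0"),
]

affected = locate_vulnerable(pods, vulnerable_images)
order = prioritize(affected)

print(order)  # ['prod-east', 'staging']
```

Counting affected pods is only one possible ordering. The point of the sketch is that the answer depends on knowing which clusters the image is actually running in right now, which container-level data alone does not provide.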

 

New Kubernetes Security Requirements

It is clear that any Kubernetes security tool truly solving the practical realities of teams today must: 

  1. Connect the component parts of Kubernetes (runtime, public cloud, container vulnerabilities, the Kubernetes manifest, RBAC, etc.) to the Kubernetes orchestrator itself
  2. Help short-staffed engineering and infrastructure security teams do more with less
  3. Reduce noise, with actionable findings that help both engineering and security

 

Triage your risk using threat vectors

Automated Risk Triage is based on threat vectors, and is the first solution in the market with the Kubernetes-first view of cloud native security required to address these efficiency issues. KSOC has wrangled the tangled web of relationships and interdependencies within Kubernetes to map together an automated, relational view of risk across multiple Kubernetes components into what we call threat vectors. Threat vectors update in real-time across the lifecycle so you always have an accurate view of risk. This is the hard work that needed to be done so that security and engineering teams working with Kubernetes could save precious time understanding and addressing their real issues.

The first, and most difficult, step toward building a solution that connects the component parts of Kubernetes to the Kubernetes orchestrator is mapping security risks across the Kubernetes lifecycle. KSOC released real-time Kubernetes Security Posture Management (KSPM) earlier this year to do this for Kubernetes manifest misconfigurations. That was only the first foundational step.

Now, building on the real-time mapping of Kubernetes-specific components, we have created threat vectors, which add more Kubernetes components across misconfigurations, runtime, RBAC, the network, container vulnerabilities and the public cloud. On a real-time graph, threat vectors represent the first true definition of cluster risk across all component parts.
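As a rough illustration of what connecting findings across components can mean in practice, the sketch below groups individual findings by workload and ranks workloads by how many distinct components flag them. This is not KSOC's data model or scoring method, which the article does not describe; the finding tuples, the workload names, and the two-component threshold are assumptions chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical findings as (workload, component, description) tuples.
findings = [
    ("prod/checkout", "misconfiguration", "container runs as root"),
    ("prod/checkout", "container vulnerability", "critical CVE in base image"),
    ("prod/checkout", "network", "exposed to the internet"),
    ("dev/sandbox", "container vulnerability", "critical CVE in base image"),
    ("prod/reports", "rbac", "service account bound to cluster-admin"),
    ("prod/reports", "misconfiguration", "privileged container"),
]


def connect_findings(findings):
    """Group findings by workload, then by the component that reported them."""
    by_workload = defaultdict(dict)
    for workload, component, description in findings:
        by_workload[workload].setdefault(component, []).append(description)
    return by_workload


def triage(findings, min_components=2):
    """Return workloads flagged by at least `min_components` components, most connected first."""
    connected = connect_findings(findings)
    ranked = sorted(connected.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(w, sorted(c)) for w, c in ranked if len(c) >= min_components]


for workload, components in triage(findings):
    print(workload, components)
```

In this toy data, `prod/checkout` ranks first because three components flag it, and `dev/sandbox` drops out because its single finding has no connecting context. A real-time system would recompute this view as workloads start and stop rather than from a static list.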

Automated Risk Triage, underpinned by threat vectors, makes it possible for short-staffed teams to:

  • Save time by prioritizing and highlighting the real risk across real-time, connected, ephemeral environments
  • Cut through the findings and alerts that are not pressing to give both engineering and security teams better insight on what to tackle next
  • Completely eliminate the busywork associated with vulnerability management and triage in Kubernetes, turning this into a project that can be done over your coffee break (see the example below)


Meanwhile, noise is drastically reduced for engineering and security:

  • Security can share an accurate, context-laden view to help engineering prioritize any rework required
  • Noise is literally cut in half (or more) and replaced with a connected view of risk 

 

The broad benefits of Automated Risk Triage

Vendors with capabilities that are specific to securing a Kubernetes environment can be grouped into three buckets:

  • K8s-specific point solutions (mostly KSPM & Admission control, some RBAC capabilities) 
  • Container-centric platforms with controls for Kubernetes components
  • CSPM-centric platforms with a view of Kubernetes components

With Automated Risk Triage, underpinned by threat vectors, you get key points of context that don’t exist in other solutions:

Let’s walk through two concrete examples to understand the day-to-day difference in your workload and capabilities with KSOC versus any alternatives.

 

Example #1: Admission controller

Below is a representation of the day-to-day difference between your admission control capabilities with and without KSOC’s threat vectors:

With KSOC, you get four more points of connection and context that would otherwise have cost you time and effort to establish on your own. They help you decide the impact of setting this policy and whether you should spend time fixing the underlying issue earlier in the application development lifecycle.

 

Example #2: Container vulnerabilities

Below is the difference between understanding container vulnerabilities with and without KSOC:

The polling interval will lead teams on a wild goose chase to find the issue and whether it matters in the larger picture, because that depends on the real-time interplay between Kubernetes components.

Whether or not the container vulnerability will be susceptible to container escape into other clusters depends on whether it's present in a hardened Kubernetes cluster. The Kubernetes context is key to understanding your most critical risk.

 

Conclusion

The current state of Kubernetes security presents challenges that teams cannot overcome without an approach that ties security findings from Kubernetes components closer to Kubernetes and its lifecycle. Threat vectors allow you to automate risk triage through a connected view of Kubernetes, presenting a new kind of answer to the primary issue of efficiency across security and engineering teams struggling with a shortage of skilled staff, disconnections between various teams’ goals, and noisy, inoperable security findings. These teams can finally understand their real risk to the minute, even as it changes, and free themselves to work on other tasks. There is no doubt they could use the time back.
