Behavior-based threat detection and response has been around for some time, with the top three companies in this field valued at a combined $100 billion. This method identifies potential threats by monitoring and analyzing unusual activity from users, devices, and applications. However, cloud-native environments differ significantly from legacy cloud environments, and the stakes are high: 90% of cloud-native security teams say they have experienced a security incident in their container or Kubernetes environments.
This blog walks through the key tenets of behavior-based threat detection and response in cloud-native environments.
A cloud-native environment is fundamentally different from other cloud environments. The primary indicators of a cloud-native environment include:
The cloud does not have to be the backdrop of a containerized environment, nor does Kubernetes have to be hosted in the cloud. Most containerized and Kubernetes environments are hosted in the cloud, but this isn’t what defines an environment as ‘cloud-native.’
Containers are portable and can run against any backend; in comparison, legacy, monolithic applications are tightly coupled to the servers or VMs they run on and generally do not take a microservices approach. Rewriting any one part of a monolith risks breaking other parts of the application, so making changes and deploying new features takes far longer. It also means that legacy workloads change much less often than containerized workloads, because their components are so closely tied together.
Understanding how to do behavior-based detection and response in a cloud-native environment comes back to how containers are developed, as well as how containers and Kubernetes operate.
Containers are developed as portable logic that moves through the software development lifecycle (SDLC): from source code in Git, through the CI/CD pipeline as an image, and into production as a running container.
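As a simplified sketch of that flow (a hypothetical pipeline; the registry, image name, and deployment name are placeholders), a CI job might build the image from the repository and roll it out to a cluster:

```yaml
# Hypothetical CI pipeline: the same container logic moves from Git,
# through CI/CD as an image, and into production as a running container.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  ship:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push the container image
        run: |
          docker build -t registry.example.com/payments-api:${GITHUB_SHA} .
          docker push registry.example.com/payments-api:${GITHUB_SHA}
      - name: Roll the new image out to the cluster
        run: |
          kubectl set image deployment/payments-api \
            payments-api=registry.example.com/payments-api:${GITHUB_SHA}
```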
Because containers are generally developed as microservices, a containerized application is composed of multiple discrete parts that work together, rather than one long block of logic. As a result, cloud-native environments operate at a much faster rate of change.
Containers are spun up rapidly and torn down once their job is finished; for example, one of RAD’s customers averages 10,000+ deployments per day, and the average container lasts less than 5 minutes. Kubernetes is the orchestrator on top of these changes, conducting and scheduling the entire symphony, so its configurations change rapidly as a necessary part of the Kubernetes lifecycle.
A smart attacker would:
Detection and response in cloud-native environments must address these two risks.
The leading threat detection and response solutions for legacy environments take advantage of the relatively long lifespan of a workload for their behavior-based analysis. A legacy application workload running on a VM can take advantage of machine-learning and signature-based behavioral detection methods because:
The problems with applying this kind of behavior-based approach in a cloud-native environment are that:
Next, we will explain why current methods of threat detection in cloud-native environments are ineffective, followed by the appropriate behavior-based threat detection and response method for a cloud-native environment.
Today, a plethora of Cloud Workload Protection Platforms (CWPP) and runtime security solutions are available for containers and Kubernetes environments. These solutions take a signature-based approach to threat detection, where rules are written to describe ‘bad behavior,’ and layered onto detection sensors (usually powered by eBPF). The popular open source project Falco is one such example of a classic signature-based model.
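For example, a rule in this model looks roughly like the following, condensed and lightly adapted from Falco’s open source default ruleset (the macros referenced in the condition, such as `spawned_process` and `shell_procs`, are defined elsewhere in that ruleset):

```yaml
# Condensed from Falco's default rules: alert when an interactive shell
# is spawned inside a container.
- rule: Terminal shell in container
  desc: A shell was spawned in a container with an attached terminal.
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
  output: >
    Shell spawned in a container with an attached terminal
    (user=%user.name container=%container.name shell=%proc.name
    parent=%proc.pname cmdline=%proc.cmdline)
  priority: NOTICE
  tags: [container, shell, mitre_execution]
```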
From a threat detection perspective, the signature-based approach is wholly inadequate. Falco is arguably the most popular runtime protection tool in the cloud native security industry today, with 6.6 thousand GitHub stars and an enviable set of rules. When an alert from the runtime agent matches one of those rules, boom, you know an attacker is there. Or do you? Signature-based detection methods have fundamental flaws that limit their usefulness:
1. Too many false positives from legitimate workloads
One of the major problems with the signature-based approach is that any alert could be a false positive, warranting further investigation. An alert could signify an attacker... or it could simply be insecure behavior occurring as part of a legitimate workload (like an agent that runs a container as root because it needs those privileges to do its job).
2. Rule-writing is never-ending
For this approach to be effective, you have to write hundreds, maybe thousands, of rules. Even then, you won’t be able to predict every attack technique. A variation on this is rule-based behavioral analysis, where a research team observes attacks and then codifies them as rules of behavior; in that case, you are limited to what the research team has been able to observe.
3. Unable to catch sophisticated attacks
Signature-based methods are not suited to catching more sophisticated attacks where legitimate commands are run by the wrong user, or where legitimate processes are used for malicious purposes.
4. Heavy by design
Rule libraries are sometimes kept in the cluster itself, which does not scale. The memory and compute required to compare the rules against the behavior under observation, at the cluster level, can be equally prohibitive. This is especially true for teams that want more advanced capabilities and therefore need larger libraries.
5. Stateless; without context
By design, alerts from signature-based detections are stateless: each one relates directly to a single syscall or other very granular host event. On their own, they don’t take into account what is happening elsewhere in that workload, which is a problem when it comes to prioritizing alerts and deciding what requires further investigation.
Some runtime security solutions will use anomaly detection via the ‘black box’ approach. With the black box, thousands of inputs go in and - poof! - out come anomalies that represent attacks. The limitations of this model include:
Detection and response in a cloud native environment must be able to:
Once deployed and running, container workloads exhibit a set of behaviors: the processes they run, the programs they execute, and the files they touch. RAD has released an open source, online cloud native workload fingerprint catalog to invite community-based efforts to further hone the model for creating such fingerprints.
Using these workload fingerprints as the model for behavior in a cloud native environment, an incident can be detected as any drift from that behavioral baseline.
In this approach, a workload’s expected behavior is captured as a specific fingerprint of what it does at runtime.
Every workload running at any given time can be compared against a fingerprint of what that workload should look like, maintaining a consistent view of the baseline and its versions over time. Most container workload behaviors change very little, so despite version changes in open source software and rapid deployments, it is possible to codify a container’s behavior into a consistent fingerprint and to update versions of that fingerprint over time.
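As a rough sketch of the idea (an illustrative format only, not RAD’s actual fingerprint schema; the workload and endpoints are placeholders), a fingerprint might pin down the processes, files, and network behavior a workload is expected to exhibit:

```yaml
# Hypothetical behavior fingerprint for one workload (illustrative schema).
fingerprint:
  workload: payments-api              # the deployment this baseline describes
  image: registry.example.com/payments-api
  version: 3                          # fingerprint version, bumped as the baseline evolves
  processes:                          # processes observed while the baseline was approved
    - /usr/local/bin/node
    - /usr/bin/curl
  files_accessed:
    - /app/config/settings.json
    - /tmp/cache/*
  network_egress:
    - payments-db.internal:5432
    - api.stripe.com:443
drift_policy:
  on_unknown_process: alert           # anything outside the baseline is treated as drift
  on_unknown_egress: alert
```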
In the case of workload fingerprints, AI models can quickly query large datasets to classify drift into categories of known attacks (when possible) and can support prioritization across drift spanning large environments. But unlike with more traditional legacy applications, AI and machine learning are currently not helpful in cloud-native environments as a baselining and detection method in and of themselves, because the models require training on large datasets that short-lived, rapidly changing workloads don’t produce.
Because behavior-based workload fingerprints are described as code, they are portable and can be used to verify integrity at any point in the SDLC by comparing baselined, ‘normal’ behavior with current behavior to detect drift. It’s kind of like an SBOM, but for the actual runtime behavior of a container.
Example of Behavior-as-Code in RAD - YAML of a fingerprint exhibiting drift
Attackers are targeting identity and infrastructure to exploit cloud native workloads. The Five Eyes intelligence alliance recently announced that the actors behind the SolarWinds attack were targeting cloud native infrastructure and identity (specifically non-human service accounts) with updated tactics, techniques, and procedures. Other cloud native attacks show that attackers move fluidly across identity, infrastructure, and workloads to persist and accomplish their goals.
To see this fluid movement, attacker behavior must be observed in real time, and the relationships between identities, infrastructure, and workloads must be clear.
Risky Identity with connections to runtime and infrastructure risks in RAD
With RAD, we use automated behavior-based fingerprinting to create a baseline profile of your unique environment. We don’t need a previously created baseline from an open source or related image; we baseline the behavior in your own environment and manage versions over time, continually updating and approving a master fingerprint.
Below is an example of drift in sshd that indicates exploitation of the recent XZ Utils backdoor, a software supply chain attack.
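As a simplified, hypothetical illustration of how such drift could surface (not RAD’s actual output; paths and values are placeholders), the backdoored library and an unexpected child process would stand out against an otherwise stable sshd fingerprint:

```yaml
# Hypothetical drift report: observed sshd behavior compared to its approved fingerprint.
workload: sshd
baseline_version: 7
drift:
  - type: unexpected_library
    observed: /usr/lib/x86_64-linux-gnu/liblzma.so.5.6.1   # backdoored xz-utils build
    note: not present in the approved baseline for this image
  - type: unexpected_child_process
    parent: /usr/sbin/sshd
    observed: '/bin/sh -c "<attacker-supplied command>"'
    note: sshd does not spawn shells in the approved baseline
verdict: drift   # flag for investigation and response
```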
By snapshotting a clean representation of normal behavior, RAD can compare new runtime activity against a fingerprint to detect abnormal behavior:
To respond with RAD, you can quarantine or label a workload, terminate a pod, right-size an identity, fix a Kubernetes misconfiguration, or send alerts to a workflow or alert-management tool, such as a vulnerability management tool or a SIEM.
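For instance, quarantining a workload can be as simple as applying a deny-all NetworkPolicy to any pod carrying a quarantine label (a generic Kubernetes example, not RAD-specific configuration; the label and namespace are placeholders):

```yaml
# Generic Kubernetes example: isolate any pod labeled as quarantined
# by denying all ingress and egress traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-workload
  namespace: payments
spec:
  podSelector:
    matchLabels:
      security.example.com/quarantine: "true"
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules are listed, so all traffic to and from
  # matching pods is denied.
```

Labeling the affected pod (for example, `kubectl label pod <pod-name> security.example.com/quarantine=true`) then cuts it off from the network while the drift is investigated.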
RAD action response center for building workflows
Right-size a K8s RBAC service account with RAD
Send alerts to an external tool of choice
Workload fingerprinting requires deploying an eBPF sensor; unlike inflexible and unstable legacy agents, RAD’s sensor requires minimal, more precisely scoped permissions.
Behavioral detection and response should look different in a cloud native environment because of the difference between containerized, microservices applications and legacy applications, regardless of whether they are hosted in the cloud or on-premises. Behavioral threat detection is critical to get ahead of novel attacks in cloud native environments.
Download the essential runtime threat detection checklist for a detailed list of requirements for evaluating cloud-native, signature-based detection tools today, or reach out to get started creating your unique behavioral fingerprints!