Behavior-based threat detection and response has been around for some time, with the top three companies in this field valued at a combined $100 billion. This method identifies potential threats by monitoring and analyzing unusual activity from users, devices, and applications. However, cloud-native environments differ significantly from legacy cloud environments, and the stakes are high: 90% of cloud-native security teams say they have experienced a security incident in their container or Kubernetes environments.
This blog walks through the key tenets of behavior-based threat detection and response in cloud-native environments.
A cloud-native environment is fundamentally different from other cloud environments. The primary indicators of a cloud-native environment include:
The cloud does not have to be the backdrop of a containerized environment, nor does Kubernetes have to be hosted in the cloud. Most containerized and Kubernetes environments are hosted in the cloud, but this isn’t what defines an environment as ‘cloud-native.’
Containers are portable and can run against any backend; in comparison, legacy, monolithic applications are tightly coupled to the servers or VMs they run on and generally do not take a microservices approach. Rewriting any one part of a monolith risks breaking other parts of the application, so making changes and deploying new features takes far longer. It also means that legacy workloads change much less often than containerized workloads, because their components are so closely tied together.
Understanding how to do behavior-based detection and response in a cloud-native environment comes back to how containers are developed, as well as how containers and Kubernetes operate.
Containers are developed as portable logic that moves through the software development lifecycle (SDLC): from source code in Git, through the CI/CD pipeline as an image, and into production as a running container.
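As a simplified sketch of that flow (a hypothetical pipeline; the registry, image name, and deployment name are placeholders), a CI job might build the image from the repository and roll it out to a cluster:

```yaml
# Hypothetical CI pipeline: the same container logic moves from Git,
# through CI/CD as an image, and into production as a running container.
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  ship:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push the container image
        run: |
          docker build -t registry.example.com/payments-api:${GITHUB_SHA} .
          docker push registry.example.com/payments-api:${GITHUB_SHA}
      - name: Roll the new image out to the cluster
        run: |
          kubectl set image deployment/payments-api \
            payments-api=registry.example.com/payments-api:${GITHUB_SHA}
```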
Because containers are generally developed as microservices, a containerized application is composed of multiple discrete parts that work together, rather than one long block of logic. As a result, cloud-native environments operate at a much faster rate of change.
Containers are spun up rapidly and torn down once their job is finished; for example, one of RAD’s customers averages 10,000+ deployments per day, and the average container lasts less than 5 minutes. Kubernetes is the orchestrator on top of these changes, conducting and scheduling the entire symphony, so its configurations change rapidly as a necessary part of the Kubernetes lifecycle.
A smart attacker would:
Detection and response in cloud-native environments must address these two risks.
The leading threat detection and response solutions for legacy environments take advantage of the relatively long lifespan of a workload for their behavior-based analysis. A legacy application workload running on a VM can take advantage of machine-learning and signature-based behavioral detection methods because:
The problems with applying this kind of behavior-based approach in a cloud-native environment are that:
Next, we will explain why current methods of threat detection in cloud-native environments are ineffective, followed by the appropriate behavior-based threat detection and response method for a cloud-native environment.
Today, a plethora of Cloud Workload Protection Platforms (CWPP) and runtime security solutions are available for containers and Kubernetes environments. These solutions take a signature-based approach to threat detection, where rules are written to describe ‘bad behavior,’ and layered onto detection sensors (usually powered by eBPF). The popular open source project Falco is one such example of a classic signature-based model.
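For example, a rule in this model looks roughly like the following, condensed and lightly adapted from Falco’s open source default ruleset (the macros referenced in the condition, such as `spawned_process` and `shell_procs`, are defined elsewhere in that ruleset):

```yaml
# Condensed from Falco's default rules: alert when an interactive shell
# is spawned inside a container.
- rule: Terminal shell in container
  desc: A shell was spawned in a container with an attached terminal.
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
  output: >
    Shell spawned in a container with an attached terminal
    (user=%user.name container=%container.name shell=%proc.name
    parent=%proc.pname cmdline=%proc.cmdline)
  priority: NOTICE
  tags: [container, shell, mitre_execution]
```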
From a threat detection perspective, the signature-based approach is wholly inadequate. Falco is arguably the most popular runtime protection tool in the cloud native security industry today, with 6.6 thousand GitHub stars and an enviable set of rules. When an alert from the runtime agent matches one of those rules, boom, you know an attacker is there. Or do you? Signature-based detection methods have fundamental flaws that limit their usefulness:
1. Too many false positives from legitimate workloads
One of the major problems with the signature-based approach is that any alert could be a false positive, warranting further investigation. An alert could signify an attacker... or it could simply be insecure behavior occurring as part of a legitimate workload (like an agent that runs a container as root because it needs those privileges to do its job).
2. Rule-writing is never-ending
For this approach to be effective, you have to write hundreds, maybe thousands, of rules. Even then, you won’t be able to predict every attack technique. A variation on this is rule-based behavioral analysis, where a research team observes attacks and then codifies them as rules of behavior; in that case, you are limited to what the research team has been able to observe.
3. Unable to catch sophisticated attacks
Signature-based methods are not suited to catching more sophisticated attacks where legitimate commands are run by the wrong user, or where legitimate processes are used for malicious purposes.
4. Heavy by design
Rule libraries are sometimes kept in the cluster itself, which does not scale. The memory and compute required to compare the rules against the behavior under observation, at the cluster level, can be equally prohibitive. This is especially true for teams that want more advanced capabilities and therefore need larger libraries.
5. Stateless; without context
By design, alerts from signature-based detections are stateless: each one relates directly to a single syscall or other very granular host event. On their own, they don’t take into account what is happening elsewhere in that workload, which is a problem when it comes to prioritizing alerts and deciding what requires further investigation.
Some runtime security solutions will use anomaly detection via the ‘black box’ approach. With the black box, thousands of inputs go in and - poof! - out come anomalies that represent attacks. The limitations of this model include:
Detection and response in a cloud native environment must be able to:
Once deployed and running, container workloads exhibit a set of behaviors: the processes they run, the programs they execute, and the files they touch. RAD has released an open source, online cloud native workload fingerprint catalog to invite community-based efforts to further hone the model for creating such fingerprints.
Using these workload fingerprints as the model for behavior in a cloud native environment, an incident can be detected as any drift from that behavioral baseline.
In this approach, a workload’s expected behavior is captured as a specific fingerprint of what it does at runtime.
Every workload running at any given time can be compared against a fingerprint of what that workload should look like, maintaining a consistent view of the baseline and its versions over time. Most container workload behaviors change very little, so despite version changes in open source software and rapid deployments, it is possible to codify a container’s behavior into a consistent fingerprint and to update versions of that fingerprint over time.
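As a rough sketch of the idea (an illustrative format only, not RAD’s actual fingerprint schema; the workload and endpoints are placeholders), a fingerprint might pin down the processes, files, and network behavior a workload is expected to exhibit:

```yaml
# Hypothetical behavior fingerprint for one workload (illustrative schema).
fingerprint:
  workload: payments-api              # the deployment this baseline describes
  image: registry.example.com/payments-api
  version: 3                          # fingerprint version, bumped as the baseline evolves
  processes:                          # processes observed while the baseline was approved
    - /usr/local/bin/node
    - /usr/bin/curl
  files_accessed:
    - /app/config/settings.json
    - /tmp/cache/*
  network_egress:
    - payments-db.internal:5432
    - api.stripe.com:443
drift_policy:
  on_unknown_process: alert           # anything outside the baseline is treated as drift
  on_unknown_egress: alert
```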
In the case of workload fingerprints, AI models can quickly query large datasets to classify drift into categories of known attacks (when possible) and can support prioritization across drift spanning large environments. But unlike with more traditional legacy applications, AI and machine learning are currently not helpful in cloud-native environments as a baselining and detection method in and of themselves, because the models require training on large datasets that short-lived, rapidly changing workloads don’t produce.
Because behavior-based workload fingerprints are described as code, they are portable and can be used to verify integrity at any point in the SDLC by comparing baselined, ‘normal’ behavior with current behavior to detect drift. It’s kind of like an SBOM, but for the actual runtime behavior of a container.
Example of Behavior-as-Code in RAD - YAML of a fingerprint exhibiting drift
Attackers are targeting identity and infrastructure to exploit cloud native workloads. The Five Eyes intelligence alliance recently announced that the actors behind the SolarWinds attack were targeting cloud native infrastructure and identity (specifically non-human service accounts) with updated tactics, techniques, and procedures. Other cloud native attacks show that attackers move fluidly across identity, infrastructure, and workloads to persist and accomplish their goals.
To see this fluid movement, attacker behavior must be observed in real time, and the relationships between identities, infrastructure, and workloads must be clear.
Risky Identity with connections to runtime and infrastructure risks in RAD
With RAD, we use automated behavior-based fingerprinting to create a baseline profile of your unique environment. We don’t need a previously created baseline from an open source or related image; we baseline the behavior in your own environment and manage versions over time, continually updating and approving a master fingerprint.
Below is an example of drift in sshd that indicates exploitation of the recent XZ Utils backdoor, a software supply chain attack.
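As a simplified, hypothetical illustration of how such drift could surface (not RAD’s actual output; paths and values are placeholders), the backdoored library and an unexpected child process would stand out against an otherwise stable sshd fingerprint:

```yaml
# Hypothetical drift report: observed sshd behavior compared to its approved fingerprint.
workload: sshd
baseline_version: 7
drift:
  - type: unexpected_library
    observed: /usr/lib/x86_64-linux-gnu/liblzma.so.5.6.1   # backdoored xz-utils build
    note: not present in the approved baseline for this image
  - type: unexpected_child_process
    parent: /usr/sbin/sshd
    observed: '/bin/sh -c "<attacker-supplied command>"'
    note: sshd does not spawn shells in the approved baseline
verdict: drift   # flag for investigation and response
```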
By snapshotting a clean representation of normal behavior, RAD can compare new runtime activity against a fingerprint to detect abnormal behavior:
To respond with RAD, you can quarantine or label a workload, terminate a pod, right-size an identity, fix a Kubernetes misconfiguration, or send alerts to a workflow or alert-management tool, such as a vulnerability management tool or a SIEM.
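For instance, quarantining a workload can be as simple as applying a deny-all NetworkPolicy to any pod carrying a quarantine label (a generic Kubernetes example, not RAD-specific configuration; the label and namespace are placeholders):

```yaml
# Generic Kubernetes example: isolate any pod labeled as quarantined
# by denying all ingress and egress traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-workload
  namespace: payments
spec:
  podSelector:
    matchLabels:
      security.example.com/quarantine: "true"
  policyTypes:
    - Ingress
    - Egress
  # No ingress or egress rules are listed, so all traffic to and from
  # matching pods is denied.
```

Labeling the affected pod (for example, `kubectl label pod <pod-name> security.example.com/quarantine=true`) then cuts it off from the network while the drift is investigated.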
RAD action response center for building workflows
Right-size a K8s RBAC service account with RAD
Send alerts to an external tool of choice
Workload fingerprinting requires deploying an eBPF sensor; unlike inflexible and unstable legacy agents, RAD’s sensor requires minimal, more precisely scoped permissions.
Behavioral detection and response should look different in a cloud native environment because of the difference between containerized, microservices applications and legacy applications, regardless of whether they are hosted in the cloud or on-premises. Behavioral threat detection is critical to get ahead of novel attacks in cloud native environments.
Download the essential runtime threat detection checklist for a detailed list of requirements for evaluating cloud-native, signature-based detection tools today, or reach out to get started creating your unique behavioral fingerprints!