Compliance with PCI is no small task. It helps to step back and remember the goal of PCI: protecting the confidentiality and integrity of highly sensitive customer financial information. That high-level goal motivates a few critical security priorities: limiting authorized access as narrowly as possible (so that the fewest possible people have contact with customer data), auditing access when it does happen (so that there is a record of who has touched customer data), and reducing the attack surface as much as we can to prevent unauthorized and unaudited access. Below, we discuss a few key Kubernetes configurations aimed at each of these priorities. Keep in mind that this article is a starting point, not a comprehensive guide to achieving PCI compliance.
We’re inspired here by Rory McCune’s excellent series on PCI with Kubernetes; check out his work for even more suggestions and implementation details.
Limit Authorized Access as Narrowly as Possible
One way of thinking about PCI is as an extension of the Principle of Least Privilege to the greatest degree possible. For Kubernetes, this might take the form of explicitly limiting membership in the system:masters group to “break glass” accounts. It might also mean disabling the default behavior of automatically mounting a service account token into every workload, and providing tokens only to workloads that need to communicate directly with the API server. This implements both the Principle of Least Privilege (services that don’t need the access don’t have it) and a reduction in attack surface (eliminating unnecessary credentials that a would-be attacker might discover).
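To see which workloads would actually receive a token, it helps to apply the precedence rule Kubernetes uses: a pod-level automountServiceAccountToken value overrides the ServiceAccount-level value, and if neither is set the token is mounted. The sketch below assumes pod and ServiceAccount manifests have already been parsed into plain dicts (for example from kubectl ... -o json); the function names and the sample namespace and pod names are hypothetical.

```python
from typing import Dict, List, Tuple


def token_is_automounted(pod: dict, service_accounts: Dict[Tuple[str, str], dict]) -> bool:
    """Return True if Kubernetes would mount an API token into this pod.

    A pod-level automountServiceAccountToken value takes precedence over the
    ServiceAccount-level value; if neither is set, the token is mounted.
    """
    spec = pod.get("spec", {})
    if "automountServiceAccountToken" in spec:
        return bool(spec["automountServiceAccountToken"])
    namespace = pod.get("metadata", {}).get("namespace", "default")
    sa_name = spec.get("serviceAccountName", "default")
    sa = service_accounts.get((namespace, sa_name), {})
    return bool(sa.get("automountServiceAccountToken", True))


def pods_with_tokens(pods: List[dict], service_accounts: List[dict]) -> List[str]:
    """Return 'namespace/name' for every pod that would receive a token."""
    by_key = {
        (sa["metadata"].get("namespace", "default"), sa["metadata"]["name"]): sa
        for sa in service_accounts
    }
    return [
        f'{p["metadata"].get("namespace", "default")}/{p["metadata"]["name"]}'
        for p in pods
        if token_is_automounted(p, by_key)
    ]


# Hypothetical inventory: the namespace's default ServiceAccount opts out,
# and only the "operator" pod opts back in because it talks to the API.
service_accounts = [
    {"metadata": {"name": "default", "namespace": "payments"},
     "automountServiceAccountToken": False},
]
pods = [
    {"metadata": {"name": "web", "namespace": "payments"}, "spec": {}},
    {"metadata": {"name": "operator", "namespace": "payments"},
     "spec": {"automountServiceAccountToken": True}},
]
print(pods_with_tokens(pods, service_accounts))  # ['payments/operator']
```

Opting out at the ServiceAccount level and opting back in per pod keeps the exceptions explicit and easy to review.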
We can think about some other PCI implementations as logical extensions of the Principle of Least Privilege. For example, workloads that do not need to run as root should not be run with elevated permissions on the host. Namespaces should have a default deny-all ingress and egress policy, limiting cross-namespace traffic to only that which is explicitly allowed. Your entire control plane should be inaccessible from the open internet, creating a barrier or moat against those without explicit permission to access it. Each of these follows the spirit of minimizing permissions or privileges even if they are not directly tied to “normal” user permissions management processes like role-based access control.
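As a minimal sketch of two of these controls, the snippet below builds a per-namespace default deny-all NetworkPolicy and a restrictive container securityContext as Python dicts that can be serialized to JSON and applied with kubectl apply -f. The namespace names are hypothetical, and readOnlyRootFilesystem may need relaxing for some applications. Keep in mind that NetworkPolicy is only enforced if your CNI plugin supports it, and that denying all egress also blocks DNS, so a follow-up policy allowing DNS traffic is usually needed.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod in a namespace and allows no traffic.

    The empty podSelector matches all pods in the namespace. Because both
    policy types are listed with no ingress or egress rules, all inbound and
    outbound traffic is denied until another policy explicitly allows it.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def non_root_security_context() -> dict:
    """Container securityContext for a workload that does not need root."""
    return {
        "runAsNonRoot": True,               # kubelet refuses to start the container as UID 0
        "allowPrivilegeEscalation": False,  # no setuid-style escalation inside the container
        "privileged": False,
        "capabilities": {"drop": ["ALL"]},
        "readOnlyRootFilesystem": True,     # may need relaxing for some applications
    }


# Hypothetical namespace names; kubectl apply -f accepts JSON as well as YAML.
for ns in ["payments", "card-vault"]:
    print(json.dumps(default_deny_policy(ns), indent=2))
```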
Audit All Access
In addition to the Principle of Least Privilege, another security value crucial to PCI is non-repudiation. There needs to be a definitive audit trail of who did what (or who accessed what) in a system. Recall the “why” for PCI: if people have entrusted us with their sensitive financial information, and something goes awry, investigators should have a clear trail of evidence to follow.
Three features of out-of-the-box Kubernetes work against this value and need to be handled with care. First, Kubernetes does not enable auditing by default, so you’ll need to turn it on and provide a means of aggregating audit logs. Second, be aware of “default accounts” such as the kube-admin account established when a cluster is first created. Usage of these accounts is not attributable to a single, specific user and therefore can’t be audited with the precision PCI requires. Access to them should be very narrowly limited, and they should only be used in break-glass situations in which no other way of controlling the cluster exists. Third, the API server accepts anonymous requests. Anonymous authentication sometimes serves a purpose (e.g., automated health checks or analytics), but it should be scoped carefully to the endpoints those purposes require. This will likely mean disabling a number of default-open endpoints that your organization does not need.
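On self-managed clusters, the first and third points show up directly in the kube-apiserver command line: no audit events are recorded unless --audit-policy-file is set, events need a backend (--audit-log-path or --audit-webhook-config-file), and --anonymous-auth defaults to true. The sketch below checks a list of arguments, assuming the --name=value form used in kubeadm-style static pod manifests; the function names and file paths are hypothetical. Managed Kubernetes services typically don’t expose these flags, so there you would check the provider’s audit log settings instead.

```python
from typing import Dict, List


def parse_flags(args: List[str]) -> Dict[str, str]:
    """Turn ['--name=value', '--switch'] into {'name': 'value', 'switch': 'true'}.

    Assumes the --name=value form used in kubeadm-style static pod manifests.
    """
    flags: Dict[str, str] = {}
    for arg in args:
        if not arg.startswith("--"):
            continue  # skip the binary name and any positional arguments
        name, sep, value = arg[2:].partition("=")
        flags[name] = value if sep else "true"
    return flags


def apiserver_audit_findings(args: List[str]) -> List[str]:
    """Report kube-apiserver settings that undermine a PCI audit trail."""
    flags = parse_flags(args)
    findings = []
    # Without a policy file, the API server records no audit events.
    if "audit-policy-file" not in flags:
        findings.append("auditing disabled: --audit-policy-file is not set")
    # Events also need somewhere to go: a log file or a webhook backend.
    if "audit-log-path" not in flags and "audit-webhook-config-file" not in flags:
        findings.append("no audit backend: set --audit-log-path or --audit-webhook-config-file")
    # --anonymous-auth defaults to true on kube-apiserver.
    if flags.get("anonymous-auth", "true").lower() != "false":
        findings.append("anonymous requests enabled: review what system:anonymous can reach")
    return findings


# Hypothetical command line, e.g. copied from the kube-apiserver static pod manifest.
args = [
    "kube-apiserver",
    "--audit-policy-file=/etc/kubernetes/audit-policy.yaml",
    "--audit-log-path=/var/log/kubernetes/audit.log",
]
for finding in apiserver_audit_findings(args):
    print(finding)
```

Here auditing is configured but anonymous authentication was left at its default, so only the anonymous-requests finding is printed.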
In addition to providing a trail of evidence, auditing lets you tighten permissions in line with the Principle of Least Privilege when it becomes clear that a user has more permissions than they need.
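One way to act on this is to compare what each user was granted with what the audit trail shows they actually did. The sketch below reads audit events as JSON lines (the log backend’s default format) and uses the audit.k8s.io/v1 Event fields stage, verb, user.username, objectRef.resource, and responseStatus.code. The granted mapping is a hypothetical input; in practice you might assemble it from RBAC, for example with kubectl auth can-i --list --as=<user>. Treat it as a starting point for review, not an automatic revocation tool, since a permission unused during the sampled window may still be legitimately needed.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Permission = Tuple[str, str]  # (verb, resource), e.g. ("get", "secrets")


def used_permissions(audit_lines: Iterable[str]) -> Dict[str, Set[Permission]]:
    """Collect the (verb, resource) pairs each user successfully exercised."""
    used: Dict[str, Set[Permission]] = defaultdict(set)
    for line in audit_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("stage") != "ResponseComplete":
            continue  # count each request once, when it completes
        ref = event.get("objectRef")
        if not ref:
            continue  # non-resource requests such as /healthz
        if event.get("responseStatus", {}).get("code", 200) >= 400:
            continue  # denied or failed requests do not show a need for access
        used[event["user"]["username"]].add((event["verb"], ref["resource"]))
    return used


def unused_permissions(granted: Dict[str, Set[Permission]],
                       used: Dict[str, Set[Permission]]) -> Dict[str, Set[Permission]]:
    """Return, per user, granted permissions that never appear in the audit trail."""
    result: Dict[str, Set[Permission]] = {}
    for user, perms in granted.items():
        unused = perms - used.get(user, set())
        if unused:
            result[user] = unused
    return result


# Hypothetical one-event audit log and grant list.
log = [json.dumps({
    "stage": "ResponseComplete", "verb": "get",
    "user": {"username": "alice"},
    "objectRef": {"resource": "pods", "namespace": "payments"},
    "responseStatus": {"code": 200},
})]
granted = {"alice": {("get", "pods"), ("delete", "secrets")}}
print(unused_permissions(granted, used_permissions(log)))
# {'alice': {('delete', 'secrets')}}
```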
Taking that one step further, we should also be able to revoke a user’s access if auditing shows it to be unnecessary or abused. This means the authentication methods available to users need to rely on revocable credentials. Kubernetes includes two authentication methods that make revocation very challenging: client-certificate authentication and API token authentication. Kubernetes has no mechanism for revoking an individual client certificate, so revocation effectively means re-rolling the PKI of the entire cluster. Kubelets use this method to authenticate to the API server, but it should not be used for services or human users. API token authentication is used for service accounts, and revocation involves deleting and re-creating the associated service account. You should (a) only generate service account tokens for workloads that need them and (b) again, not use this method of authentication for human users of your cluster. As a result, you will probably need an external authentication mechanism, such as an OpenID Connect (OIDC) identity provider, for your human users and cluster admins.
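A quick way to spot human users who still rely on hard-to-revoke credentials is to inspect the user entries in kubeconfig files. The kubeconfig fields client-certificate, client-certificate-data, token, and tokenFile indicate certificate or static bearer-token authentication, while exec-based plugins are the usual route to an external identity provider. The sketch below assumes the kubeconfig has already been parsed into a dict (for example with PyYAML’s yaml.safe_load); the user names and the exec command are hypothetical.

```python
from typing import Dict, List

# kubeconfig user fields that carry credentials which are hard to revoke
# individually, mapped to a human-readable label.
HARD_TO_REVOKE = {
    "client-certificate": "client certificate",
    "client-certificate-data": "client certificate",
    "token": "static bearer token",
    "tokenFile": "static bearer token",
}


def hard_to_revoke_users(kubeconfig: dict) -> Dict[str, List[str]]:
    """Map each kubeconfig user to the hard-to-revoke credential types it uses."""
    findings: Dict[str, List[str]] = {}
    for entry in kubeconfig.get("users", []):
        user = entry.get("user") or {}
        labels = sorted({label for field, label in HARD_TO_REVOKE.items() if field in user})
        if labels:
            findings[entry["name"]] = labels
    return findings


# Hypothetical kubeconfig, already parsed into a dict.
kubeconfig = {
    "users": [
        {"name": "alice",
         "user": {"client-certificate-data": "<base64>", "client-key-data": "<base64>"}},
        {"name": "oidc-bob",
         "user": {"exec": {"apiVersion": "client.authentication.k8s.io/v1",
                           "command": "example-oidc-helper"}}},  # hypothetical helper
    ]
}
print(hard_to_revoke_users(kubeconfig))  # {'alice': ['client certificate']}
```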
Reduce the Attack Surface
It is also critical to harden the workloads themselves: build from minimal base images, open only the ports you need, apply necessary patches, and configure workload monitoring and logging so that you have visibility into what’s happening across your cluster.
Foundational to workload security is knowing which workloads you are running, so that you can make reasonable judgments about how to secure and configure them. For example, if a critical vulnerability is discovered in a package, your response depends on whether affected versions of that package are deployed in your cluster. If they are, you may need to roll out updates quickly to patch the vulnerability. If they are not, you may want to add a rule to your admission control that blocks affected versions from being deployed accidentally in the future. If the image used in your workloads is mutable (e.g., latest) and/or you aren’t sure you can trust the source you’re pulling images from (e.g., they are pulled from the open internet), deciding which response to take may be very tricky. It is therefore recommended that you take two steps to improve your assurance about what is running in your cluster:
- First, pin images to container digests (content hashes) instead of using latest; a digest is a much more precise specification than a version tag.
- Second, pull from an internal, trusted registry instead of public registries. KSOC can assist with this visibility by generating SBOMs for your workloads and continuously monitoring them for new vulnerability notifications.
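The two steps above, together with the admission rule for blocking known-affected versions, can be sketched as a single image check. The code below flags container images that are not pinned to a sha256 digest, that come from outside a list of trusted registries, or whose digest is on a blocklist. It could run in CI or be called from a validating admission webhook (the webhook plumbing is not shown); the registry hostname and function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# An image reference pinned by digest ends in @sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:([0-9a-f]{64})$")


def image_findings(image: str, trusted_registries: Iterable[str],
                   blocked_digests: Iterable[str] = ()) -> List[str]:
    """Check one image reference for digest pinning, registry, and blocklist."""
    findings = []
    match = DIGEST_RE.search(image)
    if not match:
        findings.append("not pinned to a sha256 digest")
    elif match.group(1) in set(blocked_digests):
        findings.append("digest is on the blocklist")
    # Require a trailing "/" so "registry.example.com.evil.io" does not match.
    registries = [r.rstrip("/") + "/" for r in trusted_registries]
    if not any(image.startswith(r) for r in registries):
        findings.append("not pulled from a trusted registry")
    return findings


def pod_image_findings(pod: dict, trusted_registries: Iterable[str],
                       blocked_digests: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Run image_findings over every init and regular container in a pod spec."""
    trusted = list(trusted_registries)
    blocked = set(blocked_digests)
    spec = pod.get("spec", {})
    results: Dict[str, List[str]] = {}
    for container in spec.get("initContainers", []) + spec.get("containers", []):
        problems = image_findings(container["image"], trusted, blocked)
        if problems:
            results[container["name"]] = problems
    return results


TRUSTED = ["registry.internal.example.com"]  # hypothetical internal registry
pod = {"spec": {"containers": [
    {"name": "api", "image": "registry.internal.example.com/payments/api@sha256:" + "a" * 64},
    {"name": "sidecar", "image": "nginx:latest"},
]}}
print(pod_image_findings(pod, TRUSTED))
# {'sidecar': ['not pinned to a sha256 digest', 'not pulled from a trusted registry']}
```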
The other large attack surface in your cluster is your worker nodes. Common node security considerations include keeping the underlying operating system patched (especially kernel and hypervisor patches), choosing a hypervisor appropriate to the kinds of workloads you’ll be running, and enabling monitoring and logging on your nodes. These basic security practices apply in any environment, but they matter especially for PCI compliance because they reduce the likelihood of unauthorized access to customer data through an exploited vulnerability. The lower that likelihood, the more convincingly you can demonstrate to customers and PCI reviewers that you have applied the Principle of Least Privilege and audited all authorized access.
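As a rough inventory aid for node patching, each Node object reports its kernel in status.nodeInfo.kernelVersion. The sketch below flags nodes whose upstream kernel version is below a minimum you choose, assuming the node list has been parsed from kubectl get nodes -o json; the threshold and node names are hypothetical. Distributions often backport security fixes without changing the upstream version number, so this is a coarse signal to prompt investigation, not a substitute for your distribution’s patch advisories.

```python
import re
from typing import List, Optional, Tuple

# Matches the leading upstream version in strings such as "5.15.0-1051-azure".
KERNEL_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?")


def kernel_tuple(version: str) -> Optional[Tuple[int, int, int]]:
    """Parse 'X.Y[.Z]...' into (X, Y, Z); return None if unparseable."""
    m = KERNEL_RE.match(version)
    if not m:
        return None
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def nodes_below_kernel(nodes: List[dict], minimum: Tuple[int, int, int]) -> List[str]:
    """Names of nodes whose reported kernel is older than `minimum` or unknown."""
    flagged = []
    for node in nodes:
        info = node.get("status", {}).get("nodeInfo", {})
        version = kernel_tuple(info.get("kernelVersion", ""))
        if version is None or version < minimum:
            flagged.append(node["metadata"]["name"])
    return flagged


# Hypothetical items, shaped like the "items" list from `kubectl get nodes -o json`.
nodes = [
    {"metadata": {"name": "node-a"},
     "status": {"nodeInfo": {"kernelVersion": "5.15.0-1051-azure"}}},
    {"metadata": {"name": "node-b"},
     "status": {"nodeInfo": {"kernelVersion": "6.1.58"}}},
]
print(nodes_below_kernel(nodes, minimum=(6, 1, 0)))  # ['node-a']
```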
How KSOC can help
KSOC can help with least privilege access for your Kubernetes clusters with the following:
- We provide admission control for reducing the attack surface in production, with transparent policies. Because admission control can cause breaking changes, we offer an option for an ‘admission control dry run’ to first see what an admission control change would look like.
- KSOC maps the permissions each subject can access and compares a user’s RBAC-defined capabilities with actual activity in the cluster.
- KSOC scanning is done in-cluster, which means you don’t have to connect us to your image registry, limiting excessive access to your secrets.
- You can organize how you apply KSOC across multiple tiers to keep policies and remediation results separate for your different internal groups.
- The RBAC audit and our misconfiguration scanning results are both accurate to the minute, based on the real-time event stream from the Kubernetes API. This keeps your results current with the Kubernetes lifecycle, so you can be confident they are relevant.
Conclusion
PCI is about ensuring the confidentiality of sensitive customer financial data. To maintain confidence in that confidentiality, PCI requires a robust implementation of the Principle of Least Privilege, a clear audit trail of all access to your systems and the data they hold, and a minimized attack surface to reduce the likelihood of unauthorized (and unaudited) access as much as possible. Kubernetes environments can take a big swing at PCI compliance by implementing the recommendations in this article. To see the KSOC features above in action or try them yourself, get in touch with our sales team for a demo.