Take Zero Trust to the Next Level with Confidential Virtual Machines

By Fabian Kammel

SPIFFE and confidential computing both aim to minimize the implicit trust that a user needs to place in a computing system. We will show how to combine these approaches to minimize the trust we need to place in public cloud services. To do so, we extend the SPIRE node attestation process for AWS to assert that we are running:

  • On an AWS EC2 machine
  • In a memory encrypted context

Let’s learn how we can achieve this!

What is Confidential Computing?

Confidential computing is a new set of hardware capabilities that enables users to run workloads in memory encrypted environments. Industry-established strategies already exist to protect data in transit, such as Transport Layer Security (TLS), and data at rest, for instance LUKS disk encryption. A memory encrypted environment adds confidentiality for data in use. Neither other virtual machines, nor the host machine, nor the hypervisor itself can read any data you are processing.

Users can then remotely verify the integrity of those environments using hardware rooted attestation. The attestation report includes information about the actual hardware we are running on, and the software stack that makes up our virtual machine. The report is signed by the CPU itself and verified with a vendor-provided X.509 certificate chain. Users are therefore able to verify the integrity of a machine before placing trust in it.

If you’d like to learn more about this topic, the Confidential Computing Consortium, a Linux Foundation project, provides a technical analysis of this technology.

In this example we will focus on AMD SEV-SNP capabilities, introduced with AMD EPYC 7003 ‘Milan’. They have been available since April 2023 in AWS.

What are SPIFFE/SPIRE?

Secure Production Identity Framework for Everyone (SPIFFE), on the other hand, is a set of open-source standards for securely identifying software systems in dynamic and heterogeneous environments. Solving the hard problem of providing an identity to each system has enabled others to implement valuable features on top. For example, the X.509-based identities are used to create mutually authenticated and encrypted channels between all workloads. Users are then able to define the minimal set of permissions each workload requires to talk to another when carrying out its job. This is the foundation of zero trust architectures.

SPIRE is the open source implementation of the SPIFFE standard and is based on a server-agent architecture. The agent requests an identity for the whole node from the server. The agent is subsequently responsible for requesting identities for workloads running on the same node.

flowchart LR
    subgraph SPIRE
        Server
    end

    subgraph Node
        Agent
        Workload
    end

    Server <--> Agent
    Agent <--> Workload

When a node wants to receive an identity, it needs to authenticate itself in a process called node attestation. This is a three-step process:

  1. The agent gathers data about itself.
  2. The SPIRE server verifies this information.
  3. The SPIRE server issues the SPIFFE verifiable identity document (SVID).

You can compare this process to requesting a government-issued identity card. Information about your person such as name, height and eye color are noted, verified and recorded in the final identity document. The degree of trust we can place in the document is determined by the soundness of the data gathering and verification process.

SPIRE on AWS

Let’s have a look at how this process works in AWS today:

sequenceDiagram
    participant Agent
    participant Server
    Note over Agent: Fetch IID
    Agent ->> Server: Send IID
    Note over Server: Verify IID
    Note over Server: Issue SVID
    Server ->> Agent: Send SVID

The agent requests an instance identity document (IID), which is signed by AWS and served by the instance metadata service (IMDSv2). The server verifies this document and, if verification succeeds, issues the SVID. The verified IID is proof that our agent is a virtual machine running in AWS. It also contains additional information such as the cloud region and AWS account.
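
To make the outcome of this flow concrete, here is a minimal Python sketch of how an agent's SPIFFE ID could be derived from an already verified IID. The field names `accountId`, `region`, and `instanceId` follow the AWS instance identity document format, and the path layout mirrors the agent ID used later in this article. The helper name `agent_spiffe_id` and all sample values are made up for illustration, and verification of the IID signature is deliberately left out.

```python
import json


def agent_spiffe_id(trust_domain: str, iid_json: str) -> str:
    """Build an agent SPIFFE ID from an already verified IID (hypothetical helper)."""
    iid = json.loads(iid_json)
    # These three fields are part of the AWS instance identity document.
    parts = [iid.get(key) for key in ("accountId", "region", "instanceId")]
    if not all(isinstance(part, str) and part for part in parts):
        raise ValueError("IID is missing accountId, region, or instanceId")
    return f"spiffe://{trust_domain}/spire/agent/aws_iid/" + "/".join(parts)


# Sample IID with made-up values; a real one is fetched from the
# instance metadata service and verified before use.
sample_iid = json.dumps({
    "accountId": "1234567890",
    "region": "eu-west-1",
    "instanceId": "i-01234567890",
})
print(agent_spiffe_id("control-plane.io", sample_iid))
```

The account, region, and instance ID all come from the cloud provider, so at this point the identity is only as trustworthy as the provider's signature.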

Adding Confidential Capabilities

Our goal is now to additionally prove that we are running on confidential computing hardware in a memory encrypted virtual machine. As discussed earlier, we can use AMD SEV-SNP to acquire additional platform information directly from the supported hardware. This information is signed with a key rooted inside the CPU itself, and does not require us to place any trust in the cloud provider.

Let’s see how we can extend the issuing process:

sequenceDiagram
    participant Agent
    participant Server
    Note over Agent: Fetch IID
    Agent ->> Server: Send IID
    Note over Server: Verify IID
    Server ->> Agent: Send Nonce
    Note over Agent: Generate report
    Agent ->> Server: Send Report
    Note over Server: Verify Report
    Note over Server: Create SVID
    Server ->> Agent: Send SVID

After the IID passes the initial verification, the server sends a nonce, a random number that protects against replay attacks. The agent includes the nonce in the attestation report, which is issued directly by the AMD CPU. The server can verify this report using standard X.509 certificate mechanisms and the certificate chain for Milan processors.
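
The nonce handling can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the fork: the byte offsets reflect my reading of the attestation report layout in AMD's SEV-SNP ABI specification and should be checked against it, placing the raw nonce in the 64-byte `REPORT_DATA` field is an assumption, and `NonceStore` and `fake_report` are hypothetical names. The hardware is replaced by a stand-in that produces an unsigned, report-shaped byte string, so signature and certificate chain verification are omitted.

```python
import secrets

# Layout assumptions taken from the SEV-SNP ABI specification; verify before use.
REPORT_SIZE = 0x4A0               # total report size in bytes
REPORT_DATA = slice(0x50, 0x90)   # 64 bytes of guest-supplied data (nonce here)
MEASUREMENT = slice(0x90, 0xC0)   # 48-byte launch measurement


class NonceStore:
    """Server side: hands out nonces and accepts each one at most once."""

    def __init__(self):
        self._pending = set()

    def issue(self) -> bytes:
        nonce = secrets.token_bytes(64)
        self._pending.add(nonce)
        return nonce

    def check_report(self, report: bytes) -> bytes:
        """Return the measurement if the report carries a fresh nonce."""
        if len(report) != REPORT_SIZE:
            raise ValueError("unexpected report size")
        nonce = bytes(report[REPORT_DATA])
        if nonce not in self._pending:
            raise ValueError("unknown or replayed nonce")
        self._pending.discard(nonce)  # single use: a replay is rejected
        # A real verifier would check the report signature and the
        # vendor certificate chain here before trusting any field.
        return bytes(report[MEASUREMENT])


def fake_report(nonce: bytes, measurement: bytes) -> bytes:
    """Stand-in for the hardware: an unsigned, report-shaped byte string."""
    report = bytearray(REPORT_SIZE)
    report[REPORT_DATA] = nonce
    report[MEASUREMENT] = measurement
    return bytes(report)


store = NonceStore()
report = fake_report(store.issue(), bytes(range(48)))
print("measurement:", store.check_report(report).hex())
try:
    store.check_report(report)  # presenting the same report again
except ValueError as err:
    print("rejected:", err)
```

Because each nonce is removed once it has been used, a report captured from an earlier attestation cannot be presented a second time.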

By incorporating the information in the attestation report, directly obtained from the hardware, into the SVID, we can enhance the confidence placed in the resulting identity document.

We can now assert that two statements are true. Our agent runs:

  • On an AWS EC2 machine
  • In a memory encrypted context

Conclusion

At the time of writing, this seems to be a novel idea. During my research I only found one mention of SPIFFE and Confidential Computing in the SPIRE RFC on Confidential Workloads, but the focus was on Intel SGX, rather than confidential virtual machines.

The actual implementation described in this article is available as a fork of SPIRE. It is implemented as an extension of the aws_iid plugin that ships with SPIRE. If you are curious and want to take this implementation for a spin, simply:

  1. Build the fork using make
  2. Configure the aws_iid plugin as described in the official documentation.
  3. Run the server: spire-server run -config server.conf
  4. Run the agent: spire-agent run -config agent.conf
  5. List node selectors: spire-server agent show -spiffeID spiffe://control-plane.io/spire/agent/aws_iid/1234567890/eu-west-1/i-01234567890

You will notice that a new selector (aws_iid:measurements) is available, which captures the measurement of the AMD SEV-SNP attestation report.
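
Selectors like this one are what registration entries are matched against. The Python sketch below illustrates the subset rule as I understand it from the SPIRE documentation: an entry applies to a node when every selector it lists is among the selectors produced by attestation. The function name `entry_matches` is hypothetical, and the selector values, including the measurement string, are placeholders, since the exact value format produced by the fork is not shown in this article.

```python
from typing import Iterable


def entry_matches(entry_selectors: Iterable[str], node_selectors: Iterable[str]) -> bool:
    """True when every selector required by the entry was attested for the node."""
    return set(entry_selectors) <= set(node_selectors)


# Placeholder selectors; real values come from `spire-server agent show`.
attested = {
    "aws_iid:region:eu-west-1",
    "aws_iid:measurements:PLACEHOLDER_MEASUREMENT",
}

# An entry that additionally requires a specific measurement.
confidential_entry = {
    "aws_iid:region:eu-west-1",
    "aws_iid:measurements:PLACEHOLDER_MEASUREMENT",
}
# An entry pinned to a different measurement does not apply to this node.
other_entry = {"aws_iid:measurements:SOME_OTHER_MEASUREMENT"}

print(entry_matches(confidential_entry, attested))
print(entry_matches(other_entry, attested))
```

Requiring the measurement selector in an entry would restrict the corresponding identities to nodes whose attestation report carries the expected measurement.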

Just as minor improvements were already contributed back to the community (1, 2, 3), we are also excited to start a conversation with the SPIFFE/SPIRE community and make use of the results presented here.
