Case Study: ChaosDB
ChaosDB is an unprecedented critical vulnerability in the Azure cloud platform that allowed remote account takeover of Azure’s flagship database, Cosmos DB. The vulnerability, which the Wiz Research Team disclosed to Microsoft in August 2021, gave any Azure user full admin access (read, write, delete) to other customers’ Cosmos DB instances without authorization. It is a prime example of the need for proper tenant isolation, as it impacted thousands of organizations, including numerous Fortune 500 companies.
How could the vulnerability be exploited?
- Deploy a Cosmos DB instance with an embedded Jupyter Notebook container.
- Execute any C# code to elevate privileges to host-level root.
- Remove locally set firewall rules to gain unrestricted network access in the environment.
- Authenticate to a management server using a self-signed certificate, as the server did not properly validate such certificates, and retrieve encrypted access keys belonging to other tenants.
- Query the server to obtain certificates for other services, including an administrative certificate that allowed decryption of other tenants' access keys.
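The fourth step depended on the management server accepting a client certificate it had no reason to trust. As a minimal sketch of the opposite behavior, and not a description of Microsoft's actual fix, the Python snippet below builds a TLS server context that rejects any client certificate that does not chain to an explicitly configured CA. The function name and the file paths are hypothetical and used only for illustration.

```python
import ssl


def make_mtls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that requires CA-validated client certs.

    All three paths are hypothetical placeholders. If ca_file is omitted,
    the trust store stays empty, so every client certificate (including a
    self-signed one) fails verification: the context fails closed.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2

    # Abort the handshake unless the client presents a certificate that
    # verifies against the trust store configured below.
    ctx.verify_mode = ssl.CERT_REQUIRED

    if ca_file is not None:
        # Trust only the private CA that issues service certificates,
        # not the system-wide default CA bundle.
        ctx.load_verify_locations(cafile=ca_file)

    if cert_file is not None:
        # The server's own identity, presented to connecting clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)

    return ctx


# Example: a context with no trusted CA loaded accepts no client at all.
ctx = make_mtls_server_context()
```

With such a context, a self-signed client certificate is rejected during the handshake, before any application-level request for keys or certificates is processed. Certificate validation alone would not have closed the other gaps; it only removes one link in the chain.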
What was the root cause?
Using the PEACH framework to model the state of Cosmos DB before the vulnerability was disclosed, we can conduct a root cause analysis. To the best of our understanding, each tenant’s embedded Jupyter notebook ran in a container nested within a virtual machine, which might seem to be a strong isolation scheme in and of itself. However, by measuring the hardening factors of this interface, we can identify critical gaps at the implementation level:
- Privilege hardening gap: tenant-allocated VM had access to shared admin certificate.
- Encryption hardening gap: tenant API keys encrypted with shared key.
- Authentication hardening gap: self-signed certificate not validated.
- Connectivity hardening gap: network controls only enforced within container (iptables); orchestrator interface accessible from tenant container.
- Hygiene gap: tenant could access unrelated certificates and keys.
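To make the encryption hardening gap concrete, the following Python sketch shows one common alternative to a shared key: deriving a distinct wrapping key per tenant from a master secret. This is an illustrative assumption, not how Cosmos DB was or is implemented. The function name, the label string, and the tenant identifiers are hypothetical, and the derivation uses plain HMAC-SHA256 as a simplified stand-in for a full KDF such as HKDF.

```python
import hashlib
import hmac
import secrets


def derive_tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte key bound to a single tenant.

    Simplified sketch: HMAC-SHA256 keyed with the master secret, applied
    to a fixed label plus the tenant identifier. The master secret is
    assumed to live only in a control plane that tenant workloads
    cannot reach.
    """
    if not tenant_id:
        raise ValueError("tenant_id must be non-empty")
    info = b"tenant-access-key-wrap:" + tenant_id.encode("utf-8")
    return hmac.new(master_key, info, hashlib.sha256).digest()


# Hypothetical master secret and tenant identifiers.
master_key = secrets.token_bytes(32)
key_a = derive_tenant_key(master_key, "tenant-a")
key_b = derive_tenant_key(master_key, "tenant-b")

# Each tenant gets its own key, so a leaked key for tenant A cannot
# decrypt material that was encrypted for tenant B.
print(key_a != key_b)
```

Per-tenant keys limit the blast radius of a single leaked key, but only if the master secret itself is out of reach. In ChaosDB the privilege and hygiene gaps put a shared administrative certificate within reach of tenant-allocated VMs, so key separation would need to be paired with fixes for those gaps.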