Hands-on lab: Full Kubernetes compromise, what will your SOC do about it? (Part 1)

Part 1: Building Kuby Infrastructure

SOC Inspiration
7 min read · Jun 24, 2024

In this part, we’ll build the Kuby infrastructure. Bear in mind that this should not be taken as a best-practices tutorial: it deliberately contains a lot of bad practices, sadly seen in real production environments, such as ClusterRoles instead of Roles, no network segregation at any layer, tokens with far too many rights, etc.

  1. What we’re building
  2. Building IaaS resources with Terraform
  3. Building the Docker image of the Flasky application
  4. Building PaaS resources with kubectl
  5. Setting up the CI/CD pipeline in GitHub

What we’re building

Kuby infrastructure schema

As you can see, everything will be deployed in public subnets for the sake of the lab: it makes it easier to troubleshoot what happens inside the worker nodes.

The control plane of our Kubernetes cluster is managed by Amazon EKS. The cluster uses three namespaces:

  • default: used by the Flasky application
  • treasure: used by the Treasure database
  • runners: used by our GitHub runners

The CI/CD pipeline is rather simple in this lab. There is only one build & deploy job: it creates a container image from our repository, pushes it to our ECR registry, then performs a rolling update of the flasky-app Pods with this newly built image.

Requirements

  • An AWS account
  • A Terraform IAM user and role
  • A dedicated GitHub account
  • A basic understanding of Terraform, Kubernetes and Cloud concepts

Building IaaS resources with Terraform

Let’s first define the variables of our lab.

  • The instance_* variables define configurations for the EC2 worker nodes: instance type and the SSH keypair you want to use
  • The admin_arn is the ARN of the IAM user that will have the ClusterAdmin role in our Kubernetes cluster
  • The trusted_public_ipv4 is the IP allowed to contact your EKS public API server endpoint
variables.tf
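The embedded gist isn’t reproduced here; a minimal sketch of what variables.tf could contain (the instance_* variable names and the default instance type are my assumptions) would be:

variable "instance_type" {
  description = "EC2 instance type of the EKS worker nodes"
  type        = string
  default     = "t3.medium"
}

variable "instance_keypair" {
  description = "Name of the SSH keypair attached to the worker nodes"
  type        = string
}

variable "admin_arn" {
  description = "ARN of the IAM user that gets the ClusterAdmin role in the cluster"
  type        = string
}

variable "trusted_public_ipv4" {
  description = "Public IP (CIDR notation) allowed to reach the EKS public API endpoint"
  type        = string
}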

There are multiple ways to configure the credentials needed by the AWS provider. For this lab, I’ll use aws-vault, which lets me store AWS credentials locally without keeping them in a plaintext file. At runtime, the credentials are passed to any subcommand through environment variables, so the provider configuration can stay empty.

0-provider.tf
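For reference, a minimal provider file compatible with that setup could look like this (the version pin is illustrative; the region matches the one used later for ECR):

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# No credentials here: aws-vault injects them as environment variables at runtime
provider "aws" {
  region = "eu-west-3"
}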

Configure the VPC following AWS requirements to ensure smooth interaction with EKS. The VPC must have DNS hostnames enabled to facilitate node registration.

1-vpc.tf
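A minimal sketch of the VPC resource with the DNS settings EKS expects (the CIDR block and tag are illustrative):

resource "aws_vpc" "workers" {
  cidr_block = "10.0.0.0/16"

  # Required so that worker nodes can resolve and register with the EKS control plane
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "kuby-workers-vpc"
  }
}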

Configure the networking inside your VPC. EKS requires at least two subnets spread across two different Availability Zones.

Network configuration of the Workers VPC
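Assuming the VPC resource sketched above, the networking boils down to two public subnets in two AZs, an Internet Gateway and a default route towards it. A sketch (CIDRs and AZ names are illustrative):

resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.workers.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "eu-west-3a"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "public_b" {
  vpc_id                  = aws_vpc.workers.id
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "eu-west-3b"
  map_public_ip_on_launch = true
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.workers.id
}

# Public route table: everything goes out through the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.workers.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public_b" {
  subnet_id      = aws_subnet.public_b.id
  route_table_id = aws_route_table.public.id
}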

Now we’ll create several IAM resources.

  • The first two IAM roles are required by EKS: one for the control plane, one for the worker nodes
  • The third role gives our admin the ClusterAdmin role in Kubernetes, as well as full rights on ecr:* actions to manage ECR repositories and on logs:* actions to retrieve the available log groups
  • The fourth one will be assumed by our GitHub runners: they get full rights over the ECR service to manage repositories, and will be able to authenticate to the cluster through an access entry (next paragraph)
  • Finally, the last one is used by CloudTrail to write logs to a dedicated S3 bucket and a specific log group
5-iam.tf
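To give an idea of the shape of the first two (EKS) roles, here is a sketch; the admin, runner and CloudTrail roles follow the same pattern with their own trust and permission policies:

# Role assumed by the EKS service to manage the control plane
resource "aws_iam_role" "eks_cluster" {
  name = "kuby-eks-cluster-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

# Role assumed by the EC2 worker nodes
resource "aws_iam_role" "eks_nodes" {
  name = "kuby-eks-node-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# The node role typically also needs AmazonEKS_CNI_Policy and
# AmazonEC2ContainerRegistryReadOnly attached the same way
resource "aws_iam_role_policy_attachment" "eks_nodes" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}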

Now comes the EKS cluster itself.

  • We enable three control plane log types: api, audit and authenticator (EKS control plane logging is opt-in, so none of them are on by default)
  • We only allow our trusted public IP to contact the EKS API server, but we also enable internal access from the VPC
  • We’ll only authenticate ourselves through EKS API, so with IAM roles
  • Our workers will run an EKS-optimized Amazon Linux 2 (AL2) AMI, on the instance type and with the SSH key configured in variables.tf
  • Finally, we put in place access entries binding the admin IAM role to the ClusterAdmin role, and the GitHub runner to the runners Kubernetes group
6-eks.tf
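A trimmed-down sketch of the cluster, the node group and the admin access entry, assuming the resources from the previous files (aws_iam_role.admin refers to the admin role from 5-iam.tf, not shown in the sketch above; sizes are illustrative):

resource "aws_eks_cluster" "kuby" {
  name     = "kuby"
  role_arn = aws_iam_role.eks_cluster.arn

  enabled_cluster_log_types = ["api", "audit", "authenticator"]

  vpc_config {
    subnet_ids              = [aws_subnet.public_a.id, aws_subnet.public_b.id]
    endpoint_public_access  = true
    endpoint_private_access = true
    public_access_cidrs     = [var.trusted_public_ipv4]
  }

  # Authentication through the EKS API only (IAM principals + access entries)
  access_config {
    authentication_mode = "API"
  }
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.kuby.name
  node_group_name = "kuby-workers"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  instance_types  = [var.instance_type]
  ami_type        = "AL2_x86_64" # EKS-optimized Amazon Linux 2

  scaling_config {
    desired_size = 2
    min_size     = 1
    max_size     = 2
  }

  remote_access {
    ec2_ssh_key = var.instance_keypair
  }
}

# Bind the admin IAM role to the ClusterAdmin access policy
resource "aws_eks_access_entry" "admin" {
  cluster_name  = aws_eks_cluster.kuby.name
  principal_arn = aws_iam_role.admin.arn
}

resource "aws_eks_access_policy_association" "admin" {
  cluster_name  = aws_eks_cluster.kuby.name
  principal_arn = aws_iam_role.admin.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}

# The GitHub runner role gets a similar access entry, with
# kubernetes_groups = ["runners"] and no access policy association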

Lastly, we set up the ECR repository for the Flask application, as well as the logs that will be useful during incident response.

  • CloudTrail logs forwarded in a log group
  • EKS control plane logs forwarded in a log group
  • GuardDuty detector which will try to detect our scenario
Logging configuration, and ECR repository
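A sketch of those resources (the bucket and trail names are illustrative, the CloudTrail bucket also needs the usual bucket policy allowing CloudTrail to write to it, and the EKS control plane logs land automatically in the /aws/eks/kuby/cluster log group once enabled):

resource "aws_ecr_repository" "flasky" {
  name = "kuby-flasky"
}

resource "aws_guardduty_detector" "kuby" {
  enable = true
}

resource "aws_cloudwatch_log_group" "cloudtrail" {
  name = "kuby-cloudtrail"
}

resource "aws_s3_bucket" "cloudtrail" {
  bucket        = "kuby-cloudtrail-logs"
  force_destroy = true
}

resource "aws_cloudtrail" "kuby" {
  name                       = "kuby-trail"
  s3_bucket_name             = aws_s3_bucket.cloudtrail.id
  cloud_watch_logs_group_arn = "${aws_cloudwatch_log_group.cloudtrail.arn}:*"
  cloud_watch_logs_role_arn  = aws_iam_role.cloudtrail.arn # role defined in 5-iam.tf
}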

Finally, define the output variables that will be of interest afterwards.

outputs.tf
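These are the values the GitHub repository secrets will need in the last section. A sketch, assuming a runner IAM user and access key are created in 5-iam.tf (the resource names are hypothetical):

output "ecr_repository_url" {
  value = aws_ecr_repository.flasky.repository_url
}

output "runner_access_key_id" {
  # AWS_ACCESS_KEY_ID in the GitHub repository secrets
  value = aws_iam_access_key.runner.id
}

output "runner_secret_access_key" {
  # AWS_SECRET_ACCESS_KEY in the GitHub repository secrets
  value     = aws_iam_access_key.runner.secret
  sensitive = true
}

output "runner_role_arn" {
  # AWS_ROLE_TO_ASSUME in the GitHub repository secrets
  value = aws_iam_role.runner.arn
}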

Once you have all the files in the same directory, simply apply the configuration. In the following commands, I use my locally defined TF_ROLE profile, which has the right policies to perform the changes.

# Let's say you have your $HOME/.aws/config file
# with the TF_ROLE profile defined

# Add credentials related to this AWS profile to aws-vault secret backend
$ aws-vault add TF_ROLE

# Execute commands with the TF_ROLE credentials as env variables
$ aws-vault exec TF_ROLE -- terraform init
$ aws-vault exec TF_ROLE -- terraform plan
$ aws-vault exec TF_ROLE -- terraform apply

Building the Docker image of the Flasky application

Using your dedicated GitHub account, fork the following GitHub repository: https://github.com/Kerberosse/flasky. Do not just clone it directly: you will run plenty of GitHub Actions from that repository, so it has to live in your own account!

Once forked, simply clone your repository locally, build the Docker image, tag it correctly, and push it to your ECR repository.

The IAM entity that is allowed to interact with the ECR repository is the admin IAM role we created with Terraform. Let’s add its credentials with aws-vault and log in to our registry with docker login.

# Let's say you have your $HOME/.aws/config file
# with the ADMIN_ROLE profile defined

# Add credentials related to this AWS profile to aws-vault secret backend
$ aws-vault add ADMIN_ROLE
$ PATH=$PATH:"$PWD/docker-credential-osxkeychain-XXX" aws-vault exec ADMIN_ROLE -- \
aws ecr get-login-password --region eu-west-3 | \
docker login --username AWS --password-stdin XXX.dkr.ecr.eu-west-3.amazonaws.com/kuby-flasky

The same idea is behind using aws-vault: I don’t like my credentials being stored in plaintext. The documentation of the docker login command explains how you can store them in a secure backend, such as osxkeychain on macOS.

Once authenticated, simply build and push the image.

$ git clone https://github.com/XXX/flasky.git
$ cd flasky
$ docker build -t XXX.dkr.ecr.eu-west-3.amazonaws.com/kuby-flasky .
$ docker push XXX.dkr.ecr.eu-west-3.amazonaws.com/kuby-flasky

Building PaaS resources with kubectl

With the underlying infrastructure in place, let’s now deploy the Kubernetes assets in the cluster.

  • The Flasky application is deployed in the default namespace
  • Secrets used by the backend to connect to the database are defined
  • The database is a simple mysql container, initialized by a script defined in a ConfigMap and mounted under /docker-entrypoint-initdb.d
  • The flasky-app container uses the image that you pushed in the previous section and is exposed through an ELB

Side note: Flasky’s DevOps are playing the part of bad DevOps here.

flasky.yaml
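The manifest itself is in the gist; as an illustrative skeleton (the image placeholder, port and credentials are assumptions, and the mysql Deployment plus its init-script ConfigMap are omitted for brevity), it looks roughly like this:

apiVersion: v1
kind: Secret
metadata:
  name: flasky-db-credentials
  namespace: default
stringData:
  MYSQL_USER: flasky
  MYSQL_PASSWORD: changeme
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flasky-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flasky-app
  template:
    metadata:
      labels:
        app: flasky-app
    spec:
      containers:
        - name: flasky-app
          image: XXX.dkr.ecr.eu-west-3.amazonaws.com/kuby-flasky:latest
          envFrom:
            - secretRef:
                name: flasky-db-credentials
          ports:
            - containerPort: 5000
---
# Exposed to the Internet through an ELB
apiVersion: v1
kind: Service
metadata:
  name: flasky-app
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: flasky-app
  ports:
    - port: 80
      targetPort: 5000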

Now let’s build the Treasure assets. There is only a mysql database, but everything is namespaced under treasure, and there’s even a NetworkPolicy that prevents traffic coming from the outside!

Side note: Treasure DevOps seem a bit better than the Flasky team…

treasure.yaml
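As a sketch (image tag and password are placeholders), the Treasure manifest boils down to a namespaced mysql Deployment plus a NetworkPolicy that only accepts ingress from Pods of the treasure namespace itself:

apiVersion: v1
kind: Namespace
metadata:
  name: treasure
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: treasure-db
  namespace: treasure
spec:
  replicas: 1
  selector:
    matchLabels:
      app: treasure-db
  template:
    metadata:
      labels:
        app: treasure-db
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme
          ports:
            - containerPort: 3306
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-external-ingress
  namespace: treasure
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {} # only Pods from the treasure namespace itself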

Let’s finish by defining two roles: one for the Flasky application through the default/default ServiceAccount, and one for the GitHub runners through the runners group.

Cluster roles
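The exact permissions live in the gists; the overall shape is two ClusterRoles with their bindings, roughly like this (the verbs and resources shown here are purely illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flasky-role
rules:
  - apiGroups: [""]
    resources: ["pods", "secrets"]
    verbs: ["get", "list"]
---
# Bound cluster-wide to the default/default ServiceAccount used by Flasky
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flasky-role-binding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  kind: ClusterRole
  name: flasky-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: runner-role
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch"]
---
# Bound to the runners group attached to the GitHub runner access entry
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: runner-role-binding
subjects:
  - kind: Group
    name: runners
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: runner-role
  apiGroup: rbac.authorization.k8s.io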

We then apply everything through our previous ADMIN_ROLE, which is also ClusterAdmin of the Kubernetes cluster. As said earlier, authentication goes through the EKS API, which lets us retrieve the kubeconfig file with a single command.

$ aws-vault exec ADMIN_ROLE -- aws eks update-kubeconfig --name kuby

$ aws-vault exec ADMIN_ROLE -- kubectl apply -f flasky-role.yaml
$ aws-vault exec ADMIN_ROLE -- kubectl apply -f runner-role.yaml
$ aws-vault exec ADMIN_ROLE -- kubectl apply -f flasky.yaml
$ aws-vault exec ADMIN_ROLE -- kubectl apply -f treasure.yaml

Setting up the CI/CD pipeline in GitHub

In section 3, Building the Docker image of the Flasky application, you forked Flasky’s repository into your own, dedicated, account.

Start by defining the following repository Actions secrets, used in the GitHub workflow of Flasky.

Repository Actions secrets
  • AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_ROLE_TO_ASSUME can be found in the Terraform output:
$ aws-vault exec TF_ROLE -- terraform output -json
  • If you strictly followed the lab, AWS_REGION_ID is eu-west-3 and K8S_EKS_NAME is kuby

Create a Personal Access Token (PAT) by following this documentation. Ensure you set up the following scopes: repo and workflow.

PAT scopes

We’ll now deploy our GitHub runners inside our Kubernetes cluster, in the runners namespace. To do this, we’ll use a Helm chart to install ARC (Actions Runner Controller). Helm is the package manager for Kubernetes: it distributes applications as charts, bundles of Kubernetes resources. ARC provides everything needed to get a scalable set of GitHub runners ready to run!

Let’s first configure some variables, in a file called values.yaml. We want to authenticate to GitHub with our PAT stored inside a Secret called github-pat. We also want to enable DinD (Docker-in-Docker) to easily use Docker inside our runners.

values.yaml
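For the gha-runner-scale-set chart, the relevant part of values.yaml should boil down to something like this:

# Name of the Kubernetes Secret holding the PAT (created in the next step)
githubConfigSecret: github-pat

# Run a Docker daemon next to the runner so workflows can build images
containerMode:
  type: "dind"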

If not already done, install Helm and then the Helm charts with the provided configuration.

$ curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

$ aws-vault exec ADMIN_ROLE -- helm install arc \
--namespace "runners" \
--create-namespace \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
$ aws-vault exec ADMIN_ROLE -- helm install "kuby-runner-set" \
-f values.yaml \
--namespace "runners" \
--create-namespace \
--set githubConfigUrl="https://github.com/XXX/flasky" \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Now create the github-pat Secret in the runners namespace, with your PAT as the value of the github_token field.

github-secret.yaml
$ aws-vault exec ADMIN_ROLE -- kubectl apply -f github-secret.yaml
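If you want a reference, a minimal github-secret.yaml would look like this (the token value is of course a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: github-pat
  namespace: runners
stringData:
  github_token: ghp_XXXXXXXXXXXXXXXXXXXX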

That’s it for Part 1. Our lab is ready, and in the next part we’re going to attack it.

> Part 0: Introduction (previous)

> Part 2: Attacking Kuby infrastructure (next)
