Hands-on lab: Full Kubernetes compromise, what will your SOC do about it? (Part 3)

Part 3: Responding to the Kuby incident

SOC Inspiration
10 min read · Jun 24, 2024

In the previous parts, we built and attacked Kuby. Now let's look at what this attack looks like from the defenders' side, and how things could have been improved.

As a reminder, we enabled three log sources. We'll rely on them exclusively, without performing any disk or RAM forensics, which could have been useful to fill some of the gaps we'll run into.

  • EKS control plane logging, for Kubernetes-focused events
  • Amazon CloudTrail, for AWS-focused events
  • Amazon GuardDuty, generating built-in alerts from EKS
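
As a quick reminder, EKS control plane logging is disabled by default. A minimal sketch of enabling the relevant log types on our cluster (assuming it's named kuby, and keeping the aws-vault pattern used throughout this lab):

# Enable API, audit and authenticator logs on the EKS control plane
$ aws-vault exec ADMIN_ROLE -- aws eks update-cluster-config --name kuby \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator"],"enabled":true}]}'

With that out of the way, here's what we'll cover in this part: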
  1. Summary of the attack scenario
  2. Reviewing GuardDuty alerts
  3. IR can start: Collection and preservation of evidence
  4. What we can find from the kill chain
  5. What we didn’t find from the kill chain

Summary of the attack scenario

For those who are only interested in the defensive part, let's summarize the attack in a few steps.

After getting our initial access through a backdoored Flask application, we retrieved all Kubernetes Secrets through an over-privileged ServiceAccount. This allowed us to retrieve the credentials for both databases, but we could only access one of them due to a NetworkPolicy preventing us from getting inside the treasure namespace. We also accessed a GitHub token used by the runners to authenticate to GitHub.

Step 1 of the attack

Using this token, we saw that AWS credentials were stored in the repository's secrets, presumably to run privileged actions inside the Kubernetes cluster. We exfiltrated them using a malicious GitHub workflow.

Step 2 of the attack

Finally, we saw that the permissions associated with the runners were enough to execute code inside any running pod. We exfiltrated the second database simply by running mysql commands through kubectl exec.

Step 3 of the attack

Reviewing GuardDuty alerts

This section will be quick, as GuardDuty didn't detect anything from our main scenario. It only raised this false positive, triggered by the legitimate CI/CD pipeline, where we use DinD to get a Docker daemon inside our runner. That said, DinD is not the recommended way to build an image from within a container: alternatives like kaniko exist, and you should be able to challenge the DevOps team if you see this practice.

GuardDuty alert
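
To illustrate the kaniko alternative: instead of exposing a Docker daemon (DinD) to the build job, the workflow step could run the kaniko executor image directly. A rough sketch, where the repository, registry and tag are placeholders:

# Runs inside a gcr.io/kaniko-project/executor container: builds and pushes the
# image straight from the Dockerfile, with no Docker daemon involved
/kaniko/executor \
  --context git://github.com/ORG/flasky.git \
  --dockerfile Dockerfile \
  --destination XXX.dkr.ecr.eu-west-3.amazonaws.com/flasky:latest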

IR can start: Collection and preservation of evidence

Let's assume we still enter an incident response process thanks to another alerting mechanism, such as a dark web monitoring service that detected the leak published on the XSS forum. This will be our starting point as defenders, and we'll reason as if we knew nothing else.

kuby_leak_hahahahahaha.txt

After talking with the Flasky and Treasure product teams, we quickly conclude that the whole cluster should be considered compromised. I'll skip the containment phase of the incident response: finding the right containment strategy would be a whole article by itself, and other articles already exist on the matter.

Our first action will be to preserve and collect evidence. Here we can start by retrieving all the relevant log groups for this EKS cluster.

# EKS control plane logs
$ aws-vault exec ADMIN_ROLE -- aws logs tail "/aws/eks/kuby/cluster" --since 1d > eks_tail.log

# CloudTrail logs
$ aws-vault exec ADMIN_ROLE -- aws logs tail "CloudTrail" --since 1d > cloudtrail_tail.log

Those logs will give us a glimpse of what was done through the AWS APIs and through the EKS API.

However, those are not the only APIs that can modify our cluster. If the kubelet daemons running on the worker nodes do not enforce authentication, which is the default, our attacker could talk to them unauthenticated and potentially perform unlogged actions on those nodes. There are tools dedicated to talking to the kubelet API directly.
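
To give an idea of what that looks like, if anonymous access were allowed, something as simple as the following would list the pods running on a node without leaving anything in the EKS control plane logs (node IP taken from our cluster, purely for illustration):

# Unauthenticated query against the kubelet API (port 10250); this only works if
# the kubelet accepts anonymous requests
$ curl -sk https://10.0.1.121:10250/pods | jq -r '.items[].metadata.name'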

Because of this, I'll also run a script to snapshot valuable resources from the impacted cluster. To be honest, I didn't find any open-source tool for evidence collection in a Kubernetes cluster, so I quickly wrote one for this lab. It's far from complete (no kubelet logs, for instance) but sufficient for this scenario.

collector.sh

Of course, you'd run it through aws-vault to pass authentication material to the script, which relies on kubectl.

$ aws-vault exec ADMIN_ROLE -- ./collector.sh pods events ...
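
The real script is embedded above; for readers who just want the gist, a stripped-down sketch of the idea could look like this. It only snapshots the resource types passed as arguments, cluster-wide, and is not the exact script used in this lab:

#!/bin/bash
# Minimal evidence snapshot: dump the requested resource types as YAML plus a
# human-readable description into a timestamped directory
set -euo pipefail

OUTDIR="evidence_$(date -u +%Y%m%dT%H%M%SZ)"
mkdir -p "$OUTDIR"

for resource in "$@"; do
  kubectl get "$resource" -A -o yaml > "$OUTDIR/${resource}.yaml"
  kubectl describe "$resource" -A > "$OUTDIR/${resource}_describe.txt"
done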

What we can find from the kill chain

We know the Treasure database was exfiltrated, so the attacker must have interacted with the pod somehow. We also know that, with the Kubernetes API, a lot of information ends up in the requestURI field, so a good start is to review the actions performed on that pod by looking at the audit Events. Translated into a bash command, it gives something like this:

$ cat eks_tail.log | grep "treasure" | grep "pod" | grep "requestURI" | \
cut -d " " -f 3- | jq -r '. | "\(.kind) \(.requestURI)"' | sort -u

Event /api/v1/namespaces/treasure/pods/treasure-db-5ff5d5c765-m8vx9/exec?\
command=mysql&command=-e&command=show+tables%3B&container=treasure-db&\
stderr=true&stdout=true
Event /api/v1/namespaces/treasure/pods/treasure-db-5ff5d5c765-m8vx9/exec?\
command=mysql&command=-u&command=treasure_user&command=\
-pTreasure9f0c03f23e52edb6da1700c3113d1532bdac4bb6815619826e858d66b21585b2&\
command=-e&command=show+databases%3B&container=treasure-db&stderr=true&\
stdout=true
Event /api/v1/namespaces/treasure/pods/treasure-db-5ff5d5c765-m8vx9/exec?\
command=mysql&command=-u&command=treasure_user&command=\
-pTreasure9f0c03f23e52edb6da1700c3113d1532bdac4bb6815619826e858d66b21585b2&\
command=-e&command=show+tables%3B&container=treasure-db&stderr=true&\
stdout=true
Event /api/v1/namespaces/treasure/pods/treasure-db-5ff5d5c765-m8vx9/exec?\
command=mysql&command=-u&command=treasure_user&command=\
-pTreasure9f0c03f23e52edb6da1700c3113d1532bdac4bb6815619826e858d66b21585b2&\
command=-e&command=use+treasure%3B+select+%2A+from+users%3B&container=\
treasure-db&stderr=true&stdout=true
Event /api/v1/namespaces/treasure/pods/treasure-db-5ff5d5c765-m8vx9/exec?\
command=mysql&command=-u&command=treasure_user&command=\
-pTreasure9f0c03f23e52edb6da1700c3113d1532bdac4bb6815619826e858d66b21585b2&\
command=-e&command=use+treasure%3B+show+tables%3B&container=treasure-db&\
stderr=true&stdout=true

Those are kubectl exec commands. We can quickly assume that's how the database got leaked, since no one from the product team recalls running them. We have found our attacker!

Note: if we hadn't found any interaction with the pod, it would have been worth checking for interactions with the database storage, i.e. the underlying PersistentVolume, which could also have been accessed directly from the host.
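
That check could start with the same kind of log pivot, this time on the volume objects; a quick sketch reusing the audit log fields seen above:

# Hypothetical pivot: who interacted with the PersistentVolume(Claim)s backing treasure-db?
$ cat eks_tail.log | grep -i "persistentvolume" | cut -d " " -f 3- | \
  jq -r '. | "\(.requestReceivedTimestamp) \(.verb) \(.requestURI) \(.user.username)"' | sort -u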

Let’s find the raw logs associated with those events to find some other information we can pivot on.

$ cat eks_tail.log| grep "command=mysql" | cut -d " " -f 3- | jq -r '. | \
"\(.kind) \(.userAgent) \(.sourceIPs) \(.verb) \(.user.username) \(.user.groups)"' \
| sort -u

Event \
kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 \
["10.0.1.82"] \
get \
arn:aws:sts::XXX:assumed-role/kuby-GHRole/botocore-session-1719067739 \
["runners","system:authenticated"]

The User-Agent suggests kubectl was used, but that alone might not be discriminating enough. Let's save it for later.

Thanks to our resource snapshot, we can see that the source IP address of those kubectl exec commands belongs to another pod, flasky-app. Weird.

Name:             flasky-app-c4548ddf9-lw4rn
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-1-121.eu-west-3.compute.internal/10.0.1.121
Start Time:       Sat, 22 Jun 2024 16:15:01 +0200
Labels:           app=flasky-app
                  pod-template-hash=c4548ddf9
Annotations:      <none>
Status:           Running
IP:               10.0.1.82
IPs:
  IP:  10.0.1.82

Finally, the username suggests the attacker authenticated through the EKS API, and the team tells us that's the role the GitHub runners use. Weirder! How did they obtain the credentials of the GitHub IAM user? And why are they using them from the flasky-app pod?

You are told the runners are getting their authentication material directly from Flasky’s repository secrets. That’s where you make the link with the GitHub PAT that was stored in a Kubernetes Secret. Who read it recently?

$ cat eks_tail.log| grep "github-pat" | cut -d " " -f 3- | jq -r '. | "\(.kind) \(.userAgent) \(.sourceIPs) \(.verb) \(.user.username) \(.user.groups)"' | sort -u 
Event kubectl/v1.30.0 (darwin/arm64) kubernetes/7c48c2b ["X.X.X.X"] \
get arn:aws:sts::XXX:assumed-role/kuby-AdminRole/1719046174609728000 \
["system:authenticated"]
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] \
get system:serviceaccount:default:default \
["system:serviceaccounts","system:serviceaccounts:default","system:authenticated"]
Event manager/v0.0.0 (linux/amd64) kubernetes/$Format \
["10.0.1.7"] get system:serviceaccount:runners:arc-gha-rs-controller \
["system:serviceaccounts","system:serviceaccounts:runners","system:authenticated"]

The first one is our admin, from a legitimate public IP. The last one is the GitHub ARC controller, which is not surprising since it's the ServiceAccount managing the interaction with GitHub. The one in the middle is weird: it has the same User-Agent as the kubectl exec commands we saw earlier, and it comes from the default:default ServiceAccount.

That source IP doesn't appear in our snapshot though. Let's see if we have any event involving it.

$ cat eks_tail.log | grep "10.0.1.191" | cut -d " " -f 3- | jq -r '.'

The last event shows that it was a flasky-app pod that was updated. The IP simply changed.

Let’s find what other things were done with the default:default ServiceAccount.

$ cat eks_tail.log| grep "default:default" | cut -d " " -f 3- | jq -r '. | "\(.kind) \(.userAgent) \(.sourceIPs) \(.verb) \(.requestReceivedTimestamp) \(.requestURI) \(.user.username)"'                
Event curl/7.88.1 ["10.0.1.242"] get 2024-06-22T12:32:20.744960Z /api system:serviceaccount:default:default
Event curl/7.88.1 ["10.0.1.191"] get 2024-06-22T12:37:52.329542Z /api system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] create 2024-06-22T13:06:10.876406Z /apis/authorization.k8s.io/v1/selfsubjectrulesreviews system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:07:37.603265Z /api?timeout=32s system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:07:37.608119Z /apis?timeout=32s system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:07:37.635527Z /api/v1/namespaces?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:14:49.828351Z /api/v1/namespaces/default/secrets?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:15:07.705908Z /api/v1/namespaces/default/secrets/flasky-secret system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:15:21.684254Z /api/v1/namespaces/default/secrets/flasky-secret system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:15:34.724243Z /api/v1/namespaces/runners/secrets?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:15:49.000234Z /api/v1/namespaces/default/secrets/github-pat system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:15:56.784342Z /api/v1/namespaces/runners/secrets/github-pat system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:16:03.921655Z /api/v1/namespaces/treasure/secrets?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:16:10.523736Z /api/v1/namespaces/treasure/secrets?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:32:23.125449Z /api/v1/namespaces/runners/pods?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:32:37.817660Z /api/v1/namespaces/runners/pods/kuby-runner-set-7d446b46-listener system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:32:37.824813Z /api/v1/namespaces/runners/pods/kuby-runner-set-7d446b46-listener system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:32:37.839748Z /api/v1/namespaces/runners/events?fieldSelector=involvedObject.name%3Dkuby-runner-set-7d446b46-listener%2CinvolvedObject.namespace%3Drunners%2CinvolvedObject.uid%3D6c55b1de-4bbb-4132-80bd-6711e4d78c19&limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:33:46.836994Z /api/v1/namespaces/default/pods?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:33:56.867952Z /api/v1/namespaces/default/pods/flasky-db-56d5ffdb7b-9mbwg system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:33:56.874420Z /api/v1/namespaces/default/pods/flasky-db-56d5ffdb7b-9mbwg system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:33:56.884709Z /api/v1/namespaces/default/events?fieldSelector=involvedObject.name%3Dflasky-db-56d5ffdb7b-9mbwg%2CinvolvedObject.namespace%3Ddefault%2CinvolvedObject.uid%3Dddf0145d-8fc8-468b-ad2b-72d269db906f&limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:37:50.175097Z /api/v1/namespaces/treasure/pods?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] get 2024-06-22T13:37:50.185816Z /api/v1/namespaces/treasure/pods/treasure-db-5ff5d5c765-m8vx9 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.191"] list 2024-06-22T13:37:50.195363Z /api/v1/namespaces/treasure/events?fieldSelector=involvedObject.name%3Dtreasure-db-5ff5d5c765-m8vx9%2CinvolvedObject.namespace%3Dtreasure%2CinvolvedObject.uid%3Dc0c60744-cafe-4210-aa37-358a9a8140fa&limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.242"] get 2024-06-22T13:48:09.797375Z /api?timeout=32s system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.242"] get 2024-06-22T13:48:09.801349Z /apis?timeout=32s system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.242"] list 2024-06-22T13:48:09.816944Z /api/v1/namespaces/runners/pods?limit=500 system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.242"] get 2024-06-22T13:48:22.914821Z /api/v1/namespaces/runners/pods/kuby-runner-set-7d446b46-listener system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.242"] get 2024-06-22T13:48:22.922157Z /api/v1/namespaces/runners/pods/kuby-runner-set-7d446b46-listener system:serviceaccount:default:default
Event kubectl/v1.30.2 (linux/amd64) kubernetes/3968350 ["10.0.1.242"] list 2024-06-22T13:48:22.930606Z /api/v1/namespaces/runners/events?fieldSelector=involvedObject.name%3Dkuby-runner-set-7d446b46-listener%2CinvolvedObject.namespace%3Drunners%2CinvolvedObject.uid%3D6c55b1de-4bbb-4132-80bd-6711e4d78c19&limit=500 system:serviceaccount:default:default

We still only see events coming from flasky-app pods, first with curl and then with kubectl. We can suppose the attacker first used curl to test their token (notice the simple calls to /api, which don't do much), then switched to kubectl for more advanced actions. We can also see every Secret they read: all of them.

We can also see a request to the /apis/authorization.k8s.io/v1/selfsubjectrulesreviews endpoint. This is what gets requested when you issue the command:

$ kubectl auth can-i --list

Coming from a ServiceAccount, that's very unusual, and it's likely a sign of the attacker reviewing their own permissions.
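
This also makes a decent detection idea: permission enumeration issued by a ServiceAccount should be rare enough in most clusters to be worth alerting on. A quick hunt over the same logs could look like this (a sketch, reusing the audit fields shown above):

# ServiceAccounts asking "what am I allowed to do?" is suspicious by itself
$ cat eks_tail.log | grep -i "selfsubject" | cut -d " " -f 3- | \
  jq -r 'select(.user.username | startswith("system:serviceaccount:")) | "\(.requestReceivedTimestamp) \(.user.username) \(.sourceIPs) \(.requestURI)"' | \
  sort -u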

At this point we can also review everything else done with this User-Agent which, in the end, mostly confirms information we already have.
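
For completeness, that pivot is just another grep on the exact User-Agent string we saved earlier:

$ cat eks_tail.log | grep "kubectl/v1.30.2 (linux/amd64) kubernetes/3968350" | cut -d " " -f 3- | \
  jq -r '. | "\(.requestReceivedTimestamp) \(.verb) \(.requestURI) \(.user.username)"' | sort -u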

Finally, let’s see what other things were done on other AWS services with the credentials of the GitHub runner IAM user. As usual, let’s first find pivots identifying our attacker among legitimate activities.

$ cat cloudtrail_tail.log | grep GHUser |  cut -d " " -f 3- | jq -r '. | "\(.eventTime) \(.sourceIPAddress) \(.userAgent)"'
2024-06-22T14:13:27Z 13.38.45.7 aws-sdk-js/3.496.0 ua/2.0 os/linux#6.1.92-99.174.amzn2023.x86_64 lang/js md/nodejs#20.13.1 api/sts#3.496.0 configure-aws-credentials-for-github-actions
2024-06-22T14:13:27Z 13.38.45.7 aws-sdk-js/3.496.0 ua/2.0 os/linux#6.1.92-99.174.amzn2023.x86_64 lang/js md/nodejs#20.13.1 api/sts#3.496.0 configure-aws-credentials-for-github-actions
2024-06-22T14:13:27Z 13.38.45.7 aws-sdk-js/3.496.0 ua/2.0 os/linux#6.1.92-99.174.amzn2023.x86_64 lang/js md/nodejs#20.13.1 api/sts#3.496.0 configure-aws-credentials-for-github-actions
2024-06-22T14:48:59Z 13.38.45.7 aws-cli/2.17.0 md/awscrt#0.20.11 ua/2.0 os/linux#6.1.92-99.174.amzn2023.x86_64 md/arch#x86_64 lang/python#3.11.8 md/pyimpl#CPython cfg/retry-mode#standard md/installer#exe md/distrib#debian.12 md/prompt#off md/command#eks.update-kubeconfig

All actions come from an AWS IP, so in our case most likely from within the cluster. The first three User-Agents indicate a GitHub workflow, which is probably our legitimate runner. The last one is an aws-cli command that retrieved a kubeconfig (notice the end of the UA) to connect to our cluster at 14:48 UTC, which matches the timing of the attacker's kubectl exec commands that followed shortly after. Nothing else was done by the attacker with those credentials, even though they had access to all ECR repositories.
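
That last claim is easy to double-check with a quick CloudTrail query for any ECR API call made with those credentials (same grep key as above); it should come back empty.

# Any ECR activity with the runner credentials? (expected: no output)
$ cat cloudtrail_tail.log | grep GHUser | cut -d " " -f 3- | \
  jq -r 'select(.eventSource == "ecr.amazonaws.com") | "\(.eventTime) \(.eventName) \(.sourceIPAddress)"'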

That's basically all we can see. Note that we found every Kubernetes pivot, and this would have been really complicated without the appropriate logs.

What we didn’t find from the kill chain

Let’s summarize what we couldn’t find in the previous section.

  • How the attacker got their initial access to the flasky-app pods
  • All commands run inside the pods (how they downloaded their tooling, etc.)
  • How the attacker exfiltrated the repository’s secrets

The first point is problematic: by default, Amazon EKS provisions an ELB (Layer 4) for Services of type LoadBalancer, and a Layer 4 load balancer gives us no HTTP-level access logs, so the requests that hit the backdoored endpoint leave no trace outside the pod itself. We should have exposed the application through an Ingress, which would have created an ALB (Layer 7). Still, this is a common setup in the wild, and sometimes that's why initial accesses remain mysteries.
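
As a side note, even with an ALB the HTTP requests only become visible if access logging is enabled, which is also off by default. A sketch of turning it on for an existing ALB, where the load balancer ARN and the S3 bucket are placeholders:

# Enable ALB access logs to S3 so Layer 7 requests leave a trace
$ aws-vault exec ADMIN_ROLE -- aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:eu-west-3:XXX:loadbalancer/app/kuby-alb/0123456789abcdef \
  --attributes Key=access_logs.s3.enabled,Value=true Key=access_logs.s3.bucket,Value=kuby-alb-access-logs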

An EDR installed on the nodes would have helped detect it though, depending on the commands issued by the attacker. In our case, our commands were very noisy and would most likely have been detected.

Investigating the last point would have required exporting GitHub security logs, but that's out of scope for this article. With an EDR, we could have reviewed the commands executed by all runners and eventually found our odd malicious curl. With VPC Flow Logs enabled, we could also have checked the outbound connections made by the runners and looked for anomalies between runs.
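
Enabling flow logs on the cluster's VPC is a one-liner as well; in this sketch the VPC ID, log group and IAM role are placeholders:

# Send VPC Flow Logs to CloudWatch Logs for later network investigations
$ aws-vault exec ADMIN_ROLE -- aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL --log-destination-type cloud-watch-logs \
  --log-group-name /vpc/kuby/flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::XXX:role/kuby-FlowLogsRole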

I hope this showed you how important it is to have all the telemetry you can get when an incident happens.

  • I often hear DevOps teams complaining about the pain of installing EDRs on their workloads, or of forwarding logs centrally
  • I also often hear SOC teams not caring much about forwarding key security logs, or relying only on good ol’ forensics

Again, if I had used a SOCKS proxy in the backdoor, the EDR would have been completely blind and the only reliable sources of logs would have been the Kubernetes API logs, along with CloudTrail. I'll probably explore this in a future article.

That's it! For SOC people looking for detection use cases in Kubernetes, I'd be happy to share some ideas in PMs, even though this last part alone should already give you a few interesting ones. :)

> Part 2: Attacking Kuby infrastructure (previous)
