EKS - Elastic Kubernetes Service

https://aws.amazon.com/eks

Console: https://console.aws.amazon.com/eks/home

Best practices - https://docs.aws.amazon.com/eks/latest/best-practices/introduction.html

Blog - https://aws.amazon.com/blogs/containers/category/compute/amazon-kubernetes-service/

Amazon EKS Blueprints for Terraform - https://github.com/aws-ia/terraform-aws-eks-blueprints - https://www.youtube.com/watch?v=DhoZMbqwwsw

Amazon EKS Helm chart repository - https://github.com/aws/eks-charts

AWS Controllers for Kubernetes (ACK) - Manage AWS services from Kubernetes - https://github.com/aws-controllers-k8s - https://aws-controllers-k8s.github.io/community/

https://www.eksworkshop.com

Containers roadmap - https://github.com/orgs/aws/projects/244 - https://github.com/aws/containers-roadmap

https://github.com/kubernetes-sigs/aws-iam-authenticator - Use AWS IAM credentials to authenticate to a Kubernetes cluster

What's New with Containers? https://aws.amazon.com/about-aws/whats-new/containers/?whats-new-content.sort-by=item.additionalFields.postDateTime&whats-new-content.sort-order=desc&awsf.whats-new-products=*all

https://github.com/awslabs/eksdemo - Application catalog (Argo, Cilium, Crossplane, Flux, Istio...)

EKS Node Viewer - https://github.com/awslabs/eks-node-viewer/

Find Amazon EKS optimized AMI IDs - https://github.com/guessi/eks-ami-finder

Features

  • On AWS and on-premises.
  • Certified Kubernetes-conformant.
  • Amazon manages, scales, backs up, upgrades and patches the control plane. Control plane components (API server, etcd) are deployed to multiple AZs for high availability and fault tolerance, and EKS actively monitors and adjusts control plane instances to maintain peak performance. Control plane components run in AWS-owned accounts.
  • Integration with ELB, IAM for access and RBAC, VPC for isolation, CloudTrail for logging, ECR for container images, KMS for encrypting secrets...
  • Cluster Autoscaler, Karpenter.
  • Volumes with EBS, EFS, FSx, S3...
  • Monitoring with CloudWatch container insights, Prometheus, AWS Distro for OpenTelemetry (ADOT)...
  • On-prem and edge locations.

Amazon EKS Explained - https://www.youtube.com/watch?v=E956xeOt050

Glossary

  • Data plane: the worker nodes where containers run.
  • Node group: think of it like an EC2 Auto Scaling group.

Concepts

EKS manages Kubernetes clusters. Kubernetes manages Kubernetes objects.

The control plane (API, etcd servers, scheduler...) runs in a VPC managed by AWS, in an AWS-owned account. The data plane (worker nodes) runs in your VPC, in your customer account.

For the control plane:

  • AWS will provision and manage at least two API servers spread across two distinct Availability Zones. The API servers are exposed through a public Network Load Balancer.
  • AWS will provision and manage etcd servers spread across three Availability Zones, using an autoscaling group.
  • See "Mastering Elastic Kubernetes Service on AWS" pages 13 and 19. and Understand resilience in Amazon EKS clusters.

Communication:

  • Control plane → worker nodes. The control plane is connected to your VPC through cross-account Elastic Network Interfaces (ENIs) that allow traffic from the control plane to the worker nodes. ENIs are deployed to your VPC, to the data plane subnets you specify. See VPC and subnets.
  • Worker nodes → control plane. Traffic from the worker nodes to the control plane API server can stay within the customer VPC using a VPC endpoint (PrivateLink), or leave the customer VPC through a Network Load Balancer, see API server endpoint access.

Learn

https://d1.awsstatic.com/training-and-certification/ramp-up_guides/Ramp-Up_Guide_Containers.pdf

https://aws.amazon.com/architecture/containers

Skill Builder: https://skillbuilder.aws/search?searchText=eks&page=1

EKS Workshop

https://www.eksworkshop.com - https://github.com/aws-samples/eks-workshop-v2 - Source code of the app: https://github.com/aws-containers/retail-store-sample-app

You should start each lab from the page indicated by its badge. Starting in the middle of a lab will cause unpredictable behavior.

If the cluster is not functioning, run the command prepare-environment to reset it.

IAM roles

AWS managed policies for Amazon Elastic Kubernetes Service - https://docs.aws.amazon.com/eks/latest/userguide/security-iam-awsmanpol.html

| IAM Role | Used On (EKS Mode) | Assumed By | Principal | Purpose | Permissions Policy |
| --- | --- | --- | --- | --- | --- |
| Cluster role | All clusters | EKS control plane | Service: eks.amazonaws.com | Allow EKS to manage cluster resources (EC2, Auto Scaling, ELB, ENIs) | AmazonEKSClusterPolicy, AmazonEKSVPCResourceController |
| Node instance role | EC2-based nodes | Worker nodes (EC2 instances) | Service: ec2.amazonaws.com | Allow EC2 instances to access AWS (pull ECR images, CNI, etc.) | AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryPullOnly |
| Fargate pod execution | Fargate profiles | Fargate infrastructure | Service: eks-fargate-pods.amazonaws.com | Pull images, CloudWatch logs, network setup | AmazonEKSFargatePodExecutionRolePolicy, CloudWatchLogsFullAccess (optional) |
| IRSA | EC2 or Fargate pods | Pods (via service accounts) | Federated: arn:aws:iam::<account-id>:oidc-provider/<oidc-provider> | Allow pods (apps) fine-grained access to AWS services (S3, DynamoDB) | Custom (eg S3, DynamoDB) |
| EKS Pod Identity | EC2 or Fargate pods | Pods (via EKS agent) | Service: pods.eks.amazonaws.com | App-level AWS API access without OIDC | Custom (eg S3, DynamoDB) |

Note: to run Pods on Fargate you need a Pod execution IAM role.

Cluster role

https://docs.aws.amazon.com/eks/latest/userguide/cluster-iam-role.html

Allows the cluster Kubernetes control plane to manage AWS resources on your behalf. Clusters use this role to manage nodes.

The role has the AWS managed permission policy AmazonEKSClusterPolicy, which allows the control plane to interact with the following AWS services: EC2, Elastic Load Balancing, Auto Scaling and KMS (see explanation). We can optionally attach AmazonEKSVPCResourceController to manage ENIs and IP addresses for worker nodes (see explanation). If using Auto Mode, you must also attach AmazonEKSBlockStoragePolicy, AmazonEKSComputePolicy, AmazonEKSLoadBalancingPolicy, AmazonEKSNetworkingPolicy (see below).

To create this role using the console, go to IAM → Roles, click "Create role" and set:

  • Trusted entity type: AWS service
  • Service or use case: EKS - Cluster. Allows the cluster Kubernetes control plane to manage AWS resources on your behalf.

The wizard attaches the AWS managed permission policy AmazonEKSClusterPolicy. After the role is created, go to the role page and at the Permissions tab, do "Add permissions" → "Attach policies" and attach AmazonEKSVPCResourceController.

Trust policy (trusted entities):

eks-cluster-role-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create this role using the CLI run (source):

aws iam create-role \
--role-name MyAmazonEKSClusterRole \
--assume-role-policy-document file://"eks-cluster-role-trust-policy.json"
aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy \
--role-name MyAmazonEKSClusterRole

Node role

https://docs.aws.amazon.com/eks/latest/userguide/create-node-role.html

The node role is assumed by the EC2 instances that act as worker nodes. It is similar to an EC2 instance profile. You can see the role at the EC2 console → Instances → select an instance → IAM Role.

Gives permissions to the kubelet running on the node to make calls to the Kubernetes API and other AWS APIs on your behalf. This includes permissions to access container registries like ECR where your application container images are stored.

To create this role using the console, go to IAM → Roles, click "Create role" and set:

  • Trusted entity type: AWS service
  • Service or use case: EC2. Allows EC2 instances to call AWS services on your behalf.

At the "Add permissions" page, filter by "EKS" and attach these policies:

Then filter by "ec2containerregistry" and attach the policy AmazonEC2ContainerRegistryPullOnly, which allows the nodes to pull images from ECR. You can also use AmazonEC2ContainerRegistryReadOnly, which allows to list repositories, describe images, ect. By adding this permission policy, we can use private ECR repositories without having to specify imagePullSecrets in the Kubernetes pod spec.

Optional: If you want to access the nodes using Session Manager, attach AmazonSSMManagedInstanceCore. The SSM Agent is installed automatically on Amazon EKS optimized AMIs (source).
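
For the CLI, that optional step looks like this (assuming the role name MyAmazonEKSNodeRole used below):

aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore \
  --role-name MyAmazonEKSNodeRole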

Trust policy (trusted entities):

node-role-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create this role using the CLI run (source):

aws iam create-role \
--role-name MyAmazonEKSNodeRole \
--assume-role-policy-document file://"node-role-trust-policy.json"
aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy \
--role-name MyAmazonEKSNodeRole
aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly \
--role-name MyAmazonEKSNodeRole
aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
--role-name MyAmazonEKSNodeRole

IRSA (IAM roles for service accounts)

tip

It is recommended to use EKS Pod Identity instead.

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/

https://aws.amazon.com/blogs/containers/diving-into-iam-roles-for-service-accounts/

https://www.eksworkshop.com/docs/security/iam-roles-for-service-accounts/

Enables Kubernetes service accounts to assume IAM roles. Allows individual pods to assume IAM roles and securely access AWS services (like S3 or DynamoDB) without giving permissions to the node role, which would grant permissions to all nodes. Eliminates the need to store static credentials (access keys) inside containers.

Uses an OIDC provider, which has a URL like https://oidc.eks.<region>.amazonaws.com/id/<id>. You can find the URL at the EKS console → your cluster → Overview tab → Details section → OpenID Connect provider URL, or by running aws eks describe-cluster --name MyCluster --region us-east-1 --query cluster.identity.oidc.issuer --output text.

Setup using the Console

First, you need to create an OIDC Identity Provider. This is done only once per cluster.

See instructions at Create an IAM OIDC provider for your cluster.
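
If you use eksctl, it can create and associate the OIDC provider in one step (a sketch; cluster name and region are illustrative):

eksctl utils associate-iam-oidc-provider \
  --cluster MyCluster \
  --region us-east-1 \
  --approve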

At the IAM console → Identity providers, click "Add provider" and set:

  • Provider type: OpenID Connect.
  • Provider URL: the OIDC provider URL of your cluster (https://oidc.eks.<region>.amazonaws.com/id/<id>).
  • Audience: sts.amazonaws.com.

Next, create an IAM role to be used by a Kubernetes service account. Go to the IAM console → Roles and click "Create role". Set:

  • Trusted entity type: Custom trust policy.
  • Paste this trust policy (trusted entities), replacing <account-id>, <oidc-provider>, <namespace> and <service-account-name>:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/<oidc-provider>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<oidc-provider>:aud": "sts.amazonaws.com",
          "<oidc-provider>:sub": "system:serviceaccount:<namespace>:<service-account-name>"
        }
      }
    }
  ]
}

Note that the Principal is Federated, not Service. The <oidc-provider> is oidc.eks.<region>.amazonaws.com/id/<id>. You can get it at the console, at the cluster Overview tab → Details section → OpenID Connect provider URL (remove https://), or by running aws eks describe-cluster --name MyCluster --region us-east-1 --query cluster.identity.oidc.issuer --output text | sed -e "s/^https:\/\///".

Select any permissions policy you need, eg to access S3.

Once the role is created, annotate the service account to link it to the IAM role:

kubectl annotate serviceaccount <service-account-name> -n <namespace> eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/MyEKSServiceAccountRole

To see the annotation, run kubectl get serviceaccount <service-account-name> -n kube-system -o yaml or kubectl describe serviceaccount <service-account-name> -n kube-system.

You may need to create a Kubernetes service account:

kubectl create serviceaccount <service-account-name> -n <namespace>
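
Alternatively, eksctl can create the IAM role, its trust policy and the annotated service account in one command (a sketch; the attached policy is an illustrative example):

eksctl create iamserviceaccount \
  --cluster MyCluster \
  --namespace <namespace> \
  --name <service-account-name> \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve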

Setup using Terraform

https://github.com/aws-samples/eks-workshop-v2/tree/stable/manifests/modules/security/irsa/.workshop/terraform

Pod Identity

https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html

https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-eks-pod-identity/

Makes it easy to use an IAM role across multiple clusters without the need to update the role trust policy, and simplifies policy management by enabling the reuse of permission policies across IAM roles.

https://aws.amazon.com/blogs/containers/amazon-eks-pod-identity-a-new-way-for-applications-on-eks-to-obtain-iam-credentials/

https://www.eksworkshop.com/docs/security/amazon-eks-pod-identity/

Does the same as IRSA, but with less configuration and without requiring an OIDC provider.

To grant workloads access to AWS resources using AWS APIs, you use Pod Identity to associate an AWS IAM Role to a Kubernetes Service Account.

Roles can be used in multiple clusters. It is backwards compatible with IRSA.

EKS Pod Identity vs IRSA - https://www.youtube.com/watch?v=aUjJSorBE70

You need to install the EKS Pod Identity Agent, an EKS Add-on, which is an agent pod that runs on each node. It's pre-installed on EKS Auto Mode clusters.

Source code: https://github.com/aws/eks-pod-identity-agent

Trust policy (trusted entities):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}

Associate an IAM role with a Kubernetes service account:

aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace my-namespace \
  --service-account my-service-account \
  --role-arn my-iam-role-arn
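
To verify the association (a sketch; the association ID comes from the list command):

aws eks list-pod-identity-associations --cluster-name my-cluster
aws eks describe-pod-identity-association \
  --cluster-name my-cluster \
  --association-id <association-id>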

Setup with Terraform

https://github.com/aws-samples/eks-workshop-v2/tree/stable/manifests/modules/security/eks-pod-identity/.workshop/terraform

Security groups

https://www.eksworkshop.com/docs/networking/vpc-cni/security-groups-for-pods/

Cluster security group

https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html

https://aws.amazon.com/blogs/containers/enhanced-vpc-flexibility-modify-subnets-and-security-groups-in-amazon-eks/

Allows communication between the control plane (API server) and worker nodes:

  • Control plane ↔ kubelet communication (API, health checks)
  • Control plane ↔ cluster add-ons (CNI plugin, CoreDNS, etc.)

It is created by EKS automatically when you create a cluster (unless you specify one).

Name is like: eks-cluster-sg-<cluster-name>-<random-id>. For example, eks-cluster-sg-MyCluster-303637302.
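
You can look up the cluster security group from the CLI, for example (a sketch using the clusterSecurityGroupId field):

CLUSTER_SG=$(aws eks describe-cluster --name $EKS_CLUSTER_NAME \
  --query cluster.resourcesVpcConfig.clusterSecurityGroupId --output text)
aws ec2 describe-security-groups --group-ids $CLUSTER_SG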

EKS automatically associates this security group to the following resources that it also creates:

  • 2–4 elastic network interfaces (ENIs) that are created when you create a cluster.
  • Network interfaces of the nodes in any managed node group that you create.

The SG description is: "EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.".

Allows all outbound traffic to any destination (0.0.0.0/0). You can restrict it, but you must allow outbound TCP 443 to reach the worker nodes, TCP 10250 for the kubelet API and TCP/UDP 53 for DNS.

Inbound rules:

| Source | Port | Purpose |
| --- | --- | --- |
| Node security group | 443 | API server access from worker nodes |
| Self | All | Node-to-node cluster communication |

The cluster SG is used as the source in an inbound rule on the node's SG. EKS automatically updates the node's security group inbound rules to allow inbound traffic on TCP 443 from the cluster security group. That rule is what lets the control plane (which uses the cluster SG) reach each node.

+---------------------------+
|     EKS Control Plane     |
|   (AWS-managed ENIs w/    |
|  Cluster Security Group)  |
+-------------+-------------+
              |
      TCP 443 |
              |
              v
  +-----------+-----------+
  |      Worker Node      |
  |    (EC2 + Node SG)    |
  +-----------------------+
   Inbound rule: Allow 443
   from Cluster SG

The outbound rules can be modified, but the inbound rules cannot. From https://aws.amazon.com/blogs/containers/enhanced-vpc-flexibility-modify-subnets-and-security-groups-in-amazon-eks/:

The default inbound rules include all access from within the security group and shared node security group, which enables bi-directional communication between the control plane and the nodes. Today, these rules can’t be deleted or modified. If you remove the default inbound rule, then Amazon EKS recreates it whenever the cluster is updated.

The default outbound rule of the cluster security group allows all traffic. Optionally, users can remove this egress rule and limit the open ports between the cluster and nodes. You can remove the default outbound rule and add the minimum rules required for the cluster.

Revoke EKS Cluster Security Group Egress Rule https://github.com/aws-samples/revoke-eks-cluster-security-group-egress-rule

Node security group

Controls inbound/outbound traffic for worker nodes (EC2 instances). Attached to all EC2 instances in your EKS managed node group or self-managed node group.

By default, EKS uses the cluster security group as the node security group.

Used for:

  • Allow inbound SSH (for admin access).
  • Allow node-to-node and pod-to-pod traffic.
  • Allow outbound internet traffic (to pull images, call AWS APIs, software updates, etc.).

The node SG must allow outbound traffic on 443 to reach the control plane API server.

VPC and subnets

https://docs.aws.amazon.com/eks/latest/userguide/network-reqs.html - Networking requirements for VPC and subnets

https://docs.aws.amazon.com/eks/latest/userguide/creating-a-vpc.html

https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html

https://docs.aws.amazon.com/eks/latest/best-practices/subnets.html

https://www.eksworkshop.com/docs/networking/vpc-cni/custom-networking/

https://aws.amazon.com/blogs/containers/optimize-ip-addresses-usage-by-pods-in-your-amazon-eks-cluster/

https://www.reddit.com/r/aws/comments/12bdyj5/eks_changingadding_subnets/

https://aws.amazon.com/blogs/containers/enhanced-vpc-flexibility-modify-subnets-and-security-groups-in-amazon-eks/

Don't use the default VPC, use a custom VPC with private subnets. Deploy nodes in private subnets and use public subnets for load balancers and ingress.

Control plane → worker node communication. The control plane is connected to your VPC through cross-account Elastic Network Interfaces (ENIs) that allow traffic from the control plane to the worker nodes. ENIs are deployed to your VPC, to the subnets of the data plane you specify when you create a new cluster (Choose the subnets in your VPC where the control plane may place elastic network interfaces (ENIs) to facilitate communication with your cluster.). These subnets are known as the cluster subnets.

These network interfaces also enable Kubernetes features that use the kubelet API such as kubectl attach, kubectl cp, kubectl exec, kubectl logs and kubectl port-forward commands.

There are two to four ENIs, so you must specify at least two subnets, which must be in at least two different Availability Zones. ENIs can be deployed to public and private subnets, but it is recommended to use private subnets.

You can see the EKS created ENIs at the EC2 console. They have the description Amazon EKS ${clusterName}. You can see the subnets at the field cluster.resourcesVpcConfig.subnetIds of aws eks describe-cluster, or at the EKS console (Networking tab). You can change them at the Networking tab → Manage → VPC Resources.

Note that the ENIs that allow control plane to worker nodes communication are always deployed. In contrast, the VPC endpoint (AWS PrivateLink) that allows private communication from worker nodes to the control plane API server is only deployed if you configure endpoint private access, see API server endpoint access.

You can't use subnets in AZ use1-az3 (us-east-1), usw1-az2 (us-west-1) and cac1-az3 (ca-central-1) (source). You get this error: UnsupportedAvailabilityZoneException: Cannot create cluster 'kubernetes-example' because EKS does not support creating control plane instances in us-east-1e, the targeted availability zone. Retry cluster creation using control plane subnets that span at least two of these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f. Note, post cluster creation, you can run worker nodes in separate subnets/availability zones from control plane subnets/availability zones passed during cluster creation. Use aws ec2 describe-availability-zones to see the mapping of AZ identifiers (us-east-1e) to different AZ data centers (use1-az3).

Control subnets

Explained at https://docs.aws.amazon.com/eks/latest/userguide/network-reqs.html#network-requirements-subnets

From https://aws.amazon.com/blogs/containers/enhanced-vpc-flexibility-modify-subnets-and-security-groups-in-amazon-eks/

To control which subnets network interfaces are created in, you can limit the number of subnets you specify to only two when you create a cluster.

From Apress AWS EKS Essentials page 22:

During K8s version updates, EKS deletes and recreates new ENIs. Unfortunately, there is no guarantee that EKS will create the ENIs in the same subnets as your preferred subnets except you limit the number of subnets for the ENIs to only two during or after cluster creation.

IP address exhaustion

It is recommended to place the ENIs in dedicated cluster subnets with /28 netmask, different from the worker node subnets, to reduce the odds of IP address exhaustion within the cluster network.

Note that /28 is 16 IP addresses, but the first four and the last IP addresses in each CIDR block are reserved (source).
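
As a quick sanity check of those numbers:

# /28 -> 2^(32-28) = 16 addresses; AWS reserves 5 per subnet (first 4 + last 1)
echo $(( 2**(32-28) - 5 ))   # 11 usable IP addresses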

From https://simyung.github.io/aws-eks-best-practices/networking/subnets/#vpc-configurations

Kubernetes worker nodes can run in the cluster subnets, but it is not recommended. During cluster upgrades Amazon EKS provisions additional ENIs in the cluster subnets. When your cluster scales out, worker nodes and pods may consume the available IPs in the cluster subnet. Hence in order to make sure there are enough available IPs you might want to consider using dedicated cluster subnets with /28 netmask.

From Apress AWS EKS Essentials pages 22 and 70:

In production environments, ENIs are often deployed in their own subnets such that they do not coexist with the worker nodes. This is necessary to decouple the ENIs subnets from the worker nodes and to reduce the odds of IPv4 address exhaustion within the cluster network.

The ENIs subnets are often allocated the /28 CIDR block within the data plane VPC. We should have deployed the ENIs in the 10.0.1.0/28 and 10.0.2.0/28 subnets if we were to follow the best practices.

It is recommended to decouple the ENIs from the subnets of the worker nodes by deploying them in their separate /28 subnets. For example, in Figure 2-1 both worker nodes and ENIs share the same subnets, creating a highly coupled architecture and contributing to IPv4 address exhaustion.

From https://aws.amazon.com/blogs/containers/enhanced-vpc-flexibility-modify-subnets-and-security-groups-in-amazon-eks/

One common misconception is that cluster subnets chosen when creating an Amazon EKS cluster serve as the primary targets for nodes and users can only use these subnets for creating the nodes (i.e., Kubernetes nodes). Instead of being the designated subnets for nodes, cluster subnets have a distinct role of hosting cross-account ENIs as specified above.

If you don’t specify separate subnets for nodes, then they may be deployed in the same subnets as your cluster subnets. Nodes and Kubernetes resources can run in the cluster subnets, but it isn’t recommended. During cluster upgrades, Amazon EKS provisions additional ENIs in the cluster subnets. When your cluster scales out, nodes and pods may consume the available IPs in the cluster subnet. Hence, in order to make sure there are enough available Ips, you might want to consider using dedicated cluster subnets with /28 netmask.

With the AWS Load Balancer Controller, you can choose the specific subnets where load balancers can be deployed, or you can use the auto-discovery feature by tagging the subnets. Cluster subnets can still be used for load balancers, but this is not a best practice, as it can lead to IP exhaustion, similar to the previous case.

Amazon EKS doesn’t automatically create new ENIs in subnets that weren’t designated as cluster subnets during the initial cluster setup. If you have worker nodes in subnets other than your original cluster subnets (i.e., where the cross-account ENIs are located), then they can still communicate with the Amazon EKS control plane if there are local routes in place within the VPC that allow this traffic. Essentially, the worker nodes need to be able to resolve and reach the Amazon EKS API server endpoint. This setup might involve transit through the subnets with the ENIs, but it’s the VPC’s internal routing that makes this possible.

VPC examples

Use shared VPC subnets in Amazon EKS - https://aws.amazon.com/blogs/containers/use-shared-vpcs-in-amazon-eks/ - https://github.com/aws-samples/eks-shared-subnets/ - See AI docs at https://deepwiki.com/aws-samples/eks-shared-subnets - There are two accounts, workload and networking. There are public, private and control plane subnets.

Examples of VPC for EKS (source):

API server endpoint access

https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html

Enable Private Access to the Amazon EKS Kubernetes API with AWS PrivateLink - https://aws.amazon.com/blogs/containers/enable-private-access-to-the-amazon-eks-kubernetes-api-with-aws-privatelink

Communication from the control plane to worker nodes always uses the ENIs in the data plane, and the ENIs are always deployed. In contrast, communication from worker nodes or external clients to the control plane API server can be configured. If you configure private access, EKS deploys an interface VPC endpoint (AWS PrivateLink) in your VPC to allow private communication from worker nodes to the control plane API server. If you configure public access, EKS deploys a public Network Load Balancer (NLB) to allow communication from external clients and worker nodes to the control plane API server.

Cluster API server endpoint access:

  • Public only (default): the API server is reachable over the Internet, from outside the VPC. Worker node traffic leaves the VPC (but not Amazon’s network) to communicate to the endpoint.
  • Private only: restrict API server to internal VPC traffic only. Access to a private API server is restricted to your VPC, so cluster administrators need to use a VPN like Direct Connect, a bastion host or PrivateLink. Worker node traffic to the endpoint will stay within your VPC, using the private VPC endpoint.
  • Public and private: the API server is publicly accessible from outside your VPC, for example for admin tasks. Worker node traffic to the endpoint will stay within your VPC, using the private VPC endpoint.

See cluster.resourcesVpcConfig.endpointPublicAccess and cluster.resourcesVpcConfig.endpointPrivateAccess in the output of aws eks describe-cluster --name $EKS_CLUSTER_NAME.

You can define a CIDR block for the public API server endpoint access, to restrict which IPs can access the endpoint. See cluster.resourcesVpcConfig.publicAccessCidrs in the output of aws eks describe-cluster --name $EKS_CLUSTER_NAME. From the console:

You can, optionally, limit the CIDR blocks that can access the public endpoint. If you limit access to specific CIDR blocks, then it is recommended that you also enable the private endpoint, or ensure that the CIDR blocks that you specify include the addresses that worker nodes and Fargate pods (if you use them) access the public endpoint from.
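
A CLI sketch of restricting the public endpoint while keeping the private endpoint enabled (the CIDR block is illustrative):

aws eks update-cluster-config \
  --name $EKS_CLUSTER_NAME \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.0/24",endpointPrivateAccess=true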

Cluster Access Management API

https://docs.aws.amazon.com/eks/latest/userguide/grant-k8s-access.html

https://www.eksworkshop.com/docs/security/cluster-access-management/

The Cluster Access Management API is used to provide authentication and authorization for AWS IAM principals to Amazon EKS clusters. It simplifies identity mapping between AWS IAM and Kubernetes RBAC, eliminating the need to switch between AWS and Kubernetes APIs for access management.

Before the Cluster Access Management API was available, Amazon EKS relied on the aws-auth ConfigMap.

Initially, the AWS user account used to create the cluster is the only user account that will have access.

Cluster authentication modes (accessConfig):

  • aws-auth ConfigMap only (CONFIG_MAP). The original, old method. Will be deprecated in the future.
  • Both EKS API and aws-auth ConfigMap (API_AND_CONFIG_MAP).
  • EKS API - Access entries only (API). Recommended. What you get if you create a cluster using the management console. Required by Auto Mode.

See which cluster authentication mode you are using: aws eks describe-cluster --name $EKS_CLUSTER_NAME --query cluster.accessConfig. With EKS API it returns { "authenticationMode": "API" }.

You can update your cluster configuration from CONFIG_MAP to API_AND_CONFIG_MAP, and from API_AND_CONFIG_MAP to API, but not the other way around.

  • Access entries: IAM principals (users or roles) that are granted access to the cluster using the Cluster Access Management API. They are bound to a cluster.
    • List access entries in your cluster: aws eks list-access-entries --cluster $EKS_CLUSTER_NAME.
    • You can view them at the console, at the cluster → "Access" tab → "IAM access entries".
  • Access policies: predefined sets of EKS specific permission policies that can be assigned to access entries. They exist in your AWS account even if you don't have any cluster.

To see an access entry use describe-access-entry, where <principal-arn> is the ARN of the IAM user or role from aws eks list-access-entries --cluster $EKS_CLUSTER_NAME:

aws eks describe-access-entry --cluster $EKS_CLUSTER_NAME --principal-arn <principal-arn>

Cluster access with kubectl

Connect kubectl to an EKS cluster by creating a kubeconfig file - https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html

Use update-kubeconfig docs to configure kubectl to talk to an EKS cluster using AWS Credentials:

aws eks update-kubeconfig --name <cluster>
# Added new context arn:aws:eks:us-east-1:111222333444:cluster/My-EKS-Cluster to /Users/albert/.kube/config

This updates the kubeconfig file (~/.kube/config), adding a new entry. (Use kubectl config get-contexts and kubectl config current-context to view the new context, and kubectl cluster-info to view the cluster info.)

At the ~/.kube/config file, the value of cluster.server will match the value of the "API server endpoint" at the management console, and the value of cluster.certificate-authority-data will match the "Certificate authority". (Use kubectl config view to view the kubeconfig.)

Authorization error

Initially, only the IAM principal that created the cluster has cluster administrator access. You can run kubectl version to check if your AWS CLI profile has access to the cluster. If the output says Server Version: v1.34.1-eks-d96d92f, then you have access. But if it says "error: You must be logged in to the server (the server has asked for the client to provide credentials)", you don't.

To fix access see:

You can also give a principal administrator access using the management console. To configure a new IAM access entry, go to the cluster → "Access" tab → "IAM access entries" and click "Create". Select the "IAM principal ARN". Set "Type" to "Standard". Select the access policy AmazonEKSClusterAdminPolicy.
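
The CLI equivalent is roughly (a sketch; the principal ARN is illustrative):

aws eks create-access-entry \
  --cluster-name $EKS_CLUSTER_NAME \
  --principal-arn arn:aws:iam::<account-id>:role/MyAdminRole \
  --type STANDARD
aws eks associate-access-policy \
  --cluster-name $EKS_CLUSTER_NAME \
  --principal-arn arn:aws:iam::<account-id>:role/MyAdminRole \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster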

Cluster upgrades

https://docs.aws.amazon.com/eks/latest/best-practices/cluster-upgrades.html

CLI

https://docs.aws.amazon.com/cli/latest/reference/eks/

List available commands: aws eks help

Update kubeconfig file to access the cluster with kubectl

List clusters:

aws eks list-clusters

Provides similar information to kubectl cluster-info and kubectl version.

Describe a cluster:

aws eks describe-cluster --name MyCluster
aws eks describe-cluster --name $EKS_CLUSTER_NAME --query cluster.accessConfig

Get the OIDC provider URL (--region is optional if you have set a default region in your AWS CLI config file):

aws eks describe-cluster --name MyCluster --region us-east-1 --query cluster.identity.oidc.issuer --output text

Save to a variable:

VPC_ID=$(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query cluster.resourcesVpcConfig.vpcId --output text)

Wait for a cluster to have status ACTIVE:

aws eks wait cluster-active --name $EKS_CLUSTER_NAME
# When done doing
aws eks describe-cluster --name $EKS_CLUSTER_NAME --query cluster.status
# Will print "ACTIVE"

Update cluster configuration:

aws eks update-cluster-config \
--name $EKS_CLUSTER_NAME \
--resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=false

List available access policies (AmazonEKSAdminPolicy, AmazonEKSClusterAdminPolicy, etc.) in your account:

aws eks list-access-policies

All the policy ARNs have the form arn:aws:eks::aws:cluster-access-policy/XYZ.

List access entries in your cluster:

aws eks list-access-entries --cluster $EKS_CLUSTER_NAME

List associated access policies for an IAM principal (user or role):

aws eks list-associated-access-policies \
--cluster-name $EKS_CLUSTER_NAME \
--principal-arn <iam_principal_arn>

eksctl

https://eksctl.io - https://github.com/eksctl-io/eksctl

You can install it using a script. To install with Homebrew there are 3 options:

ClusterConfig file examples: https://github.com/guessi/eks-tutorials/tree/main/cluster-config

# Adapted from https://github.com/guessi/eks-tutorials/blob/main/cluster-config/cluster-full.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: My-EKS-Cluster
  region: us-east-1
  version: '1.34'

availabilityZones:
  - us-east-1a
  - us-east-1b

privateCluster:
  enabled: false

kubernetesNetworkConfig:
  ipFamily: IPv4

vpc:
  cidr: 192.168.0.0/16
  clusterEndpoints:
    privateAccess: true
    publicAccess: true
  manageSharedNodeSecurityGroupRules: true
  nat:
    gateway: Single # Options: HighlyAvailable, Disable, Single (default)
  publicAccessCIDRs: # you should configure a proper CIDR list here
    - 0.0.0.0/0

accessConfig:
  authenticationMode: API_AND_CONFIG_MAP
  bootstrapClusterCreatorAdminPermissions: true

iam:
  withOIDC: true

managedNodeGroups:
  - name: mng-1
    amiFamily: AmazonLinux2023
    minSize: 2
    maxSize: 3
    desiredCapacity: 2
    volumeSize: 20
    volumeType: gp3
    instanceTypes:
      - 't3.small'
    enableDetailedMonitoring: true
    privateNetworking: true
    disableIMDSv1: true
    disablePodIMDS: false
    spot: true
    ssh:
      allow: false
    # availabilityZones:
    #   - us-east-1a
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        # (Optional) Only required if you need "EC2 Instance Connect"
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        # (Optional) Only required if you are using "SSM"
        - arn:aws:iam::aws:policy/AmazonSSMPatchAssociation
        # (Optional) Only required if you have "Amazon CloudWatch Observability" setup
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
        - arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess

addonsConfig:
  autoApplyPodIdentityAssociations: true

addons:
  - name: kube-proxy
    version: latest
  - name: vpc-cni
    version: latest
    useDefaultPodIdentityAssociations: true
  - name: coredns
    version: latest
  - name: eks-pod-identity-agent
    version: latest
  - name: metrics-server
    version: latest

cloudWatch:
  # ref: https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
  clusterLogging:
    logRetentionInDays: 90
    enableTypes:
      - 'api'
      - 'audit'
      - 'authenticator'
      - 'controllerManager'
      - 'scheduler'

Get command options (get help):

eksctl create cluster --help

Use --dry-run to validate a cluster configuration file:

eksctl create cluster -f cluster.yaml --dry-run

You can also use --dry-run to generate a YAML file (note that there are options that cannot be represented in the ClusterConfig file, see the docs):

eksctl create cluster --name development --dry-run > cluster.yaml
eksctl create cluster -f cluster.yaml

Create cluster:

eksctl create cluster --name MyCluster --region us-east-1 # Managed nodes
eksctl create cluster --name MyCluster --region us-east-1 --fargate
eksctl create cluster -f cluster.yaml
eksctl delete cluster --name MyCluster --region us-east-1

Add ons

https://docs.aws.amazon.com/eks/latest/userguide/workloads-add-ons-available-eks.html

See the required platform version:

aws eks describe-addon-versions --addon-name aws-ebs-csi-driver
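
To install and inspect add-ons from the CLI (a sketch; the add-on name is an example):

# aws-ebs-csi-driver typically also needs an IAM role for its service account (--service-account-role-arn)
aws eks create-addon --cluster-name $EKS_CLUSTER_NAME --addon-name aws-ebs-csi-driver
aws eks list-addons --cluster-name $EKS_CLUSTER_NAME
aws eks describe-addon --cluster-name $EKS_CLUSTER_NAME --addon-name aws-ebs-csi-driver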

Compute options

https://docs.aws.amazon.com/eks/latest/userguide/eks-compute.html

https://docs.aws.amazon.com/eks/latest/userguide/eks-architecture.html#nodes

https://docs.aws.amazon.com/eks/latest/best-practices/reliability.html

  • Self-managed nodes
    • EC2 instances that you manage, for total control.
    • You are responsible for the OS, kubelet, CRI and AMI configuration.
  • Managed node groups
    • AWS is responsible for the OS, kubelet, CRI and AMI configuration.
  • Karpenter
    • Continuous cost optimization: the right nodes at the right time.
    • Configure On-Demand and Spot purchasing options, diversify instance types and handle Spot interruptions.
    • AWS is responsible for worker node scaling and configuration.
  • Auto Mode
    • AWS manages both the control plane and the worker nodes.
    • AWS handles EC2 provisioning, scaling, patching, and security.
    • The most managed option.
  • Fargate serverless compute
    • No need to manage EC2 nodes, not even managed node groups.
    • Only pay for what you use.
    • Fargate compute runs in AWS-owned accounts (in contrast with EC2 worker nodes, which run in customer accounts).
    • Has limitations (eg no DaemonSets).
info

EKS suggests using private subnets for worker nodes. (From the console Info sidebar.)

EC2 instance types

https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html

important

In general, fewer, larger instances are better, especially if you have a lot of DaemonSets. Each node makes calls to the API server, so the more nodes you have, the more load on the API server.

https://stackoverflow.com/questions/62060942/what-ec2-instance-types-does-eks-support

Due to the way EKS works with ENIs, t3.small is the smallest instance type that can be used for worker nodes. If you try something smaller like t2.micro, which only supports 4 pods (2 ENIs with 2 IPv4 addresses each), they'll all be used up by system pods (e.g., kube-proxy) and you won't be able to deploy your own Pods. source

See the number of ENIs and IPs available for each instance type here: https://github.com/aws/amazon-vpc-cni-k8s/blob/334cab5070396d914b80855add84ad7f7e2b8ed1/pkg/awsutils/vpc_ip_resource_limit.go#L19-L21

https://www.reddit.com/r/kubernetes/comments/baxrtj/eks_which_instance_types_and_why/

If you still want to use the t3 from a cost perspective, I would suggest you enable the T2/T3 Unlimited option for your instance, where it will provide you with instant CPU cycles and you will never be throttled. However, AWS charges for these additional CPU cycles.

You need to keep a close watch on these additional CPU cycles consumption using CloudWatch. If it's continuously happening, upgrading to the M5 would be the right choice.

Note that there is a maximum number of pods for each EC2 instance type, see:
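
For the default VPC CNI setup, the limit follows from the instance ENI and IP limits; a sketch of the usual formula, using the t3.small values (3 ENIs, 4 IPv4 addresses per ENI) as assumptions:

# max pods = ENIs * (IPv4 addresses per ENI - 1) + 2
echo $(( 3 * (4 - 1) + 2 ))   # 11 for t3.small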

Managed node groups

https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html

https://www.eksworkshop.com/docs/fundamentals/compute/managed-node-groups/

A node group is a group of EC2 instances that supply compute capacity to your Amazon EKS cluster. You can add multiple node groups to your cluster. Node groups implement basic compute scaling through EC2 Auto Scaling groups.

Amazon EKS managed node groups make it easy to provision compute capacity for your cluster. Managed node groups consist of one or more Amazon EC2 instances running the latest EKS-optimized AMIs. All nodes are provisioned as part of an Amazon EC2 Auto Scaling group that is managed for you by Amazon EKS, and all resources, including EC2 instances and Auto Scaling groups, run within your AWS account.

Think of a node group like an EC2 Auto Scaling group. Indeed, when you create a node group using the console, it automatically creates an Auto Scaling group and a launch template (you can see both at the EC2 console).

You can define a custom launch template to customize the configuration of the EC2 instances.

The nodes in a node group use the node IAM role.
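
A sketch of adding a managed node group with eksctl (name, instance type and sizes are illustrative):

eksctl create nodegroup \
  --cluster MyCluster \
  --name mng-2 \
  --node-type t3.small \
  --nodes 2 --nodes-min 2 --nodes-max 3 \
  --managed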

kubectl get nodes
kubectl get nodes --show-labels

Karpenter

https://karpenter.sh

https://github.com/aws/karpenter-provider-aws

https://github.com/kubernetes-sigs/karpenter

Optimize node usage.

Run Kubernetes Clusters for Less with Amazon EC2 Spot and Karpenter - https://community.aws/tutorials/run-kubernetes-clusters-for-less-with-amazon-ec2-spot-and-karpenter

https://github.com/aws-samples/karpenter-blueprints

Karpenter vs Cluster Autoscaler - https://www.youtube.com/watch?v=FIBc8GkjFU0

https://www.udemy.com/course/karpenter-masterclass-for-kubernetes

https://github.com/aws-samples/eks-workshop-v2/tree/stable/manifests/modules/autoscaling/compute/karpenter/.workshop/terraform

Auto Mode

https://docs.aws.amazon.com/eks/latest/userguide/automode.html

https://docs.aws.amazon.com/eks/latest/best-practices/automode.html

Workshop - https://catalog.workshops.aws/eks-auto-mode/en-US

Tutorial - https://aws.amazon.com/blogs/containers/getting-started-with-amazon-eks-auto-mode

https://www.youtube.com/watch?v=IQjsFlkqWQY

Auto Mode automates routine cluster tasks for compute, load balancing, storage and networking. We don't need to do any additional cluster configuration before launching our workloads. When using EKS Auto Mode, EC2 nodes are automatically provisioned and managed by EKS.

Auto Mode automatically scales cluster compute resources. If a pod can’t fit onto existing nodes, EKS Auto Mode creates a new one. EKS Auto Mode also consolidates workloads and deletes nodes. EKS Auto Mode builds upon Karpenter. source

EKS Auto Mode, which is the most managed option, handles provisioning, scaling and updates of the data plane along with providing managed Compute, Networking, and Storage capabilities. Auto Mode AMIs are released frequently and clusters are updated to the latest AMI automatically to deploy CVE fixes and security patches. You have the ability to control when this occurs by configuring disruption controls on your Auto Mode NodePools. source

Capabilities:

  • Application load balancing
  • Block Storage
  • Compute Autoscaling
  • GPU support
  • Cluster DNS
  • Pod and service networking

Standard mode vs Auto Mode

See Compare compute options and Shared responsibility model.

  • Standard mode:
    • AWS manages the control plane, you manage the worker nodes.
    • Custom AMI nodes.
    • Must update node Kubernetes version yourself.
  • Auto Mode:
    • AWS manages both control plane and worker nodes.
    • AWS handles EC2 provisioning, scaling, OS patching and security.

Diagrams: "EKS standard vs Auto Mode" (Source: AWS); "EKS without Auto Mode" and "EKS with Auto Mode" (Source: AWS Experience).

Node Class

https://docs.aws.amazon.com/eks/latest/userguide/create-node-class.html

Defines infrastructure-level settings that apply to groups of nodes in your EKS cluster, including network configuration, storage settings, and resource tagging.

Node Pool

https://docs.aws.amazon.com/eks/latest/userguide/create-node-pool.html

Defines EC2 instance categories, CPU configurations, availability zones, architectures (ARM64/AMD64), and capacity types (spot or on-demand). You can also set resource limits for CPU and memory usage.

There are two default managed node pools: general-purpose and system. The general-purpose node pool handles user-deployed applications and services, while the system node pool is dedicated to critical system-level components managing cluster operations. Custom node pools can be created for specific compute or configuration requirements.

View the node pools:

kubectl get nodepools
kubectl get nodepools general-purpose -o yaml

View nodes of each node pool:

kubectl get nodes -l karpenter.sh/nodepool=general-purpose
kubectl get nodes -l karpenter.sh/nodepool=system

View pods on each general-purpose EC2 node:

for node in $(kubectl get nodes -l karpenter.sh/nodepool=general-purpose -o custom-columns=NAME:.metadata.name --no-headers); do
  echo "Pods on $node:"
  kubectl get pods --all-namespaces --field-selector spec.nodeName=$node
done

View pods on each node, showing the availability zone:

kubectl get node -L topology.kubernetes.io/zone --no-headers | while read node status roles age version zone; do
  echo "Pods on node $node (Zone: $zone):"
  kubectl get pods --all-namespaces --field-selector spec.nodeName=$node -l app.kubernetes.io/instance=retail-store-app-ui
  echo "-----------------------------------"
done

Enable Auto Mode

To use Auto Mode, the cluster role permissions policy needs to include the following managed policies or equivalent permissions: AmazonEKSComputePolicy, AmazonEKSBlockStoragePolicy, AmazonEKSLoadBalancingPolicy, AmazonEKSNetworkingPolicy and AmazonEKSClusterPolicy (see docs).

Run this to attach the policies:

for POLICY in \
  "arn:aws:iam::aws:policy/AmazonEKSComputePolicy" \
  "arn:aws:iam::aws:policy/AmazonEKSBlockStoragePolicy" \
  "arn:aws:iam::aws:policy/AmazonEKSLoadBalancingPolicy" \
  "arn:aws:iam::aws:policy/AmazonEKSNetworkingPolicy" \
  "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
do
  echo "Attaching policy ${POLICY} to IAM role ${CLUSTER_ROLE_NAME}..."
  aws iam attach-role-policy --role-name ${CLUSTER_ROLE_NAME} --policy-arn ${POLICY}
done

Verify the policies are attached:

aws iam list-attached-role-policies --role-name $CLUSTER_ROLE_NAME

And the cluster role trust policy needs to have the action sts:TagSession. Add it with:

aws iam update-assume-role-policy --role-name $CLUSTER_ROLE_NAME --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}'

Verify the role has the action sts:TagSession in the trust policy:

aws iam get-role --role-name $CLUSTER_ROLE_NAME | \
jq -r '.Role.AssumeRolePolicyDocument.Statement[].Action[]'

Enable Auto Mode on an existing cluster:

aws eks update-cluster-config \
--name $CLUSTER_NAME \
--compute-config enabled=true,nodeRoleArn=$CLUSTER_NODE_ROLE_ARN,nodePools=system,general-purpose \
--kubernetes-network-config '{"elasticLoadBalancing":{"enabled": true}}' \
--storage-config '{"blockStorage":{"enabled": true}}'
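
To verify, this should now report enabled true (a sketch, assuming the computeConfig field in the describe-cluster output):

aws eks describe-cluster --name $CLUSTER_NAME --query cluster.computeConfig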

Enabling Auto Mode adds some CRDs. Before:

$ kubectl get crds
NAME CREATED AT
applicationnetworkpolicies.networking.k8s.aws 2025-11-18T07:33:34Z
clusternetworkpolicies.networking.k8s.aws 2025-11-18T07:33:34Z
clusterpolicyendpoints.networking.k8s.aws 2025-11-18T07:33:34Z
cninodes.vpcresources.k8s.aws 2025-11-18T07:33:34Z
policyendpoints.networking.k8s.aws 2025-11-18T07:33:34Z
securitygrouppolicies.vpcresources.k8s.aws 2025-11-18T07:33:34Z

After:

$ kubectl get crds
NAME CREATED AT
applicationnetworkpolicies.networking.k8s.aws 2025-11-18T07:33:34Z
clusternetworkpolicies.networking.k8s.aws 2025-11-18T07:33:34Z
clusterpolicyendpoints.networking.k8s.aws 2025-11-18T07:33:34Z
cninodes.eks.amazonaws.com 2025-11-19T10:13:05Z
cninodes.vpcresources.k8s.aws 2025-11-18T07:33:34Z
ingressclassparams.eks.amazonaws.com 2025-11-19T10:13:00Z
nodeclaims.karpenter.sh 2025-11-19T10:12:46Z
nodeclasses.eks.amazonaws.com 2025-11-19T10:12:46Z
nodediagnostics.eks.amazonaws.com 2025-11-19T10:12:46Z
nodepools.karpenter.sh 2025-11-19T10:12:46Z
policyendpoints.networking.k8s.aws 2025-11-18T07:33:34Z
securitygrouppolicies.vpcresources.k8s.aws 2025-11-18T07:33:34Z
targetgroupbindings.eks.amazonaws.com 2025-11-19T10:13:00Z

  • nodepools support provisioning compute,
  • ingressclassparams and targetgroupbindings allow exposing applications, and
  • nodediagnostics provide diagnostics capabilities.

Fargate

https://docs.aws.amazon.com/eks/latest/userguide/fargate.html

https://www.eksworkshop.com/docs/fundamentals/compute/fargate/

No need to manage nodes. AWS manages everything below the pod. You simply define the CPU and memory, and AWS takes care of the rest.

Fargate compute runs in AWS owned accounts (in contrast with EC2 worker nodes, which run in customer accounts).

Better suited for stateless apps.

Pricing

https://aws.amazon.com/blogs/containers/saving-money-pod-at-time-with-eks-fargate-and-aws-compute-savings-plans/

Pricing is calculated per second with a 1-minute minimum. Duration is calculated from the time you start to download your container image (Docker pull) until the task terminates, rounded up to the nearest second. source

Limitations

See docs for more: https://docs.aws.amazon.com/eks/latest/userguide/fargate.html#fargate-consideration

  • Private subnets only → No public IPs, and outbound internet traffic must use a NAT Gateway.
  • No SSH.
  • No EC2 instance metadata service (IMDS) available to pods.
  • No DaemonSets.
  • No dynamic persistent volumes (PV).
  • Can’t mount EBS volumes to Fargate pods.
  • No GPU.

Pod execution role

https://docs.aws.amazon.com/eks/latest/userguide/pod-execution-role.html

Fargate uses a Pod execution IAM role for defining pod-level permissions, instead of a node instance role.

This role is used by the components running on the Fargate infrastructure to make calls to AWS APIs on your behalf. For example, to pull images from ECR, send logs to CloudWatch, etc.

To create this role using the console, go to IAM → Roles, click "Create role" and set:

  • Trusted entity type: AWS service
  • Service or use case: EKS - Fargate pod. Allows access to other AWS service resources that are required to run Amazon EKS pods on AWS Fargate.

The wizard attaches the AWS managed permission policy AmazonEKSFargatePodExecutionRolePolicy. To have CloudWatch logging, after the role is created, go to the role page and at the Permissions tab, do "Add permissions" → "Attach policies" and attach CloudWatchLogsFullAccess. You can alternatively attach these permissions.

Trust policy (trusted entities):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks-fargate-pods.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Fargate profile

https://docs.aws.amazon.com/eks/latest/userguide/fargate-profile.html

Tells EKS which pods should be scheduled on Fargate when launched.

You declare which pods run on Fargate using selectors (namespace and labels). You create a profile that matches the namespace and labels defined as part of your pod. You can have up to 5 selectors. Each selector must contain a namespace. A pod must have all of the label keys and values that you provide in order to match with the profile.

When you create a Fargate profile, you must specify a Pod execution role. This execution role is for the EKS components that run on the Fargate infrastructure using the profile. When new pods are started, they will run on Fargate using the execution role and subnets defined in your profile.

Note that you can only use private subnets.

A profile cannot be edited after creation, you need to create a new one.
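
A sketch of creating a profile with eksctl, matching the namespace and label used in the next section (names are illustrative):

eksctl create fargateprofile \
  --cluster MyCluster \
  --name my-fargate-profile \
  --namespace my-namespace \
  --labels type=fargate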

Run a pod

Say that we have a pod selector with namespace my-namespace and label type=fargate. To run a pod we can do:

kubectl run httpd --image httpd:latest -n my-namespace --labels type=fargate

Initially the pod is Pending and shows a value under "NOMINATED NODE", until the new Fargate node is ready.

kubectl get pods -n my-namespace -o wide
# NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
# httpd 0/1 Pending 0 7s <none> <none> 99476e54e7-a33fecd824b94f25a22a8ceaeee39e69 <none>

kubectl get pods -n my-namespace -o wide
# NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
# httpd 1/1 Running 0 60s 172.31.100.212 fargate-ip-172-31-100-212.ec2.internal <none> <none>

Note that you need a NAT Gateway to pull the Docker image, otherwise you get the errors ErrImagePull and ImagePullBackOff.

Note that Fargate allocates a node for every pod. For example, if we have 2 pods running we have 2 nodes:

kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# fargate-ip-172-31-100-212.ec2.internal Ready <none> 42m v1.34.0-eks-2bf45b0
# fargate-ip-172-31-121-49.ec2.internal Ready <none> 2d18h v1.34.0-eks-2bf45b0

Deleting a pod (kubectl delete pod httpd -n my-namespace) also deletes the corresponding node.

CloudWatch logging

https://docs.aws.amazon.com/eks/latest/userguide/fargate-logging.html

To enable CloudWatch logging on Fargate pods, we need to create a ConfigMap named aws-logging at the namespace aws-observability.

When checking the Events section of kubectl describe pod httpd -n my-namespace, we see this event:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning LoggingDisabled 47s fargate-scheduler Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found

Note that we need to have CloudWatch permissions at the pod execution role. You can attach the AWS managed policy CloudWatchLogsFullAccess, or create a policy with these permissions (source):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:DescribeLogStreams",
        "logs:PutLogEvents",
        "logs:PutRetentionPolicy"
      ],
      "Resource": "*"
    }
  ]
}

Create the namespace:

kubectl apply -f aws-observability-namespace.yaml
aws-observability-namespace.yaml
kind: Namespace
apiVersion: v1
metadata:
  name: aws-observability
  labels:
    aws-observability: enabled

Create the ConfigMap:

kubectl apply -f aws-logging-cloudwatch-configmap.yaml
aws-logging-cloudwatch-configmap.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  flb_log_cw: 'true' # Set to true to ship Fluent Bit process logs to CloudWatch.
  filters.conf: |
    [FILTER]
        Name parser
        Match *
        Key_name log
        Parser crio
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        Buffer_Size 0
        Kube_Meta_Cache_TTL 300s
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match kube.*
        region us-east-1
        log_group_name my-logs
        log_stream_prefix from-fluent-bit-
        log_retention_days 60
        auto_create_group true
  parsers.conf: |
    [PARSER]
        Name crio
        Format Regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>P|F) (?<log>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

Modify the yaml to set region to the AWS Region that your cluster is in, and my-logs to a better log group name. Once created, use kubectl get configmap aws-logging -n aws-observability -o yaml to see the details.

Run a new pod:

kubectl run httpd --image httpd:latest -n my-namespace --labels type=fargate

Doing kubectl describe pod httpd -n my-namespace shows that logging is enabled:

Annotations:          CapacityProvisioned: 0.25vCPU 0.5GB
Logging: LoggingEnabled

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal LoggingEnabled 68s fargate-scheduler Successfully enabled logging for pod

At the CloudWatch console you'll find a new log group named my-logs.

Load Balancer Controller

From https://kubectl.docs.kubernetes.io/guides/introduction/resources_controllers/#service-discovery-and-load-balancing

  • Services Resources (L4) may expose Pods internally within a cluster or externally through an HA proxy.
  • Ingress Resources (L7) may expose URI endpoints and route them to Services.

https://github.com/kubernetes-sigs/aws-load-balancer-controller

Route internet traffic with AWS Load Balancer Controller - https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html

The Load Balancer Controller manages Elastic Load Balancers for a Kubernetes cluster. The controller continuously monitors the Kubernetes API for objects of type Ingress (for ALB) and Service of type LoadBalancer (for NLB). It provisions AWS load balancers that point to cluster Service or Ingress resources. In other words, the controller creates a single IP address or DNS name that points to multiple pods in your cluster.

Correspondence: a Kubernetes Ingress maps to an Application Load Balancer (ALB), and a Service of type LoadBalancer maps to a Network Load Balancer (NLB).

Note that at the cluster role we have the policy AmazonEKSClusterPolicy, which has many elasticloadbalancing permissions like elasticloadbalancing:CreateLoadBalancer. This allows EKS to create the load balancers.

Install using Helm

Instructions:

Steps:

  1. Create IAM permissions policy AWSLoadBalancerControllerIAMPolicy.
  2. Create the IAM role that uses the IAM policy.
  3. Create the Kubernetes service account and annotate it with the IAM role ARN, so that the service account assumes the IAM role.
  4. Install the AWS Load Balancer Controller using Helm.
    • Apply the CRDs if updating.

The policy AWSLoadBalancerControllerIAMPolicy can be reused across multiple EKS clusters in the same AWS account. If you already have it, skip the policy creation steps below.

Go to the IAM console → Policies and click "Create policy". Switch to the JSON tab and paste this IAM policy: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.14.1/docs/install/iam_policy.json. Name it AWSLoadBalancerControllerIAMPolicy.

You can also do:

curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.14.1/docs/install/iam_policy.json
aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://iam_policy.json

Make sure you use the latest tag, see releases.

If the cluster is new and you don't have an OIDC identity provider yet, create one following these instructions.

Next, create an IAM role to be used by the Kubernetes service account. Go to the IAM console → Roles and click "Create role". Set:

  • Trusted entity type: Custom trust policy.
  • Paste this trust policy (trusted entities), replacing <account-id> and <oidc-provider> (oidc.eks.<region>.amazonaws.com/id/<id>):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/<oidc-provider>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<oidc-provider>:aud": "sts.amazonaws.com",
          "<oidc-provider>:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}

You can also choose Trusted entity type "Web identity" and select the OIDC IdP for your cluster from the list, but then you need to add the sub condition to the trust policy, for example by editing the trust policy JSON after the role is created.

The sub condition ensures that only the service account aws-load-balancer-controller in the kube-system namespace can assume this role. The service account name (aws-load-balancer-controller) can be anything, but it must match the value used when deploying the controller.

At the "Add permissions" page, attach the permissions policy AWSLoadBalancerControllerIAMPolicy created before. Name the role AmazonEKSLoadBalancerControllerRole-<cluster-name>.

Now that the IAM role is created, create the Kubernetes service account that uses this IAM role:

kubectl create serviceaccount aws-load-balancer-controller -n kube-system

To tell Kubernetes which IAM role the service account should use, annotate the service account with the ARN of the IAM role created before:

kubectl annotate serviceaccount aws-load-balancer-controller -n kube-system \
eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/AmazonEKSLoadBalancerControllerRole-<cluster-name>

See the annotation with:

kubectl get serviceaccount aws-load-balancer-controller -n kube-system -o yaml
kubectl describe serviceaccount aws-load-balancer-controller -n kube-system

Finally, install the AWS Load Balancer Controller using Helm. See details at:

helm repo add eks https://aws.github.io/eks-charts
helm repo update eks
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=<my-cluster> \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller \
--set region=<us-east-1> \
--set vpcId=<vpc-id>

Verify the installation with:

helm list -A
# NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
# aws-load-balancer-controller kube-system 1 2025-11-11 20:03:00.575674 +0100 CET deployed aws-load-balancer-controller-1.14.1 v2.14.1

kubectl get ingressclass
# NAME CONTROLLER PARAMETERS AGE
# alb ingress.k8s.aws/alb <none> 16m

kubectl get deployment -n kube-system aws-load-balancer-controller
# NAME READY UP-TO-DATE AVAILABLE AGE
# aws-load-balancer-controller 2/2 2 2 72m

kubectl get pods -n kube-system | grep aws-load-balancer-controller
# aws-load-balancer-controller-65dcc7d589-tt6sr 1/1 Running 0 76m
# aws-load-balancer-controller-65dcc7d589-xtkpt 1/1 Running 0 76m

Important:

The deployed chart doesn’t receive security updates automatically. You need to manually upgrade to a newer chart when it becomes available. When upgrading, change install to upgrade in the previous command.

The helm install command automatically installs the custom resource definitions (CRDs) for the controller, but helm upgrade does not. When using helm upgrade, you must manually install the CRDs with:

wget https://raw.githubusercontent.com/aws/eks-charts/master/stable/aws-load-balancer-controller/crds/crds.yaml
kubectl apply -f crds.yaml

The two CRDs are ingressclassparams.elbv2.k8s.aws and targetgroupbindings.elbv2.k8s.aws. See them with kubectl get crd.
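For reference, a later chart upgrade might look like this (a sketch using the same placeholder values as the install command above; apply the CRDs first as shown):

kubectl apply -f crds.yaml
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<my-cluster> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set region=<us-east-1> \
  --set vpcId=<vpc-id>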

Classic Load Balancer - Service

Run this command to expose a deployment as a service and create a load balancer:

kubectl expose deployment <deployment> -n <namespace> --name myapp-service --port 8080 --type LoadBalancer
# service/myapp-service exposed

By default, creating this service creates a Classic Load Balancer with an "Internet-facing" scheme. At the EC2 console → Load Balancers, the EC2 instances registered at the "Target instances" tab are the EC2 nodes of the cluster. The CLB listens on TCP:8080 and forwards requests to the registered EC2 instances using the instance protocol and port (e.g. TCP:30709) configured at the listener.

You can open the "DNS name" at the browser, for example http://ad4003564c7424c7a8991d29de4be1a7-2108949974.us-east-1.elb.amazonaws.com:8080/.

If you get this error when doing kubectl describe service myapp-service:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 28s (x4 over 63s) service-controller Ensuring load balancer
Warning SyncLoadBalancerFailed 27s (x4 over 63s) service-controller Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB

You need to tag subnets to tell EKS what subnets to use to deploy the load balancer. Add a tag to each subnet with the name kubernetes.io/cluster/<cluster-name> and value shared. See https://stackoverflow.com/questions/62468996/eks-could-not-find-any-suitable-subnets-for-creating-the-elb and https://repost.aws/knowledge-center/eks-vpc-subnet-discovery.
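For example, with the AWS CLI (the subnet IDs and cluster name are placeholders; public subnets also typically carry the kubernetes.io/role/elb tag, and private subnets kubernetes.io/role/internal-elb):

aws ec2 create-tags \
  --resources subnet-<id-1> subnet-<id-2> \
  --tags Key=kubernetes.io/cluster/<cluster-name>,Value=shared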

Deleting the service (kubectl delete service myapp-service -n <namespace>) also deletes the load balancer.

Network Load Balancer - Service

Route TCP and UDP traffic with Network Load Balancers - https://docs.aws.amazon.com/eks/latest/userguide/network-load-balancing.html

Auto Mode - Use Service Annotations to configure Network Load Balancers - https://docs.aws.amazon.com/eks/latest/userguide/auto-configure-nlb.html

By default, a Service of type LoadBalancer creates a Classic Load Balancer. To create a Network Load Balancer instead, add this annotation to the Service manifest:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

Scheme can be internal (default) or internet-facing. To create an Internet-facing NLB, add this annotation:

metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing

Target type can be instance (default) or ip. You can customize it using this annotation:

metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip

With target type ip, the NLB routes traffic directly to the pod IP addresses instead of the node IP addresses. The network plugin must assign pod IPs from the VPC (native AWS VPC networking), for example the Amazon VPC CNI plugin or a compatible alternative CNI plugin.
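Putting the annotations together, a minimal internet-facing NLB Service might look like this (a sketch; the name, selector and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-nlb-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080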

Application Load Balancer - Ingress

Route application and HTTP traffic with Application Load Balancers - https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html

ALB with Auto Mode - https://docs.aws.amazon.com/eks/latest/userguide/auto-configure-alb.html - https://docs.aws.amazon.com/eks/latest/userguide/auto-elb-example.html

You can define rules to route requests to different services based on URL paths and hostnames. For example, requests to /api can be routed to one service, while requests to /web go to another.

It can route traffic to pods running on EC2 and Fargate.

You can deploy an ALB to public or private subnets (source). Use public subnets to create an Internet-facing ALB, and private subnets to create an internal ALB.

Ingress specification - https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/spec/

Example: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/v2.14.1/docs/examples/2048/2048_full.yaml

How it works:

  1. The Load Balancer Controller watches the Kubernetes API for new Ingress resources with the annotation kubernetes.io/ingress.class: alb or the spec field ingressClassName: alb. When a new Ingress is created or updated, the controller reacts accordingly.
  2. Provisions a new Application Load Balancer (ALB) automatically. Based on the Ingress specification, it calls the AWS Elastic Load Balancing API to create or modify an ALB in your AWS account. It can reuse existing ALBs if configured (via Ingress annotations), or spin up new ones.
  3. Configures listeners. The ALB typically has two listeners on port 80 (HTTP) and port 443 (HTTPS, with ACM certificate). The controller can automatically handle HTTPS by attaching certificates from AWS Certificate Manager (ACM).
  4. For each backend service declared at the Ingress, the controller creates a Target Group, which is responsible for routing requests to the appropriate targets and health checking. The target group contains the pod IPs of your Kubernetes Service endpoints. It updates targets dynamically as pods scale up/down.
  5. Sets up routing rules (host/path-based). It translates the rules section of your Ingress spec (hostnames and paths) into ALB listener rules.

Traffic modes (target types):

  • alb.ingress.kubernetes.io/target-type: instance. The default. Registers nodes as targets for the ALB. The ALB routes traffic to the Service's NodePort on the node IPs, and the nodes then forward the traffic to the pods.
  • alb.ingress.kubernetes.io/target-type: ip. Registers pods as targets for the ALB. ALB routes traffic directly to pod IPs. Service NodePort is bypassed. More efficient. Requires VPC CNI plugin or compatible CNI. Required for Fargate or EKS Hybrid Nodes.

You can add tags to the ALB with alb.ingress.kubernetes.io/tags: Environment=dev,Team=test.

Setup:

  1. Install the AWS Load Balancer Controller using Helm following these instructions.
  2. Deploy a sample application with a Service of type ClusterIP.
  3. Deploy an Ingress resource that points to the Service.

Deploy an application. For example, create a deployment and service:

alb-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: hashicorp/http-echo:1.0.0
          args:
            - '-text=Hello from my-app'
          ports:
            - containerPort: 5678
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: my-namespace
  labels:
    app: my-app
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: http

Deploy an Ingress resource with the necessary annotations, rules and backend services. For example:

alb-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-namespace
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/subnets: subnet-07a3aa527ccdabc36,subnet-05bc3627ccda7a3aa
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80

Note: don't use kubernetes.io/ingress.class: alb because it is deprecated; use ingressClassName: alb instead.

After creating it with kubectl apply -f alb-ingress.yaml, check the status with kubectl get ingress -n my-namespace:

NAME         CLASS   HOSTS   ADDRESS                                                                    PORTS   AGE
my-ingress   alb     *       k8s-albingre-demoalbi-2062cf2532-1738990837.us-east-1.elb.amazonaws.com   80      14m

The ADDRESS column shows the DNS name of the ALB created. The HOSTS column * means it accepts requests for any hostname.

At the EC2 console → Load Balancers, you can see the ALB. The ALB was automatically created by the AWS Load Balancer Controller after deploying the Ingress resource with the appropriate annotations.

The Target Groups page shows the target group created for the backend service, with the pod IPs registered as targets (use kubectl get pods -o wide to see the pod IPs). We can scale the deployment with kubectl scale deployment my-app --replicas 3 and see the target group updated automatically, with the new pod IP added as a target.

Delete the ingress (kubectl delete ingress my-ingress -n my-namespace or kubectl delete -f alb-ingress.yaml) to delete the ALB and target group. Delete the deployment and service with kubectl delete -f alb-app.yaml.

Ingress with Terraform

https://github.com/aws-samples/eks-workshop-v2/tree/stable/manifests/modules/exposing/ingress/.workshop/terraform

Volumes

https://docs.aws.amazon.com/eks/latest/userguide/storage.html

You can use S3, EBS, EFS, FSx and Amazon File Cache using Container Storage Interface (CSI) drivers. CSI drivers allow you to expose storage systems to your Kubernetes cluster as persistent volumes. Each CSI driver is an add-on that you need to install.

CSI drivers need AWS IAM permissions, which you can grant using IRSA or EKS Pod Identities.

Feature           EBS                        EFS
Type              Block                      File
Pod attachment    Single                     Multiple
Access mode       ReadWriteOnce              ReadWriteMany
Fargate support   No                         Yes
Storage           Provisioned upfront        Scales automatically
Performance       High IOPS and throughput   Scalable performance

EBS CSI driver with IRSA

https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html

https://github.com/kubernetes-sigs/aws-ebs-csi-driver

Note that you can’t mount Amazon EBS volumes to Fargate Pods, only to EC2 worker nodes.

Steps:

  1. Create the Identity Provider for your cluster.
  2. Create an IAM role for the EBS CSI driver with the permissions policy AmazonEBSCSIDriverPolicy.
  3. Install the EBS CSI Driver add-on at the cluster. This creates the service accounts used by the driver.
  4. Annotate the service account to link it to the IAM role.
  5. Create a PersistentVolumeClaim (PVC) that uses an EBS volume as storage for your pods.
  6. Create a pod that uses the PVC.

If you haven't already, create an Identity provider. See steps above at IRSA.

Then create the role that the CSI driver will use. Go to the IAM console → Roles and click "Create role". Set:

  • Trusted entity type: Custom trust policy.
  • Paste this trust policy, replacing <account-id> and <oidc-provider>:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/<oidc-provider>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<oidc-provider>:aud": "sts.amazonaws.com",
          "<oidc-provider>:sub": "system:serviceaccount:kube-system:ebs-csi-controller-sa"
        }
      }
    }
  ]
}

The <oidc-provider> is oidc.eks.<region>.amazonaws.com/id/<id>. You can get it at the console, at the cluster Overview tab → Details section → OpenID Connect provider URL (remove https://), or by running aws eks describe-cluster --name MyCluster --region us-east-1 --query cluster.identity.oidc.issuer --output text | sed -e "s/^https:\/\///".

At the next page, attach the permissions policy AmazonEBSCSIDriverPolicy to the role, which "allows the CSI driver service account to make calls to related services such as EC2 on your behalf". Name the role AmazonEKS_EBS_CSI_DriverRole as suggested in the docs.

Install the EBS CSI Driver add-on. At the EKS console, go to your cluster → "Add-ons" tab → click "Get more add-ons". Select "Amazon EBS CSI Driver". Select the latest version. For "Add-on access", choose "IAM roles for service accounts (IRSA)". Select the role AmazonEKS_EBS_CSI_DriverRole just created.

After installing the add-on, run kubectl get serviceaccounts -n kube-system | grep ebs to see the service account used by the driver, which is created automatically when installing the add-on. This returns ebs-csi-controller-sa and ebs-csi-node-sa.

We need to annotate the service account to link it to the IAM role. Run:

kubectl annotate serviceaccount ebs-csi-controller-sa -n kube-system eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/AmazonEKS_EBS_CSI_DriverRole

To see the annotation, run kubectl get serviceaccount ebs-csi-controller-sa -n kube-system -o yaml or kubectl describe serviceaccount ebs-csi-controller-sa -n kube-system.

This setup allows the service account used by the EBS CSI driver to assume the IAM role with the necessary permissions to manage EBS volumes on your behalf.
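As an alternative to the console steps above, eksctl can create the IAM role and its trust policy in one step (a sketch, assuming eksctl is installed and the OIDC provider exists; --role-only creates only the role, since the add-on creates the service account itself, which you then annotate as shown):

eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster <cluster-name> \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve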

Once the driver is installed and the service account is annotated, define a PersistentVolumeClaim (PVC) that uses an EBS volume as storage for your pods:

ebs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

EBS volumes can only be attached (mounted) to a single node at a time, so we set accessModes: ReadWriteOnce.

We set storageClassName: gp2, which is the default StorageClass that EKS creates in the cluster (note that its provisioner is the in-tree kubernetes.io/aws-ebs). You can check it with kubectl get storageclass or kubectl get storageclasses.storage.k8s.io:

NAME   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2    kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  7d17h

The reclaim policy Delete means that when the PVC is deleted, the underlying EBS volume is also deleted. The other option is Retain, which keeps the volume even after the PVC is deleted.

The WaitForFirstConsumer volume binding mode ensures that the volume is not created until a pod that uses the PVC is scheduled. This is important for EBS volumes because they need to be created in the same availability zone as the node where the pod will run. We also avoid provisioning unnecessary volumes, to save costs (in EBS you pay for provisioned storage, even if unused).

Create the PVC with kubectl apply -f ebs-pvc.yaml. Check it with kubectl get pvc or kubectl describe pvc ebs-pvc.

Name:          ebs-pvc
Namespace:     my-namespace
StorageClass:  gp2
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                Age                From                         Message
  ----    ------                ----               ----                         -------
  Normal  WaitForFirstConsumer  9s (x8 over 110s)  persistentvolume-controller  waiting for first consumer to be created before binding

Note that it says "waiting for first consumer to be created before binding". This is because no pod is using the PVC yet.

Create this pod manifest that uses the PVC:

ebs-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ebs-app
  namespace: my-namespace
spec:
  volumes:
    - name: ebs-volume
      persistentVolumeClaim:
        claimName: ebs-pvc
  containers:
    - name: app
      volumeMounts:
        - name: ebs-volume
          mountPath: /opt
      image: ubuntu:24.04
      command:
        - sh
        - -c
        - while date; do echo "Hi at `date`" >> /opt/demo.out; sleep 30; done

The claimName needs to match the PVC name.

Create the pod that uses the persistent volume claim with kubectl apply -f ebs-pod.yaml. If we do kubectl get pvc ebs-pvc we see that the PVC status changes from Pending to Bound:

NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
ebs-pvc   Bound    pvc-c223eab5-bad5-458a-af4b-0d16ecacd589   1Gi        RWO            gp2            <unset>                 107m

Now you can run kubectl get pv to see the created persistent volume that uses an EBS volume:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-c223eab5-bad5-458a-af4b-0d16ecacd589   1Gi        RWO            Delete           Bound    my-namespace/ebs-pvc   gp2            <unset>                          25m

At the EC2 console → Elastic Block Store → Volumes, you see a new EBS volume created. From there you can go to the EC2 instance, by clicking the instance id at "Attached resources". At the instance Storage tab, you can see the root volume (20 GiB) and the new EBS volume (1 GiB) attached.

You can also run this command to see the EBS volume details:

aws ec2 describe-volumes --filters Name=tag:kubernetes.io/created-for/pvc/name,Values=ebs-pvc

To clean up, delete the pod and the PVC. Just deleting the pod leaves the PVC and the EBS volume intact. Because the PV reclaim policy is Delete, the EBS volume is deleted when the PVC is deleted.

kubectl delete -f ebs-pod.yaml

kubectl get pvc
# NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
# ebs-pvc Bound pvc-c223eab5-bad5-458a-af4b-0d16ecacd589 1Gi RWO gp2 <unset> 175m
kubectl get pv
# NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
# pvc-c223eab5-bad5-458a-af4b-0d16ecacd589 1Gi RWO Delete Bound my-namespace/ebs-pvc gp2 <unset> 96m

kubectl delete -f ebs-pvc.yaml

kubectl get pvc
# No resources found in my-namespace namespace.
kubectl get pv
# No resources found

Verify that the EBS volume is deleted with (response should be an empty array):

aws ec2 describe-volumes --filters Name=tag:kubernetes.io/created-for/pvc/name,Values=ebs-pvc

Volume Claim Templates

In the previous section, EBS CSI driver with IRSA, we showed how to manually create a PersistentVolumeClaim (PVC) to use EBS storage for a single pod. In this section we use VolumeClaimTemplates to request dynamic storage for multiple pods in a StatefulSet.

With StatefulSets, each pod has its own persistent storage. You can use VolumeClaimTemplates to request dynamic storage for each pod in the StatefulSet. Each pod gets its own PersistentVolumeClaim (PVC) based on the template, allowing for unique storage per pod. Even if the pod is deleted and recreated (rescheduled), it gets the same PVC and retains its data.

Stateful set manifest example:

ebs-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        storageClassName: gp2
        accessModes: ['ReadWriteOnce']
        resources:
          requests:
            storage: 1Gi

Instead of creating a PVC manually and then referencing it at the pod manifest, we define a volumeClaimTemplates section in the StatefulSet spec. Each pod in the StatefulSet gets its own PVC based on this template. Each PVC will create a PV and provision an EBS volume.

Note that the volumeClaimTemplates spec is the same as a regular PVC spec.

Create the StatefulSet with kubectl apply -f ebs-statefulset.yaml. This creates 1 pod, 1 PVC and 1 PV. Use kubectl get pod,pvc,pv to see them. Use aws ec2 describe-volumes --filters Name=tag:kubernetes.io/created-for/pvc/name,Values="www-web-*" to see the created EBS volume.

Scale the StatefulSet to 2 replicas with kubectl scale statefulset web --replicas 2. This creates one more pod, which dynamically creates a PVC, which in turn provisions an EBS volume for that pod using the CSI driver. Each pod has its own volume. The volume is attached to the node where the pod is running.

Delete the StatefulSet with kubectl delete -f ebs-statefulset.yaml. We are using persistent volumes, so deleting the StatefulSet deletes the pods but does not delete the PVCs, PVs and EBS volumes (that's the point of persistent volumes). However, note that after deleting the pods the state of the EBS volumes is now "Available" at the EC2 console, because the volumes are no longer attached to a node. At the EC2 instance Storage tab, the EBS volumes are gone. You need to delete the PVCs manually with kubectl delete pvc <pvc-name> or kubectl delete pvc --all. Deleting the PVC deletes the PV and the underlying EBS volume, because the PV reclaim policy is Delete.

EFS CSI driver with Pod Identity

https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html

https://github.com/kubernetes-sigs/aws-efs-csi-driver

EFS volumes can be shared across multiple pods simultaneously, so we can set accessModes: ReadWriteMany in the PVC.

Instead of using IRSA, we use Pod Identity to grant the necessary IAM permissions to the EFS CSI driver.

Unlike EBS volumes, which can only be mounted to EC2 nodes (node groups), EFS can be mounted to Fargate pods.

Unlike EBS, with EFS the volume binding is Immediate, so the PVC is bound as soon as it is created, not when the pod is scheduled.

Steps:

  1. Create the IAM role with the AmazonEFSCSIDriverPolicy policy.
  2. Install the EFS CSI Driver add-on at the cluster.
    • This creates the service accounts used by the driver.
  3. Use Pod Identity to bind the role to the service account.
    • If using the console to install the add-on, this step is done automatically when installing the CSI driver, when selecting "EKS Pod Identity" as "Add-on access". Check the "Access" tab at the cluster, under "Pod Identity associations".
  4. Create the EFS file system.
    • This is an important difference with EBS, where the volume is created dynamically when the PVC is created. With EFS we need to create the file system beforehand.
  5. Define a StorageClass that references the EFS file system.
  6. Create a PersistentVolume (PV) that uses the StorageClass and ReadWriteMany access mode.
  7. Create a PersistentVolumeClaim (PVC) that uses the StorageClass and ReadWriteMany access mode.
  8. Create a pod that uses the PVC.

Create the role that the CSI driver will use. Go to the IAM console → Roles and click "Create role". Set:

  • Trusted entity type: AWS service.
  • Service or use case: EKS - Pod Identity. Allows pods running in Amazon EKS cluster to access AWS resources.

Attach the AWS managed policy AmazonEFSCSIDriverPolicy, which "provides management access to EFS resources and read access to EC2". Name the role AmazonEKS_EFS_CSI_DriverRole as suggested in the docs.

At the cluster, install the EFS CSI Driver add-on. At the EKS console, go to your cluster → "Add-ons" tab → click "Get more add-ons". Select "Amazon EFS CSI Driver". Select the latest version. For "Add-on access", choose "EKS Pod Identity". Select the role AmazonEKS_EFS_CSI_DriverRole just created.

After installing the driver, run kubectl get serviceaccounts -n kube-system | grep efs to see the service account used by the driver, which is created automatically when installing the add-on. This returns efs-csi-controller-sa and efs-csi-node-sa.
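If you didn't select "EKS Pod Identity" when installing the add-on, you can create (or verify) the association from the CLI; a sketch, assuming the role created above:

aws eks create-pod-identity-association \
  --cluster-name <cluster-name> \
  --namespace kube-system \
  --service-account efs-csi-controller-sa \
  --role-arn arn:aws:iam::<account-id>:role/AmazonEKS_EFS_CSI_DriverRole

aws eks list-pod-identity-associations --cluster-name <cluster-name>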

There is an important difference with EBS. In EBS the volumes are automatically created when the pods are scheduled with a PVC, but in EFS we need to create the file system beforehand, it won't be created automatically.

You need to create the EFS file system in the same VPC as your EKS cluster (you can also use VPC peering). The EC2 nodes and the subnets used in the Fargate profile need to have network access to the EFS mount targets. You need to create a mount target for each subnet that your nodes are in. The security group used by the EFS mount targets must allow inbound NFS traffic (TCP port 2049) from the CIDR block of the cluster's VPC.

You can create the EFS filesystem at the EFS console, or using the AWS CLI, following these steps: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/efs-create-filesystem.md
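If you go the CLI route, the core commands look roughly like this (a sketch; <fs-id>, <subnet-id> and <sg-id> are placeholders, and the security group must allow NFS, TCP 2049, from the VPC CIDR as described above):

# Create the file system.
aws efs create-file-system --encrypted --tags Key=Name,Value=eks-efs

# Create a mount target in each subnet used by your nodes or Fargate profile.
aws efs create-mount-target \
  --file-system-id <fs-id> \
  --subnet-id <subnet-id> \
  --security-groups <sg-id>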

Check the current storage classes with kubectl get storageclass or kubectl get storageclasses.storage.k8s.io. There should be none related to EFS. Output is:

NAME   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2    kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  5h1m

Create a StorageClass that references the EFS file system:

efs-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com

Create the StorageClass with kubectl apply -f efs-storageclass.yaml. Now there should be two storage classes:

NAME     PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
efs-sc   efs.csi.aws.com         Delete          Immediate              false                  6s
gp2      kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  5h3m

The volume binding mode is Immediate, so the volume binding and dynamic provisioning occurs once the PersistentVolumeClaim is created. In contrast, WaitForFirstConsumer delays the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created.

Create a PersistentVolume:

efs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc # StorageClass created earlier
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-02dc12e5f3d8805eb # Replace with your EFS file system ID

With ReadWriteMany, multiple nodes can access the volume simultaneously. With Retain, the PV and underlying EFS file system are retained when the PVC is deleted.

Create the PV with kubectl apply -f efs-pv.yaml. Check it with kubectl get pv or kubectl describe pv efs-pv. The status is Available:

NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
efs-pv   5Gi        RWX            Retain           Available           efs-sc         <unset>                          6s

Create a PersistentVolumeClaim (PVC) that uses the StorageClass:

efs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi

Create the PVC with kubectl apply -f efs-pvc.yaml. Check it with kubectl get pvc or kubectl describe pvc efs-pvc. Now the status of the PV changes to Bound, since the PVC is bound to it:

kubectl get pvc
# NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
# efs-pvc Bound efs-pv 5Gi RWX efs-sc <unset> 6s

kubectl get pv
# NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
# efs-pv 5Gi RWX Retain Bound demo/efs-pvc efs-sc <unset> 4m30s

Create a Deployment to test that the EFS volume can be mounted on multiple pods:

efs-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: efs-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: efs-app
  template:
    metadata:
      labels:
        app: efs-app
    spec:
      containers:
        - name: app
          image: busybox
          command: ['/bin/sh']
          args:
            [
              '-c',
              'while true; do echo Hello at `date` from `hostname` >> /data/out1.txt; sleep 5; done',
            ]
          volumeMounts:
            - name: efs-volume
              mountPath: /data
      volumes:
        - name: efs-volume
          persistentVolumeClaim:
            claimName: efs-pvc

Create the Deployment with kubectl apply -f efs-deployment.yaml. Check the pods with kubectl get pods. Once the pods are running, shell into one of them and check that the file is being written:

kubectl exec -it <pod-name> -- /bin/sh
cat /data/out1.txt
# Hello at Tue Nov 11 11:07:30 UTC 2025 from efs-app-7b9b97ccbd-99jc5
# Hello at Tue Nov 11 11:07:31 UTC 2025 from efs-app-7b9b97ccbd-7nb2x
# Hello at Tue Nov 11 11:07:31 UTC 2025 from efs-app-7b9b97ccbd-2chds
# ...
df -ah /data
# Filesystem Size Used Available Use% Mounted on
# 127.0.0.1:/ 8.0E 0 8.0E 0% /data

We can also launch pods on Fargate that use the same EFS PVC. The EFS file system can be shared between EC2 nodes and Fargate pods simultaneously.
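For example, a minimal sketch of such a pod (it must be in the same namespace as the efs-pvc PVC and be selected by a Fargate profile; the name and label are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: efs-app-fargate
  labels:
    type: fargate # Assuming the Fargate profile selects this label.
spec:
  containers:
    - name: app
      image: busybox
      command: ['/bin/sh', '-c', 'while true; do echo Hello from Fargate at `date` >> /data/out1.txt; sleep 5; done']
      volumeMounts:
        - name: efs-volume
          mountPath: /data
  volumes:
    - name: efs-volume
      persistentVolumeClaim:
        claimName: efs-pvc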

To clean up, delete the Deployment, the PVC, the PV and the EFS file system (the file system is not deleted automatically):

kubectl delete -f efs-deployment.yaml
kubectl delete -f efs-pvc.yaml # Or kubectl delete pvc efs-pvc

# Deleting the PVC changes the PV status from "Bound" to "Released", but the PV is not
# deleted because the reclaim policy is "Retain".
kubectl get pv
# NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
# efs-pv 5Gi RWX Retain Released demo/efs-pvc efs-sc <unset> 39m

kubectl delete -f efs-pv.yaml # Or kubectl delete pv efs-pv
# Deleting the PV does not delete the underlying EFS file system nor the StorageClass.
# You can delete the StorageClass with `kubectl delete -f efs-storageclass.yaml` or `kubectl delete storageclass efs-sc`.

Finally, delete the EFS file system at the EFS console. If using the CLI, you need to delete the mount targets first (otherwise you get the error FileSystemInUse).
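If you prefer the CLI, a sketch (the IDs are placeholders; wait for the mount targets to finish deleting before deleting the file system):

aws efs describe-mount-targets --file-system-id <fs-id> --query 'MountTargets[].MountTargetId' --output text
aws efs delete-mount-target --mount-target-id <mt-id> # Repeat for each mount target.
aws efs delete-file-system --file-system-id <fs-id>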

Autoscaling

https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

https://www.eksworkshop.com/docs/fundamentals/workloads/

https://kubernetes.io/docs/concepts/cluster-administration/node-autoscaling/

Cluster Autoscaler

https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

https://docs.aws.amazon.com/eks/latest/best-practices/cas.html

Cluster Autoscaler on AWS - https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md

When you create a node group, it creates an Auto Scaling group (and its corresponding launch template). The node group has a desired, minimum, and maximum size. The Auto Scaling group simply maintains the number of EC2 nodes in the group based on the desired size, replacing any unhealthy instances, but it doesn't do any scaling based on workload, because there are no scaling policies in the Auto Scaling group.

We want to scale up the number of nodes when new pods are scheduled and there are not enough resources in the cluster, and scale down the number of nodes when there are idle nodes. This is done with the Kubernetes Cluster Autoscaler, which adjusts the desired size of the Auto Scaling group.

Nodes are not terminated abruptly. The Cluster Autoscaler first cordons and drains the node, evicting the pods running on it by rescheduling them to other nodes. Once the node is empty, it is terminated.

The Cluster Autoscaler runs as a Deployment in the kube-system namespace with a single replica (one pod). It needs an IAM role with the necessary permissions to manage the Auto Scaling groups. You can use IRSA to assign the role to the Cluster Autoscaler service account.

If you are using the Kubernetes Cluster Autoscaler and running stateful pods, you should create one Node Group for each availability zone using a single subnet and enable the --balance-similar-node-groups feature in cluster autoscaler. (From the console Info sidebar.)

Terraform - https://github.com/aws-samples/eks-workshop-v2/tree/stable/manifests/modules/autoscaling/compute/cluster-autoscaler/.workshop/terraform

Setup Cluster Autoscaler with IRSA and Auto-Discovery

Resources:

Auto-Discovery Setup is the preferred method to configure Cluster Autoscaler.

The Auto Scaling group has 2 tags that allow the Cluster Autoscaler to discover and manage it:

  • Key: k8s.io/cluster-autoscaler/enabled - Value: true
  • Key: k8s.io/cluster-autoscaler/<cluster-name> - Value: owned

You pass these tag keys to the Cluster Autoscaler with the --node-group-auto-discovery CLI flag so that it can discover the Auto Scaling group. Note that the tag values are ignored; only the tag keys matter.
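For reference, in the example manifest the flag looks like this (with <cluster-name> replaced by your cluster name):

--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>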

Create the permissions policy. At the IAM console → Policies, click "Create policy". Switch to the JSON tab and paste the policy from https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#full-cluster-autoscaler-features-policy-recommended, replacing <asg-arn> and <my-cluster>:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": ["<asg-arn>"],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/k8s.io/cluster-autoscaler/enabled": "true",
          "aws:ResourceTag/k8s.io/cluster-autoscaler/<my-cluster>": "owned"
        }
      }
    }
  ]
}

Important: I've modified the second "Resource": ["*"] by setting the ASG ARN to restrict access to only a specific Auto Scaling group, as suggested in the docs (only the second block of actions should be updated to restrict the resources/add conditionals). I've also added the Condition block, as shown at https://docs.aws.amazon.com/eks/latest/best-practices/cas.html, which prevents a Cluster Autoscaler running in one cluster from modifying nodegroups in a different cluster even if the --node-group-auto-discovery argument wasn’t scoped down to the nodegroups of the cluster using tags.

Name the permissions policy AmazonEKS_ClusterAutoscalerPolicy-<cluster-name>.

Create a Role with "Trusted entity type" "Custom trust policy". Set this trust policy, replacing <account-id> and <oidc-provider> (oidc.eks.<region>.amazonaws.com/id/<id>):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/<oidc-provider>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<oidc-provider>:aud": "sts.amazonaws.com",
          "<oidc-provider>:sub": "system:serviceaccount:kube-system:cluster-autoscaler"
        }
      }
    }
  ]
}

The service account name (cluster-autoscaler) in the policy needs to match the value in the Kubernetes manifest. Attach the permissions policy just created (AmazonEKS_ClusterAutoscalerPolicy-<cluster-name>) to the role. Name the role AmazonEKS_ClusterAutoscalerRole-<cluster-name>.

To deploy the Cluster Autoscaler, download the file from https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml, then make the following changes:

  1. At the Deployment, in the --node-group-auto-discovery command line flag, replace <YOUR CLUSTER NAME> in the tag k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME> with your cluster name.
  2. At the ServiceAccount, add this annotation so that the service account uses the IAM role just created:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/AmazonEKS_ClusterAutoscalerRole-<cluster-name>

Deploy the Cluster Autoscaler with kubectl apply -f cluster-autoscaler-autodiscover.yaml.

  • View the pod with kubectl get pods -n kube-system | grep cluster-autoscaler.
  • Inspect the pod with kubectl describe pod -n kube-system cluster-autoscaler-<xyz>.
  • Check the logs with kubectl logs -n kube-system deployment/cluster-autoscaler or kubectl logs -n kube-system cluster-autoscaler-<xyz>.

To test the Cluster Autoscaler, create a deployment and then scale it to many pods that request enough resources to require more nodes than currently available. The Cluster Autoscaler will then scale up the Auto Scaling group by increasing the desired capacity, launching new EC2 nodes. Once the new nodes are ready, the pending pods will be scheduled.

cluster-autoscaler-test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler-test
  labels:
    app: cluster-autoscaler-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler-test
  template:
    metadata:
      labels:
        app: cluster-autoscaler-test
    spec:
      containers:
        - name: httpd
          image: httpd
          resources:
            limits:
              memory: 512Mi
              cpu: 500m

Do the following:

  • Check the current number of nodes and pods with kubectl get nodes and kubectl get pods.
  • Open two terminal windows and watch new nodes and pods being created with kubectl get nodes -w and kubectl get pods -w.
    • You can also wait for all nodes and pods to have status Ready with kubectl wait --for=condition=Ready nodes --all and kubectl wait --for=condition=Ready pods --all.
    • You can also use the watch command: watch kubectl get nodes and watch kubectl get pods.
  • Apply the deployment with kubectl apply -f cluster-autoscaler-test-deployment.yaml.
  • Scale up the deployment to 5 or 10 replicas with kubectl scale deployment cluster-autoscaler-test --replicas=5.
  • Check the Cluster Autoscaler logs to see the scaling activity with kubectl logs [--follow] -n kube-system deployment/cluster-autoscaler. You can see logs like "Scale up in group eks-My-EKS-Node-Group-d6cd375e-e065-401f-769b-58cdc5869a47 finished successfully in 50.165152427s".

Initially the new pods will be in Pending state because there are not enough resources. When the new nodes are ready (their status changes from NotReady to Ready), the pod state changes from Pending to ContainerCreating and then Running.

Doing kubectl describe pod cluster-autoscaler-test-<xyz> shows the TriggeredScaleUp event due to "Insufficient memory":

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m53s default-scheduler 0/4 nodes are available: 1 Too many pods, 4 Insufficient memory. no new claims to deallocate, preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
Warning FailedScheduling 2m19s default-scheduler 0/5 nodes are available: 1 Too many pods, 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 4 Insufficient memory. no new claims to deallocate, preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod.
Warning FailedScheduling 2m18s default-scheduler 0/5 nodes are available: 1 Too many pods, 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 4 Insufficient memory. no new claims to deallocate, preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod.
Warning FailedScheduling 2m14s default-scheduler 0/5 nodes are available: 1 Too many pods, 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 4 Insufficient memory. no new claims to deallocate, preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod.
Normal Scheduled 2m default-scheduler Successfully assigned demo/cluster-autoscaler-test-5cf8cdc587-jvzrj to ip-172-31-46-244.ec2.internal
Normal TriggeredScaleUp 2m48s cluster-autoscaler pod triggered scale-up: [{eks-My-EKS-Node-Group-d6cd375e-e065-401f-769b-58cdc5869a47 4->5 (max: 5)}]
Normal Pulling 119s kubelet Pulling image "httpd"

Test the scale down/in by scaling down the deployment to 1 replica with kubectl scale deployment cluster-autoscaler-test --replicas=1.

Pods are terminated almost immediately (within a few seconds), but nodes are not; removing them takes about 10 minutes. There is a cool down period before idle nodes are terminated, which can be configured with the --scale-down-unneeded-time flag.
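For example, to shorten that cool down you could add a flag like this to the Cluster Autoscaler container command in its Deployment (the value is illustrative; the default is 10 minutes):

--scale-down-unneeded-time=5m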

Check the Cluster Autoscaler logs to see the scale down activity. You should see logs like "node ip-172-31-46-244.ec2.internal may be removed", "Starting scale down" and "Considering node ip-172-31-46-244.ec2.internal for standard scale down".

The node status changes from Ready to Ready,SchedulingDisabled and then NotReady,SchedulingDisabled. At the state Ready,SchedulingDisabled, the node is cordoned (which prevents new pods from being scheduled on it) and drained. Once all pods are evicted, the node is terminated.

Delete the deployment with kubectl delete -f cluster-autoscaler-test-deployment.yaml. Optionally delete the Cluster Autoscaler with kubectl delete -f cluster-autoscaler-autodiscover.yaml.

Horizontal Pod Autoscaler

https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html

https://www.eksworkshop.com/docs/fundamentals/workloads/horizontal-pod-autoscaler/

Cluster Proportional Autoscaler

https://www.eksworkshop.com/docs/fundamentals/workloads/cluster-proportional-autoscaler/

Vertical Pod Autoscaler

https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html

Kubernetes Event-Driven Autoscaler (KEDA)

https://www.eksworkshop.com/docs/fundamentals/workloads/keda/

Amazon Managed Service for Prometheus

https://medium.com/@galazkaryan/help-i-deleted-an-amazon-managed-prometheus-workspace-am-still-being-charged-for-it-92f9effaecdc

https://repost.aws/questions/QUNH2lwf9xT9GbXEt4RGk7Cg/deciphering-aws-billing-understanding-charges-for-amazon-managed-service-for-prometheus

aws amp list-scrapers
aws amp delete-scraper --scraper-id <scraper-id>