Create AutoNode hosted clusters
HyperShift AutoNode (powered by Karpenter) is a feature that runs Karpenter on the management side as a control plane component while it watches openshiftNodeClasses, nodepools.karpenter.sh, and nodeclaims.karpenter.sh resources in the guest cluster.
To create a hosted cluster with AutoNode enabled, the HyperShift Operator must be installed with the feature gate --tech-preview-no-upgrade=true.
The HyperShift controller creates and manages the default openshiftNodeClasses in the hosted cluster, allowing you to deploy workloads suited to your environment with NodePools (nodepools.karpenter.sh).
The following steps describe how to install an OpenShift workload cluster on AWS with the AutoNode feature powered by Karpenter.
Note: openshiftNodeClasses is exposed for API consumers.
Prerequisites
- Install the latest HyperShift CLI.
- Make sure all prerequisites have been satisfied (Pull Secret, Hosted Zone, OIDC Bucket, etc.).
- Ensure that the AWS service-linked role for Spot is enabled in the account where the hosted cluster will be installed. This is a one-time setup per account.
- You can verify if the role already exists using the following command:
aws iam get-role --role-name AWSServiceRoleForEC2Spot
- If the role does not exist, create it with:
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
- Export environment variables, adjusting according to your setup:
# AWS config
export AWS_CREDS="$HOME/.aws/credentials"
export AWS_REGION=us-east-1

# OpenShift credentials and configuration
export CLUSTER_PREFIX=hcp-aws
export CLUSTER_BASE_DOMAIN=devcluster.openshift.com
export PULL_SECRET_FILE="${HOME}/.openshift/pull-secret-latest.json"
export SSH_PUB_KEY_FILE=$HOME/.ssh/id_rsa.pub

## S3 Bucket name hosting the OIDC discovery documents
# You must have set it up, see Getting Started for more information:
# https://hypershift-docs.netlify.app/getting-started/
export OIDC_BUCKET_NAME="${CLUSTER_PREFIX}-oidc"
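Optionally, you can sanity-check these values before continuing. The following commands are an illustrative check (not part of the required flow) that the AWS credentials file resolves to the intended account and that the OIDC bucket created during Getting Started is reachable:
# Confirm the credentials file is usable and points to the intended account
AWS_SHARED_CREDENTIALS_FILE="${AWS_CREDS}" aws sts get-caller-identity
# Confirm the OIDC bucket already exists and is accessible
AWS_SHARED_CREDENTIALS_FILE="${AWS_CREDS}" aws s3api head-bucket --bucket "${OIDC_BUCKET_NAME}"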
Install the HyperShift Operator
This section describes hands-on steps to install the HyperShift Operator with the AutoNode feature by enabling the tech-preview-no-upgrade feature gate. See the Getting Started guide (https://hypershift-docs.netlify.app/getting-started/) for more information.
Steps:
- Install the operator:
./hypershift install \
  --oidc-storage-provider-s3-bucket-name="${OIDC_BUCKET_NAME}" \
  --oidc-storage-provider-s3-credentials="${AWS_CREDS}" \
  --oidc-storage-provider-s3-region="${AWS_REGION}" \
  --tech-preview-no-upgrade=true
- Check that the controller is running as expected:
oc get all -n hypershift
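If you prefer a single blocking check rather than scanning the output, you can wait for the operator Deployment to report Available. This assumes the default Deployment name operator in the hypershift namespace:
oc wait --for=condition=Available --timeout=5m deployment/operator -n hypershift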
Create Workload Cluster with AutoNode
Create the workload cluster with HyperShift AutoNode.
Choose the desired target release image name (release controller).
Create a hosted cluster, enabling the flag --auto-node:
HOSTED_CLUSTER_NAME=${CLUSTER_PREFIX}-wl
OCP_RELEASE_IMAGE=<CHANGE_ME_TO_LATEST_RELEASE_IMAGE>
# Example of image: quay.io/openshift-release-dev/ocp-release:4.19.0-rc.5-x86_64
./hypershift create cluster aws \
--name="${HOSTED_CLUSTER_NAME}" \
--region="${AWS_REGION}" \
--zones="${AWS_REGION}a" \
--node-pool-replicas=1 \
--base-domain="${CLUSTER_BASE_DOMAIN}" \
--pull-secret="${PULL_SECRET_FILE}" \
--aws-creds="${AWS_CREDS}" \
--ssh-key="${SSH_PUB_KEY_FILE}" \
--release-image="${OCP_RELEASE_IMAGE}" \
--auto-node=true
Check the cluster information:
oc get --namespace clusters hostedclusters
oc get --namespace clusters nodepools
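Cluster creation takes some time. Instead of polling manually, you can block until the HostedCluster reports the Available condition, for example:
oc wait --namespace clusters --timeout=30m \
  --for=condition=Available "hostedcluster/${HOSTED_CLUSTER_NAME}"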
When completed, extract the credentials for the workload cluster:
./hypershift create kubeconfig --name ${HOSTED_CLUSTER_NAME} > kubeconfig-${HOSTED_CLUSTER_NAME}
# kubeconfig for workload cluster
export KUBECONFIG=$PWD/kubeconfig-${HOSTED_CLUSTER_NAME}
Check the managed openshiftNodeClasses object created in the workload cluster:
oc get openshiftNodeClasses
Now you are ready to use the AutoNode feature by setting the Karpenter configuration to fit your workloads.
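Before writing your own configuration, it can help to confirm which Karpenter-related APIs are available in the workload cluster. The following commands are an illustrative check; the NodePool and NodeClaim lists will be empty until you create a NodePool and scale a workload:
# List the Karpenter and node class CRDs installed in the guest cluster
oc get crds | grep -Ei 'karpenter|nodeclass'
# List the Karpenter NodePools and NodeClaims
oc get nodepools.karpenter.sh
oc get nodeclaims.karpenter.sh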
Deploy Sample Workloads
This section provides examples to get started exploring HyperShift AutoNode.
Using AutoNode with a Simple Web App
This example demonstrates how to use the AutoNode feature by creating a NodePool to fit the sample application. The sample application requests the instance type t3.large through the node.kubernetes.io/instance-type node selector, which the NodePool is able to satisfy.
Create a Karpenter NodePool with the configuration for the workload:
cat << EOF | oc apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-and-gpu
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 30s
    consolidationPolicy: WhenEmptyOrUnderutilized
  weight: 10
  template:
    spec:
      expireAfter: 336h
      terminationGracePeriod: 24h0m0s
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["g4dn", "m5", "m6i", "c5", "c6i", "t3"]
      - key: karpenter.k8s.aws/instance-size
        operator: In
        values: ["large", "xlarge", "2xlarge"]
EOF
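After applying the manifest, you can verify that the NodePool was accepted. Using the fully qualified resource name avoids ambiguity with other NodePool types:
oc get nodepools.karpenter.sh spot-and-gpu
oc describe nodepools.karpenter.sh spot-and-gpu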
This example demonstrates how to deploy a sample application to test and scale Karpenter's AutoNode feature. By creating workloads with specific resource requirements, you can observe how Karpenter provisions nodes dynamically to meet their demands.
Create the sample app Deployment:
cat << EOF | oc apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: "kubernetes.io/hostname"
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      containers:
      - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
        name: web-app
        resources:
          requests:
            cpu: "1"
            memory: 256M
        securityContext:
          allowPrivilegeEscalation: false
      nodeSelector:
        node.kubernetes.io/instance-type: "t3.large"
EOF
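The Deployment is created with replicas: 0, so no capacity is requested yet and Karpenter has nothing to provision. As a quick check, confirm the Deployment exists and is scaled to zero before proceeding:
oc get deployment web-app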
Scale the application:
oc scale --replicas=1 deployment.apps/web-app
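Because the pod anti-affinity requires each web-app replica to land on a different host, every additional replica forces Karpenter to provision another node that satisfies the t3.large selector. For example, to trigger a second node:
oc scale --replicas=2 deployment.apps/web-app
oc get nodeclaims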
Monitor and Debug Node Provisioning with AutoNode:
Monitor the nodeClaims objects to track the provisioning of nodes by AutoNode. These objects provide detailed insights into the lifecycle of nodes, including their current state and any associated events. Use the following command to continuously watch the nodeClaims:
Note: The provisioning process for an instance to become a node may take approximately 10 minutes. While waiting, monitor the progress using the provided commands to ensure the process completes successfully.
oc get nodeclaims --watch
To investigate a specific nodeClaim in detail, use the following command:
oc describe nodeclaim <nodeClaimName>
This will provide comprehensive information about the selected nodeClaim, helping you debug and confirm that nodes are being provisioned and functioning as expected.
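If a NodeClaim appears stuck, events recorded against it may explain why (for example, no Spot capacity for the requested instance types). An illustrative way to pull only those events:
oc get events --field-selector involvedObject.kind=NodeClaim --sort-by=.lastTimestamp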
Verify Node Join Status:
Ensure that the node has successfully joined the cluster. Use the following command to check:
oc get nodes -l karpenter.sh/nodepool=spot-and-gpu
This command filters the nodes associated with the spot-and-gpu NodePool, allowing you to confirm that the AutoNode feature is functioning as expected.
Verify Application Scheduling:
Ensure that the application has been successfully scheduled onto the newly provisioned node. Use the following command to check the status of the pods associated with the application:
oc get pods -l app=web-app -o wide
This command filters the pods by the label app=web-app, allowing you to confirm that the application is running on the expected node provisioned by AutoNode.
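When you are done experimenting, you can release the capacity again. A minimal cleanup sketch: scale the sample application back to zero, remove it, and delete the Karpenter NodePool so that Karpenter drains and terminates the nodes it provisioned:
oc scale --replicas=0 deployment.apps/web-app
oc delete deployment web-app
oc delete nodepools.karpenter.sh spot-and-gpu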