Installing Google Kubernetes Engine (GKE) On-Prem on VMware vSAN

In the previous posts in this series, we covered the overview and concepts of Anthos and GKE On-Prem (Parts I and II). Now it's time for the rubber to meet the road. In this blog post, we walk through the actual installation steps for getting GKE On-Prem running inside your datacenter on vSphere 6.5. For my lab environment, I am using a Lenovo ThinkAgile VX 4-node cluster running VMware vSAN. Let's talk about prerequisites.

  1. We start with a basic vSphere cluster with a vCenter instance managing all hosts, HA and DRS enabled and a Distributed Virtual Switch. You can also use a Standard Switch if you want.
  2. You will need two additional networks defined in your cluster besides the VM Network; let's call them the Inside and Outside Networks. The Inside Network will be used to assign IP addresses to the K8s nodes deployed in the cluster, and the Outside Network will be used to assign IP addresses to any services exposed using type LoadBalancer in K8s, as well as to any VIPs defined for Ingress.
  3. I am using a vSAN Datastore in my lab because ThinkAgile VX is an HCI appliance which enables the extension of all Storage Policy Based Management capabilities to my containerized applications. But, you can use any shared vSphere datastore mounted to all the hosts in the cluster.
  4. You will also need to deploy an F5 BIG-IP LTM virtual appliance in your cluster with three interfaces: one in the VM Management Network, one in the Outside Network, and one in the Inside Network. Once you have done the basic setup of the F5 virtual appliance, create two User Partitions, namely admin-cluster and user-cluster. On initial setup, GKE On-Prem deploys one admin and one user cluster. When you want additional user clusters in your setup, you need to create additional user partitions on the F5 instance; there is a 1-to-1 mapping between an F5 user partition and a Kubernetes cluster.
  5. Next, you need a VM that you can install Terraform, gcloud sdk and VMware govc libraries. I used an Ubuntu 18.04 VM running on the same vSphere cluster. I attached a single NIC to the VM in the VM Network.

Now that you have all the pre-reqs ready, we can start the actual installation of GKE On-Prem.

1. SSH into the Ubuntu VM that you deployed and then install the gcloud sdk, vmware govc and Terraform using the following commands:

  • Install Google Cloud SDK, which includes the gcloud cli.

#Add the Cloud SDK distribution URI as a package source

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

# Import the Google Cloud Platform public key

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

# Update the package list and install the Cloud SDK

sudo apt-get update && sudo apt-get install google-cloud-sdk
  • Once you have the binaries installed, the next step is to run gcloud init and then log in using your Google Cloud account.
gcloud init

You must log in to continue. Would you like to log in (Y/n)?  Y (This generates a long authentication URL; open it in a browser and log in with your Google Cloud account. I have trimmed the URL from this example.)

Enter verification code:

Pick Google Cloud Project: [Select the project that you have created for GKE On-Prem]

Do you want to configure a default Compute Region and Zone? (Y/n)?  Y
1 for us-east1-b [You can choose the location closest to you]
  • Once you have the gcloud cli working, the next step is to install the VMware govc utility so you can interact with your vCenter instance. Download the Linux binary from the vmware/govmomi GitHub releases page (the release version below is an example), then install it:
curl -LO https://github.com/vmware/govmomi/releases/download/v0.21.0/govc_linux_amd64.gz
gzip -d govc_linux_amd64.gz
sudo chmod +x govc_linux_amd64
sudo mv govc_linux_amd64 /usr/local/bin/govc
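Before govc can talk to vCenter, it needs connection details, which it reads from GOVC_* environment variables. A minimal sketch, with placeholder values you must replace for your environment:

```shell
# govc reads its vCenter connection settings from the environment
export GOVC_URL='https://[vCenter-IP]/sdk'   # vCenter SDK endpoint
export GOVC_USERNAME='[vCenter-username]'
export GOVC_PASSWORD='[vCenter-password]'
export GOVC_INSECURE=true                    # tolerate a self-signed vCenter certificate

# Quick connectivity check: prints vCenter version information on success
govc about
```

If govc about returns your vCenter build details, the tool is installed and authenticated correctly.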
  • The next step is to install Terraform v0.11. GKE On-Prem does not work with the latest Terraform release, so we will pull down the v0.11 binaries (the exact point release below is an example):
sudo apt-get install unzip
curl -LO https://releases.hashicorp.com/terraform/0.11.14/terraform_0.11.14_linux_amd64.zip
unzip terraform_0.11.14_linux_amd64.zip
sudo mv terraform /usr/local/bin
terraform --version

2. Now that you have all the binaries installed on your Ubuntu VM, the next step is to create the service accounts that will be used during the GKE On-Prem installation process. We will create four service accounts:

  • access-sa: Gives access to the GKE On-Prem software
  • register-sa: Used by the Cloud Connect agent to register the user cluster to Google Cloud.
  • connect-sa: Used by the Cloud Connect agent for establishing a connection between GKE On-Prem and Google Cloud
  • stackdriver-sa: Used to collect logs and send them to Stackdriver.

Use the following commands to create these service accounts. Replace the values in [square brackets] with your project-specific values

gcloud iam service-accounts create [ACCESS_SERVICE_ACCOUNT_NAME] --project [PROJECT_ID]
gcloud iam service-accounts create [REGISTER_SERVICE_ACCOUNT_NAME] --project [PROJECT_ID]
gcloud iam service-accounts create [CONNECT_SERVICE_ACCOUNT_NAME] --project [PROJECT_ID]
gcloud iam service-accounts create [STACKDRIVER_LOGGING_SERVICE_ACCOUNT_NAME] --project [PROJECT_ID]

3. Next, we will enable the required APIs in your Google Cloud Project. These will enable the GKE On-Prem, Connect, and Stackdriver APIs for us.

gcloud services enable --project=[PROJECT_ID] \
    cloudresourcemanager.googleapis.com container.googleapis.com gkeconnect.googleapis.com \
    gkehub.googleapis.com serviceusage.googleapis.com stackdriver.googleapis.com \
    monitoring.googleapis.com logging.googleapis.com

4. Next, we need to create IAM Policy bindings for the service accounts that we created earlier to specific IAM roles.

gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[REGISTER_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/gkehub.admin"
gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[REGISTER_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/serviceusage.serviceUsageViewer"
gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[CONNECT_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/gkehub.connect"
gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[STACKDRIVER_LOGGING_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/stackdriver.resourceMetadata.writer"
gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[STACKDRIVER_LOGGING_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/logging.logWriter"
gcloud projects add-iam-policy-binding [PROJECT_ID] --member="serviceAccount:[STACKDRIVER_LOGGING_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role="roles/monitoring.metricWriter"

5. Create a Stackdriver workspace for your project.
6. Create a key for the Access Service Account created earlier.

gcloud iam service-accounts keys create [access-key-name] --iam-account [ACCESS_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com --project [PROJECT_ID]

7. Configure gcloud and gsutil to use the access service account created earlier.

gcloud auth activate-service-account [ACCESS_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com --key-file [access-key-name]

8. Download the ova file into your home directory

gsutil cp gs://gke-on-prem-release/admin-appliance/1.0.1-gke.5/gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5.{ova,ova.sig} /home/user

9. Verify the signature of the downloaded admin appliance template.

openssl dgst -sha256 -verify /dev/stdin -signature gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5.ova.sig gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5.ova <<'EOF'
-----BEGIN PUBLIC KEY-----
[ADMIN_APPLIANCE_PUBLIC_KEY]
-----END PUBLIC KEY-----
EOF

10. Import OVA into vSphere and mark it as a template. Create a shell script and copy the following commands into it.

export GOVC_URL='https://[vCenter-IP]/sdk'
export GOVC_USERNAME='[vCenter-username]'
export GOVC_PASSWORD='[vCenter-password]'
export GOVC_DATASTORE='[datastore-name]'
export GOVC_RESOURCE_POOL='[resource-pool-path]'
export GOVC_INSECURE=true
export HTTPS_PROXY=[HTTPS_PROXY] # optional; necessary if you use a proxy

govc import.ova [ADMIN_OVA_DIR]/gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5.ova
govc vm.markastemplate gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5

11. Save the file, change its access permissions with the following command, and then execute the script.

chmod +x [govc-file-name]
./[govc-file-name]

12. This will upload the ova file into your vSphere environment. If your VM network is not named the default 'VM Network', you may hit issues in later steps. So, go to vCenter, convert the template back into a VM, edit its settings to attach the NIC to your actual VM network, and then convert it back into a VM template.
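If you prefer to do this template fix-up from the command line, it can be sketched with govc; the template name matches the OVA above, while the network name is a placeholder you should adjust for your environment:

```shell
# Convert the template back into a VM so its settings can be edited
govc vm.markasvm gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5
# Re-attach the first NIC (ethernet-0) to your actual VM network
govc vm.network.change -vm gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5 -net '[your-VM-network]' ethernet-0
# Convert the VM back into a template
govc vm.markastemplate gke-on-prem-admin-appliance-vsphere-1.0.1-gke.5
```

This is equivalent to the vCenter UI steps above and assumes the GOVC_* connection variables are exported in your shell.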

13. Next, go back to your Ubuntu VM and then generate an ssh key pair.

ssh-keygen -t rsa -f ~/.ssh/vsphere_workstation -N ""

14. Create a new directory for the terraform files.

mkdir [Terraform-directory-name]

15. Create the Terraform configuration (.tf) and terraform.tfvars files in this new directory. Depending on whether you are using static IP addresses or have a DHCP server running in your environment, copy the appropriate sample files from the GKE On-Prem documentation.

16. Update the terraform.tfvars file with your environment-specific values.

17. Next, do a terraform init and terraform apply. This will deploy a new Admin Workstation VM in your vSphere cluster.

terraform init && terraform apply -auto-approve -input=false

18. Retrieve the IP address of the new admin workstation and then ssh into it using the key pair that we generated in step 13.

terraform output ip_address
ssh -i ~/.ssh/vsphere_workstation ubuntu@$(terraform output ip_address)

19. The GKE On-Prem admin workstation has gkectl, kubectl, docker and gcloud cli pre-installed. Once you are logged into the admin workstation, go ahead and log into your Google Cloud Account.

gcloud auth login

20. Now, we will create keys for the access, register, connect and stackdriver service accounts that we had created earlier in the process and download them onto the admin workstation.

gcloud iam service-accounts keys create [ACCESS_KEY_FILE] --iam-account [ACCESS_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com --project [PROJECT_ID]
gcloud iam service-accounts keys create [REGISTER_KEY_FILE] --iam-account [REGISTER_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com --project [PROJECT_ID]
gcloud iam service-accounts keys create [CONNECT_KEY_FILE] --iam-account [CONNECT_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com --project [PROJECT_ID]
gcloud iam service-accounts keys create [STACKDRIVER_KEY_FILE] --iam-account [STACKDRIVER_LOGGING_SERVICE_ACCOUNT_NAME]@[PROJECT_ID].iam.gserviceaccount.com --project [PROJECT_ID]

21. Now, switch from your Google account authorization to the access service account authorization.

gcloud auth activate-service-account --key-file=[ACCESS_KEY_FILE]

22. Next, register gcloud as a Docker credential helper

gcloud auth configure-docker

23. Run a few verification commands to make sure everything looks good before moving onto creating the configuration files for the actual GKE On-Prem cluster deployment.

gkectl version
docker version
docker images
docker ps

24. Great job so far. We are just a few steps away from having a GKE On-Prem cluster deployed in our datacenter and registered to Google Cloud.

25. Create a cluster config using the following command:

gkectl create-config [--config [PATH]]

26. Modify the configuration file parameters to match your environment-specific values. Below are some pointers on how to find or generate the required values.

  • bundlepath = "/var/lib/gke/bundles/gke-onprem-vsphere-1.0.1-gke.5-full.tgz": This remains the same for everyone, because the admin workstation has this .tgz file preloaded.
  • Fill in your vCenter IP, username and password, Datacenter and Datastore, Cluster and ResourcePool and then use the Inside Network as the Network. Inside Network was the network that we created as part of the pre-reqs. The K8s VMs that will be deployed will get an IP address in this subnet from a DHCP server.
  • datadisk: Here is a sample value: “gkeop/gke-on-prem-data-disk-anthos.vmdk”. Make sure that you create the “gkeop” directory in your datastore before you run the create cluster command.
  • cacertpath: Path to the vCenter CA cert. To download the vCenter CA cert, use the following command:
echo quit | openssl s_client -showcerts -servername [vCenter-IP] -connect [vCenter-IP]:443 > vcenter-cert.pem
  • Now, for the admin cluster settings. Enter the IP address, username and password, and the admin-cluster partition name created in F5.
  • Use a couple of IPs from the Outside Network, and assign them to the Control Plane VIP and Ingress VIP for the Admin Cluster.
  • Specify at least a /16 network for the service IP range and pod IP range. These ranges can be the same for the admin and user clusters and don't have to be routable. From the /16 pod network, each node in the Kubernetes cluster is assigned a /24 for the pods running on it. This caps the number of pods per node at roughly 250, and the number of nodes in the cluster at 256 (the number of /24 blocks in a /16).
  • Now, for the user cluster settings. Enter the IP address, username and password, and the user-cluster partition created in F5.
  • Enter a couple of IPs from the Outside Network for the Control Plane VIP and Ingress VIP for the User Cluster.
  • Configure the CPU and Memory settings for the admin and worker VMs that are going to make up your User Cluster.
  • Enter the same or distinct network ranges for the pods and service networks for the User cluster.
  • Now, we will fill out the Google Cloud specific parameters. Enter the:
    • Google Cloud Project ID
    • Register and Connect service account key paths
    • Stackdriver details
    • Access service account key path
  • Double-check all the values that you have entered and make sure there aren't any typos. Typos can take a long time to track down during deployment (speaking from personal experience).
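The pod-range sizing described above is easy to sanity-check with shell arithmetic; this sketch assumes only the /16 cluster-wide pod range and the /24 per-node block mentioned in the bullet:

```shell
# A /16 pod range split into /24 per-node blocks
POD_RANGE_PREFIX=16   # cluster-wide pod CIDR prefix length
PER_NODE_PREFIX=24    # prefix length of each node's pod block

MAX_NODES=$(( 1 << (PER_NODE_PREFIX - POD_RANGE_PREFIX) ))  # number of /24 blocks in a /16
ADDRS_PER_NODE=$(( 1 << (32 - PER_NODE_PREFIX) ))           # addresses in each /24 block

echo "max nodes: $MAX_NODES"
echo "addresses per node block: $ADDRS_PER_NODE"
```

So a /16 gives you at most 256 nodes, each holding a /24 whose 256 addresses bound the number of pods that node can run.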

27. To validate that the configuration file looks good, use the following command:

gkectl check-config [--config [PATH]]

28. Now, run a prepare command to upload the K8s node VM template.

gkectl prepare --config [CONFIG_FILE] --validate-attestations

29. Next, we will go ahead and create the cluster using the following command.

gkectl create cluster --config config.yaml -v5 --alsologtostderr

The -v5 --alsologtostderr flags dump all logs to your console during the installation process, so you can watch each step as it happens.

This will take anywhere from 20 to 40 minutes. It deploys the VMs for your Admin and User Kubernetes clusters and bootstraps those clusters, configures the Control Plane VIPs on your F5 load balancer, registers your User Cluster with Google Cloud, enables monitoring with Stackdriver, and so on. Once the clusters are deployed, you will find two kubeconfig files in your current directory: kubeconfig (for the Admin Cluster) and [user-cluster-name]-kubeconfig (for the User Cluster). You can use the following command (replace the kubeconfig-file-name parameter with your file name) to start using your Kubernetes clusters with kubectl.

kubectl --kubeconfig [kubeconfig-file-name] get pods --all-namespaces

30. You can use the following yaml to enable ingress in your cluster (the server/port section below is the typical Istio Gateway shape; adjust it to your needs):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-autogenerated-k8s-ingress
  namespace: gke-system
spec:
  selector:
    istio: ingress-gke-system
  servers:
  - port:
      number: 80
      protocol: HTTP2
      name: http
    hosts:
    - '*'

kubectl --kubeconfig [kubeconfig-file-name] apply -f [gateway-yaml-file]

31. Now that your clusters are successfully deployed, you can go ahead and log into your Google Cloud Console, and navigate to the Kubernetes Cluster section. You will see your User cluster already registered. The only step that you have to do at this point is to create service accounts and clusterrolebindings on your User Cluster, so you can use the token generated to log into the On-Prem cluster from Google Cloud Console. To do that, complete the following steps:

vi node-reader.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
kubectl --kubeconfig=[kubeconfig] apply -f node-reader.yaml
kubectl --kubeconfig=[kubeconfig] create serviceaccount ksa-account
kubectl --kubeconfig=[kubeconfig] create serviceaccount user-id-ksa
kubectl --kubeconfig=[kubeconfig] create clusterrolebinding ksa-view --clusterrole view --serviceaccount default:ksa-account
kubectl --kubeconfig=[kubeconfig] create clusterrolebinding ksa-node-reader --clusterrole node-reader --serviceaccount default:ksa-account
kubectl --kubeconfig=[kubeconfig] create clusterrolebinding binding-account --clusterrole cluster-admin --serviceaccount default:ksa-account
SECRET_NAME=$(kubectl --kubeconfig=[kubeconfig] get serviceaccount ksa-account -o jsonpath='{$.secrets[0].name}')
kubectl --kubeconfig=[kubeconfig] get secret ${SECRET_NAME} -o jsonpath='{$.data.token}' | base64 -d

32. Copy this secret token and navigate back to the Google Cloud Console. You can use this token to authenticate against your User Cluster.

That's all folks. You are all set. At this point, you have a Kubernetes cluster deployed in your on-prem vSphere environment that can be managed from the GCP Cloud Console. You can deploy ISV applications from the Google Cloud Marketplace onto the on-prem K8s cluster, and you can manage your workloads the same way you manage workloads on your cloud-based GKE clusters.

Please comment below if you have any questions about the steps listed above, and I would be happy to help out. Now that we are done with GKE On-Prem, we have completed a major piece of Anthos. In the next blogs, we will cover Anthos Config Management and maybe Cloud Run. So stay tuned!

