How to distribute workloads using Open Cluster Management
Tomer Figenblat
Posted on January 26, 2024
Open Cluster Management (OCM) was accepted to the Cloud Native Computing Foundation (CNCF) in late 2021 and is currently at the Sandbox project maturity level. OCM is a community-driven project focused on multicluster and multicloud scenarios for Kubernetes applications. This article shows how to bootstrap Open Cluster Management and handle work distribution using ManifestWork. We also discuss several ways to select clusters for various tasks using the ManagedClusterSet and Placement resources.
Basic Open Cluster Management
Open Cluster Management is based on a hub-spoke architecture. In this design, a single hub cluster issues prescriptions, and one or more spoke clusters act upon them. In Open Cluster Management, spoke clusters are called managed clusters. The component running on the hub cluster is the cluster manager. The component running on a managed cluster is the klusterlet agent.
OCM requires only one-way communication, from the managed clusters to the hub cluster. The communication to the hub is handled by two APIs:
- Registration: For joining managed clusters to the hub cluster and managing their lifecycle.
- Work: For relaying workloads in the form of prescriptions that are defined on the hub cluster and reconciled on the managed clusters.
OCM resources are easily managed using a command-line interface (CLI) named clusteradm.
A number of resources help you select the clusters used in your application. We will look at all these resources in this article:
- ManagedClusterSets collect ManagedClusters into groups.
- Placements let you select clusters from a ManagedClusterSet.
- To select clusters for a placement, you can use labels, ClusterClaims, taints and tolerations, and prioritizers.
As a simple, basic use case for placements, say you have two ManagedClusterSets, one for all the clusters in Israel and one for all the clusters in Canada. You can use placements to cherry-pick from these sets the managed clusters that are suited for testing or production.
Setting up the example environment
To get started, you'll need to install clusteradm and kubectl and start up three Kubernetes clusters. To simplify cluster administration, this article starts up three kind clusters with the following names and purposes:
- kind-fedora1 runs the hub cluster.
- kind-rhel1 runs one managed cluster.
- kind-qnap1 runs another managed cluster.
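If you do not already have three clusters, a minimal sketch for creating them with kind might look like the following. It assumes kind is installed and that all three clusters run on one machine. kind prefixes each kubeconfig context with kind-, which would yield the context names used above. Depending on your setup, the managed clusters may need extra network configuration to reach the hub API server, which this sketch does not cover.

```shell
# Cluster names without the "kind-" prefix; kind adds that prefix to the
# kubeconfig context it creates (for example, "fedora1" becomes "kind-fedora1").
CLUSTERS="fedora1 rhel1 qnap1"

# Create one kind cluster per name.
create_clusters() {
  for name in $CLUSTERS; do
    kind create cluster --name "$name"
  done
}

# Print the kubeconfig context names so you can confirm the three contexts exist.
list_contexts() {
  kubectl config get-contexts -o name
}
```

Running create_clusters followed by list_contexts should show the three kind- contexts among your kubeconfig entries.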
Bootstrapping OCM
The bootstrapping task consists of initializing the hub cluster and joining the managed clusters to it. OCM registration involves a double opt-in handshake initiated by the managed clusters and accepted by the hub cluster. At any point in time, the connection can be ended by either party.
Initializing the hub cluster
Run the following command to initialize the hub cluster. Output from the clusteradm command is filtered by the grep command and assigned to the joinCommand variable:
$ joinCommand=$(clusteradm init --context kind-fedora1 --wait | grep clusteradm)
Note: This command includes the deployment of the cluster manager, so it might take a couple of minutes.
You can verify that the cluster manager is running by checking for pods in the designated open-cluster-management-hub and open-cluster-management namespaces.
Joining and accepting the managed clusters
Join each managed cluster to the hub by injecting the cluster name into the stored joinCommand variable and running the command. Run this command once for every managed cluster. In our example, we have two clusters to manage. As mentioned earlier, registration is a double opt-in handshake, so every join request must be accepted by the hub cluster through a clusteradm command:
$ eval $(echo "$joinCommand --context kind-rhel1 --insecure-skip-tls-verify --wait" | sed 's/<cluster_name>/kind-rhel1/g' -)
$ eval $(echo "$joinCommand --context kind-qnap1 --insecure-skip-tls-verify --wait" | sed 's/<cluster_name>/kind-qnap1/g' -)
$ clusteradm --context kind-fedora1 accept --clusters kind-rhel1,kind-qnap1 --wait
Note: These commands deploy the klusterlet agents and initialize the registration, so they might take a couple of minutes.
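To see what the echo and sed pipeline above does before running it against real clusters, here is a small sketch that only prints the resulting commands. The join command string below is a made-up stand-in shaped like the line clusteradm init prints; the token and API server URL are hypothetical placeholders, and build_join is an illustrative helper name.

```shell
# Hypothetical join command. The real one comes from `clusteradm init`;
# the token and URL here are placeholders, not real values.
joinCommand='clusteradm join --hub-token <hub_token> --hub-apiserver https://hub.example:6443 --cluster-name <cluster_name>'

# Build the per-cluster command: append the context flag and replace the
# <cluster_name> placeholder. Nothing is executed; the command is only printed.
build_join() {
  printf '%s --context %s\n' "$joinCommand" "$1" | sed "s/<cluster_name>/$1/g"
}

build_join kind-rhel1
build_join kind-qnap1
```

Printing the command first is a cheap way to confirm that the placeholder was replaced before handing the string to eval.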
You can verify that the klusterlet agent is running by checking for pods in the designated open-cluster-management-agent and open-cluster-management namespaces for every managed cluster.
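The pod checks mentioned for the hub and for the managed clusters can be sketched as a few small helpers. The function names are illustrative; the contexts and namespaces are the ones used in this article.

```shell
# List the pods in each given namespace of one cluster.
# Usage: show_pods <kube-context> <namespace>...
show_pods() {
  context=$1
  shift
  for ns in "$@"; do
    echo "== $context / $ns"
    kubectl --context "$context" get pods -n "$ns"
  done
}

# Cluster manager pods on the hub.
verify_hub() {
  show_pods kind-fedora1 open-cluster-management-hub open-cluster-management
}

# Klusterlet agent pods on every managed cluster.
verify_agents() {
  for ctx in kind-rhel1 kind-qnap1; do
    show_pods "$ctx" open-cluster-management-agent open-cluster-management
  done
}
```

Call verify_hub after initializing the hub and verify_agents after the join requests are accepted; all listed pods should eventually reach the Running state.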
For every managed cluster initiating a join request, the registration API creates a cluster-scoped ManagedCluster resource on the hub cluster with the specification and status for the associated managed cluster.
Excerpts from the resource for one of the clusters in our example, kind-rhel1, follow. Note the status object, which holds valuable information about the cluster:
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: kind-rhel1
  ...
spec:
  hubAcceptsClient: true
  leaseDurationSeconds: 60
  ...
status:
  allocatable:
    cpu: "4"
    ephemeral-storage: 71645Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 8012724Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 71645Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 8012724Ki
    pods: "110"
  conditions:
  - lastTransitionTime: "2022-12-21T10:23:47Z"
    message: Accepted by hub cluster admin
    reason: HubClusterAdminAccepted
    status: "True"
    type: HubAcceptedManagedCluster
  - lastTransitionTime: "2022-12-21T10:23:47Z"
    message: Managed cluster joined
    reason: ManagedClusterJoined
    status: "True"
    type: ManagedClusterJoined
  - lastTransitionTime: "2022-12-21T10:23:47Z"
    message: Managed cluster is available
    reason: ManagedClusterAvailable
    status: "True"
    type: ManagedClusterConditionAvailable
  version:
    kubernetes: v1.25.3
Workload distribution across managed clusters
After a successful registration, the hub cluster creates a designated namespace for every joined managed cluster. These namespaces are called cluster namespaces and form the targets for workload distributions. In our example, we expect two namespaces named kind-qnap1 and kind-rhel1.
It's important to note that cluster namespaces are used only for workload distributions and certain other activities related to the managed cluster. Anything else, such as placement resources (discussed later in this article) and subscriptions (to be discussed in a future article), goes into your application's namespace.
To distribute workloads across managed clusters, apply a ManifestWork resource describing your workload in the cluster namespace. The klusterlet agents periodically check for ManifestWorks in their designated namespaces, reconcile their clusters accordingly, and report back with the reconciliation status.
Here's a simple example of a ManifestWork including a namespace and a simple deployment. When applied to a cluster namespace, the associated managed cluster will apply the workload in an orderly fashion:
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: <target managed cluster>
  name: hello-work-demo
spec:
  workload:
    manifests:
    - apiVersion: v1
      kind: Namespace
      metadata:
        name: hello-workload
    - apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: hello
        namespace: hello-workload
      spec:
        selector:
          matchLabels:
            app: hello
        template:
          metadata:
            labels:
              app: hello
          spec:
            containers:
            - name: hello
              image: quay.io/asmacdo/busybox
              command:
                ["sh", "-c", 'echo "Hello, Kubernetes!" && sleep 3600']
You can verify that the distribution took place by looking for the hello deployment in the hello-workload namespace in each cluster you intended to deploy to.
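One possible way to apply the ManifestWork above to several cluster namespaces and then check the result is sketched below. It assumes you saved the manifest to a file (the name hello-work-demo.yaml is hypothetical) with the <target managed cluster> placeholder left in place, and it relies on the kube context names matching the managed cluster names, as they do in this article. The function names are illustrative.

```shell
# Render the template for one cluster namespace by replacing the
# "<target managed cluster>" placeholder. Usage: render_work <cluster> <file>
render_work() {
  sed "s/<target managed cluster>/$1/" "$2"
}

# Apply the rendered ManifestWork on the hub, once per cluster namespace.
# Usage: distribute_work <file> <cluster>...
distribute_work() {
  template=$1
  shift
  for cluster in "$@"; do
    render_work "$cluster" "$template" | kubectl --context kind-fedora1 apply -f -
  done
}

# Look for the hello deployment on each managed cluster.
# Usage: verify_work <cluster>...
verify_work() {
  for cluster in "$@"; do
    kubectl --context "$cluster" get deployment hello -n hello-workload
  done
}
```

For example, distribute_work hello-work-demo.yaml kind-rhel1 kind-qnap1 followed by verify_work kind-rhel1 kind-qnap1 would cover both managed clusters in this setup.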
After deploying this ManifestWork, the klusterlet agent of the associated managed cluster creates the necessary resources and reports back. Excerpts from a status report for the previous ManifestWork follow:
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  ...
spec:
  ...
status:
  conditions:
  - lastTransitionTime: "2022-12-21T11:15:35Z"
    message: All resources are available
    observedGeneration: 1
    reason: ResourcesAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2022-12-21T11:15:35Z"
    message: Apply manifest work complete
    observedGeneration: 1
    reason: AppliedManifestWorkComplete
    status: "True"
    type: Applied
  resourceStatus:
    manifests:
    - conditions:
      - lastTransitionTime: "2022-12-21T11:15:35Z"
        message: Apply manifest complete
        reason: AppliedManifestComplete
        status: "True"
        type: Applied
      - lastTransitionTime: "2022-12-21T11:15:35Z"
        message: Resource is available
        reason: ResourceAvailable
        status: "True"
        type: Available
      - lastTransitionTime: "2022-12-21T11:15:35Z"
        message: ""
        reason: NoStatusFeedbackSynced
        status: "True"
        type: StatusFeedbackSynced
      resourceMeta:
        group: ""
        kind: Namespace
        name: hello-workload
        namespace: ""
        ordinal: 0
        resource: namespaces
        version: v1
      statusFeedback: {}
    - conditions:
      - lastTransitionTime: "2022-12-21T11:15:35Z"
        message: Apply manifest complete
        reason: AppliedManifestComplete
        status: "True"
        type: Applied
      - lastTransitionTime: "2022-12-21T11:15:35Z"
        message: Resource is available
        reason: ResourceAvailable
        status: "True"
        type: Available
      - lastTransitionTime: "2022-12-21T11:15:35Z"
        message: ""
        reason: NoStatusFeedbackSynced
        status: "True"
        type: StatusFeedbackSynced
      resourceMeta:
        group: apps
        kind: Deployment
        name: hello
        namespace: hello-workload
        ordinal: 1
        resource: deployments
        version: v1
      statusFeedback: {}
For every ManifestWork resource identified on the hub cluster, the klusterlet agent creates a cluster-scoped AppliedManifestWork resource on the managed cluster. This resource serves as the owner and the status reporter for the workload. Excerpts from the AppliedManifestWork for the previous ManifestWork follow:
apiVersion: work.open-cluster-management.io/v1
kind: AppliedManifestWork
metadata:
  ...
spec:
  ...
  manifestWorkName: hello-work-demo
status:
  appliedResources:
  - group: ""
    name: hello-workload
    namespace: ""
    resource: namespaces
    version: v1
    ...
  - group: apps
    name: hello
    namespace: hello-workload
    resource: deployments
    version: v1
    ...
Grouping managed clusters
As explained earlier, the hub cluster creates a cluster-scoped ManagedCluster to represent each joined managed cluster. You can group multiple ManagedClusters using the cluster-scoped ManagedClusterSet.
At the start, there are two pre-existing ManagedClusterSets: the default one, which includes every newly joined ManagedCluster, and the global one, which includes all ManagedClusters:
$ clusteradm --context kind-fedora1 get clustersets
<ManagedClusterSet>
└── <default>
│ ├── <BoundNamespace>
│ ├── <Status> 2 ManagedClusters selected
└── <global>
└── <Status> 2 ManagedClusters selected
└── <BoundNamespace>
Now add your own set:
$ clusteradm --context kind-fedora1 create clusterset managed-clusters-region-a
Next, configure both of your clusters as members. The following command overwrites the designated cluster set label on the ManagedCluster custom resources:
$ clusteradm --context kind-fedora1 clusterset set managed-clusters-region-a --clusters kind-rhel1,kind-qnap1
When you set your clusters as members of your cluster set, they are removed from the pre-existing default set.
As stated earlier, the ManagedClusterSet resource is cluster-scoped. When you write your application using the cluster set, you have to bind it into your application's namespace, which you can do using clusteradm.
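A namespaced binding can only be created in a namespace that already exists on the hub. If you are following along on a fresh hub, a sketch like this creates the application namespace only when it is missing. The helper name is illustrative, and it assumes that nothing else has created our-application-ns for you.

```shell
# Create the namespace on the hub unless it is already there.
# Usage: ensure_namespace <namespace>
ensure_namespace() {
  kubectl --context kind-fedora1 get namespace "$1" >/dev/null 2>&1 ||
    kubectl --context kind-fedora1 create namespace "$1"
}
```

Calling ensure_namespace our-application-ns before binding the cluster set keeps the step safe to repeat.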
The following command creates a namespace-scoped ManagedClusterSetBinding custom resource in your application namespace:
$ clusteradm --context kind-fedora1 clusterset bind managed-clusters-region-a --namespace our-application-ns
You can verify through the following command that your clusters have moved from the default set to your new set, and that your set is bound to your application namespace:
$ clusteradm --context kind-fedora1 get clustersets
<ManagedClusterSet>
└── <default>
│ ├── <BoundNamespace>
│ ├── <Status> No ManagedCluster selected
└── <global>
│ ├── <Status> 2 ManagedClusters selected
│ ├── <BoundNamespace>
└── <managed-clusters-region-a>
└── <BoundNamespace> our-application-ns
└── <Status> 2 ManagedClusters selected
Selecting clusters from the set
With your ManagedClusterSet bound to your application namespace, you can create a Placement to dynamically select clusters from the set. The following subsections show several ways to select clusters, fetch the selected cluster list, and prioritize managed clusters.
Using labels to select clusters
You can select clusters for a placement using labels. The following configuration tells your placement which labels to look for in ManagedClusters within the ManagedClusterSets configured:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: our-label-placement
  namespace: our-application-ns
spec:
  numberOfClusters: 1
  clusterSets:
  - managed-clusters-region-a
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchLabels:
          our-custom-label: "include-me-4-tests"
Because you haven't labeled any of your ManagedClusters yet, your placement will not find any clusters:
$ kubectl --context kind-fedora1 get placement -n our-application-ns
NAME SUCCEEDED REASON SELECTEDCLUSTERS
our-label-placement False NoManagedClusterMatched 0
So add the appropriate label to one of your ManagedClusters, for example kind-rhel1:
$ kubectl --context kind-fedora1 label managedcluster kind-rhel1 our-custom-label="include-me-4-tests"
Your placement should now pick this up:
$ kubectl --context kind-fedora1 get placement -n our-application-ns
NAME SUCCEEDED REASON SELECTEDCLUSTERS
our-label-placement True AllDecisionsScheduled 1
Using ClusterClaims to select clusters
ClusterClaims address two concerns with the use of labels for placement. The first is that, although labels are useful, their overuse makes them error-prone. The second concern is that, in our case, the labels are added to resources on the hub cluster, so applying them requires the hub cluster administrator or another user with access to the hub.
With ClusterClaims, the selection of clusters can be delegated to the managed clusters. ClusterClaims are custom resources applied on the managed cluster. Their content is propagated to the hub as a status for their associated ManagedCluster resources. The cluster administrators for a managed cluster can decide, for instance, which of their clusters are used for tests and which are used for production by simply applying this agreed-upon custom resource.
ClusterClaims can also be used in conjunction with labels to fine-tune your selection.
Apply the following YAML in one of your managed clusters. This example uses kind-qnap1:
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: ClusterClaim
metadata:
  name: our-custom-clusterclaim
spec:
  value: include-me-for-tests
Propagation can be verified on the associated ManagedCluster resource on the hub cluster:
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: kind-qnap1
  ...
spec:
  hubAcceptsClient: true
  ...
status:
  allocatable:
    ...
  capacity:
    ...
  clusterClaims:
  - name: our-custom-clusterclaim
    value: include-me-for-tests
  conditions:
    ...
  version:
    kubernetes: v1.25.3
Now you can create a placement based on your ClusterClaim:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: our-clusterclaim-placement
  namespace: our-application-ns
spec:
  numberOfClusters: 1
  clusterSets:
  - managed-clusters-region-a
  predicates:
  - requiredClusterSelector:
      claimSelector:
        matchExpressions:
        - key: our-custom-clusterclaim
          operator: In
          values:
          - include-me-for-tests
And verify the placement selected the claimed ManagedCluster:
$ kubectl --context kind-fedora1 get placement -n our-application-ns
NAME SUCCEEDED REASON SELECTEDCLUSTERS
our-clusterclaim-placement True AllDecisionsScheduled 1
our-label-placement True AllDecisionsScheduled 1
Using taints and tolerations for the selection
Taints and tolerations help you filter out unhealthy or otherwise not-ready clusters.
Taints are properties of ManagedCluster resources. The following command adds a taint to make your placement deselect your kind-qnap1 cluster. The command also removes existing taints from this ManagedCluster, so be careful when executing such commands:
$ kubectl --context kind-fedora1 patch managedcluster kind-qnap1 --type='json' -p='[{"op": "add", "path": "/spec/taints", "value": [{"effect": "NoSelect", "key": "our-custom-taint-key", "value": "our-custom-taint-value" }] }]'
To verify that the taint was added, execute:
$ kubectl --context kind-fedora1 get managedcluster kind-qnap1 -o jsonpath='{.spec.taints[*]}'
{"effect":"NoSelect","key":"our-custom-taint-key","timeAdded":"2022-12-22T15:38:47Z","value":"our-custom-taint-value"}
Now verify that the relevant placement has deselected your cluster based on the NoSelect effect:
$ kubectl --context kind-fedora1 get placement -n our-application-ns our-clusterclaim-placement
NAME SUCCEEDED REASON SELECTEDCLUSTERS
our-clusterclaim-placement False NoManagedClusterMatched 0
A toleration overrides taints. So your next experiment is to make your placement ignore the previous taint using a toleration. Again, be careful when executing commands like the following, because they remove any existing tolerations from the placement:
$ kubectl --context kind-fedora1 patch placement -n our-application-ns our-clusterclaim-placement --type='json' -p='[{"op": "add", "path": "/spec/tolerations", "value": [{"key": "our-custom-taint-key", "value": "our-custom-taint-value", "operator": "Equal" }] }]'
Verify that the toleration was added:
$ kubectl --context kind-fedora1 get placement -n our-application-ns our-clusterclaim-placement -o jsonpath='{.spec.tolerations[*]}'
{"key":"our-custom-taint-key","operator":"Equal","value":"our-custom-taint-value"}
And verify that the relevant placement has reselected your cluster:
$ kubectl --context kind-fedora1 get placement -n our-application-ns our-clusterclaim-placement
NAME SUCCEEDED REASON SELECTEDCLUSTERS
our-clusterclaim-placement True AllDecisionsScheduled 1
Two taints are automatically created by the system:
cluster.open-cluster-management.io/unavailable
cluster.open-cluster-management.io/unreachable
You can't manually modify these taints, but you can add tolerations to override them. You can even issue temporary tolerations that last a specified number of TolerationSeconds, as described in the placement documentation.
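As a rough sketch of a temporary toleration, the helpers below append one toleration for a given taint key instead of replacing the whole list. They assume the placement already has a tolerations list (as it does after the earlier patch), that the Exists operator and the tolerationSeconds field behave as described in the placement documentation, and that the function names are purely illustrative.

```shell
# Build a JSON Patch that appends one toleration. The path ends in "/-",
# which adds to the end of an existing array rather than overwriting it.
# Usage: toleration_patch <taint-key> <seconds>
toleration_patch() {
  printf '[{"op":"add","path":"/spec/tolerations/-","value":{"key":"%s","operator":"Exists","tolerationSeconds":%s}}]' "$1" "$2"
}

# Tolerate the system "unreachable" taint on a placement for a limited time.
# Usage: tolerate_unreachable <placement> <seconds>
tolerate_unreachable() {
  kubectl --context kind-fedora1 patch placement -n our-application-ns "$1" \
    --type=json -p "$(toleration_patch cluster.open-cluster-management.io/unreachable "$2")"
}
```

For example, tolerate_unreachable our-clusterclaim-placement 300 would ask the placement to keep an unreachable cluster selected for up to five minutes.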
Fetching the selected cluster list
As long as a placement has selected at least one cluster, the system creates PlacementDecision resources listing the selected clusters:
$ kubectl --context kind-fedora1 get placementdecisions -n our-application-ns
NAME AGE
our-clusterclaim-placement-decision-1 10m
our-label-placement-decision-1 22m
A PlacementDecision lives in the same namespace as its placement counterpart and takes the placement's name followed by a -decision-N suffix.
Your PlacementDecision for the placement selected by label should display your labeled ManagedCluster:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: PlacementDecision
metadata:
  name: our-label-placement-decision-1
  namespace: our-application-ns
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: our-label-placement
    ...
  ...
status:
  decisions:
  - clusterName: kind-rhel1
    reason: ""
Similarly, your PlacementDecision for the placement selected by ClusterClaim should display your claimed ManagedCluster:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: PlacementDecision
metadata:
  name: our-clusterclaim-placement-decision-1
  namespace: our-application-ns
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: our-clusterclaim-placement
    ...
  ...
status:
  decisions:
  - clusterName: kind-qnap1
    reason: ""
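If you only need the cluster names, for instance to feed them into a script, a sketch along these lines prints one selected cluster per line. It assumes that PlacementDecisions carry the cluster.open-cluster-management.io/placement label naming their placement, which is how the OCM documentation looks them up; the function name is illustrative.

```shell
# Print the clusters selected by one placement, one name per line.
# Usage: selected_clusters <placement>
selected_clusters() {
  kubectl --context kind-fedora1 get placementdecisions -n our-application-ns \
    -l "cluster.open-cluster-management.io/placement=$1" \
    -o jsonpath='{range .items[*].status.decisions[*]}{.clusterName}{"\n"}{end}'
}
```

With the resources from this article in place, selected_clusters our-label-placement should list kind-rhel1, matching the decision shown above.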
Prioritizing clusters for selection
Prioritizers tell placements to prefer some clusters based on built-in ScoreCoordinates. You can also extend prioritization by using an AddOnPlacementScore.
The following settings configure your placement to use prioritization and sort your clusters based on their allocatable memory and CPU (as viewed in the cluster status). Each coordinate is assigned a different weight for the selection:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: our-clusterclaim-placement
  namespace: our-application-ns
spec:
  numberOfClusters: 2
  clusterSets:
  ...
  prioritizerPolicy:
    mode: Exact
    configurations:
    - scoreCoordinate:
        builtIn: ResourceAllocatableMemory
      weight: 2
    - scoreCoordinate:
        builtIn: ResourceAllocatableCPU
      weight: 3
  predicates:
  - requiredClusterSelector:
      ...
To extend the built-in score coordinates, use a designated AddOnPlacementScore in your application namespace:
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScore
metadata:
  name: our-addon-placement-score
  namespace: our-application-ns
status:
  conditions:
  - lastTransitionTime: "2021-10-28T08:31:39Z"
    message: AddOnPlacementScore updated successfully
    reason: AddOnPlacementScoreUpdated
    status: "True"
    type: AddOnPlacementScoreUpdated
  validUntil: "2021-10-29T18:31:39Z"
  scores:
  - name: "our-custom-score-a"
    value: 66
  - name: "our-custom-score-b"
    value: 55
Now, you can modify your placement and add score coordinates referencing your custom addon scores:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: our-clusterclaim-placement
  namespace: our-application-ns
spec:
  numberOfClusters: 2
  clusterSets:
  ...
  prioritizerPolicy:
    mode: Exact
    configurations:
    - scoreCoordinate:
        builtIn: ResourceAllocatableMemory
      weight: 2
    - scoreCoordinate:
        builtIn: ResourceAllocatableCPU
      weight: 3
    - scoreCoordinate:
        type: AddOn
        addOn:
          resourceName: our-addon-placement-score
          scoreName: our-custom-score-a
      weight: 1
    - scoreCoordinate:
        type: AddOn
        addOn:
          resourceName: our-addon-placement-score
          scoreName: our-custom-score-b
      weight: 4
  predicates:
  - requiredClusterSelector:
      ...
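To get a feel for how the weights interact, here is a small arithmetic sketch. It assumes that a cluster's final score is the sum of each prioritizer's score multiplied by its weight, which is how the placement documentation describes it. The two built-in scores (40 and 20) are made-up values for one cluster; the add-on scores (66 and 55) and the weights come from the examples above.

```shell
# Sum weight * score over pairs of arguments.
# Usage: weighted_score <weight> <score> [<weight> <score>]...
weighted_score() {
  total=0
  while [ "$#" -ge 2 ]; do
    total=$(( total + $1 * $2 ))
    shift 2
  done
  echo "$total"
}

# memory: 2*40, CPU: 3*20, our-custom-score-a: 1*66, our-custom-score-b: 4*55
weighted_score 2 40 3 20 1 66 4 55   # prints 426
```

Because our-custom-score-b carries the largest weight, it contributes more than half of the total here (220 of 426), so changing that single add-on score would shift the ranking the most.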
Summarizing OCM basics
This article introduced Open Cluster Management and explained how to:
- Bootstrap OCM on hub and spoke (managed) clusters.
- Deploy a workload across managed clusters.
- Group managed clusters into sets.
- Select clusters from cluster sets.
- Customize placement scheduling.
Upcoming articles will cover various features, addons, frameworks, and integrations related to OCM. These components make use of the infrastructure concerning work distributions and placements described in this article. In the meantime, you can learn about how to prevent computer overload with remote kind clusters. I want to thank you for taking the time to read this article, and I hope you got something out of it. Feel free to comment below if you have questions. We welcome your feedback.