
Scaling up and down


Starting with EraSearch v1.24.0, the rebalance and divest commands are included in the EraSearch Cache Service container images. These commands rebalance data across a deployment when Cache Service nodes are added or removed. Rebalancing keeps disk utilization at a healthy level across all Cache Service nodes, reducing the chance of "hot spots" and ensuring optimal performance.

This page outlines how to rebalance data when either scaling up (adding new Cache Service nodes) or scaling down (removing Cache Service nodes).

Scaling up

The scale-up procedure works by inspecting the roots registered across the system and re-allocating ownership across all available Cache Service nodes, ensuring that every node stores a roughly equal amount of data. Once the roots have been re-allocated:

  • The original owner of the root will remove their local copy.
  • The new owner of the root, at query time, will rehydrate the root from object storage and store it locally.

To scale up an EraSearch deployment, start by increasing the value of the quarry.replicaCount variable within your Helm values.yaml file:

quarry:
  replicaCount: 8 # previously set to 6

Then upgrade the EraSearch deployment using the helm upgrade command:

$ helm upgrade ${RELEASE_NAME} ./charts/eradb-X.Y.Z.tgz -n ${NAMESPACE} --values values-eradb.yaml
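
Before rebalancing, it can help to confirm the rollout has finished. A minimal sketch, assuming the Cache Service StatefulSet is named ${RELEASE_NAME}-quarry and using era-deployment as a hypothetical release and namespace name:

```shell
# Hypothetical release and namespace names -- substitute your own.
RELEASE_NAME=era-deployment
NAMESPACE=era-deployment

# The Cache Service StatefulSet is assumed to be named <release>-quarry.
STS="statefulset/${RELEASE_NAME}-quarry"
echo "$STS"

# Wait for all Cache Service pods to be marked healthy (requires cluster access):
# kubectl rollout status "$STS" -n "$NAMESPACE" --timeout=10m
```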

Once all of the new Cache Service nodes are marked as healthy, open a shell in any of the Cache Service nodes (the StatefulSet pods) and run the following command to rebalance the deployment:

/usr/local/bin/rebalance ${QUARRY_HEADLESS_SERVICE_URLS}:9200

Where:

  • QUARRY_HEADLESS_SERVICE_URLS is the headless service URL created for the Cache Service (set automatically when using Kubernetes-based service discovery). The value can also be set manually if needed, and is typically of the format: ${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local
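
If the variable is not populated automatically, it can be derived by hand. A sketch, using era-deployment as a hypothetical release and namespace name:

```shell
# Hypothetical release and namespace names -- substitute your own.
RELEASE_NAME=era-deployment
NAMESPACE=era-deployment

# Derive the headless service URL in the format described above:
QUARRY_HEADLESS_SERVICE_URLS="${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local"
echo "${QUARRY_HEADLESS_SERVICE_URLS}:9200"

# Then, from inside a Cache Service pod:
# /usr/local/bin/rebalance "${QUARRY_HEADLESS_SERVICE_URLS}:9200"
```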

While running, information related to the rebalance procedure will be printed to the terminal output:

INFO [rebalance] - Namespace kubernetes-cert-manager balanced
INFO [rebalance] - Namespace kubernetes-frontend-prod on worker http://era-deployment-quarry-0.era-deployment-quarry-headless.era-deployment.svc.cluster.local:9200 contains 357 level 2 roots
INFO [rebalance] - Namespace kubernetes-frontend-prod balanced

Once the command completes, the deployment has been successfully rebalanced.

Rebalancing on a cron

To automatically rebalance a deployment on a regular interval, create a Kubernetes CronJob object similar to the one shown below.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: erasearch-rebalance
  namespace: ${NAMESPACE}
spec:
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - rebalance ${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200
              image: us.gcr.io/eradb-registry/quarry:1.24.0
              imagePullPolicy: IfNotPresent
              name: quarry
          imagePullSecrets:
            - name: eradb-registry
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
  schedule: "*/5 * * * *"

Where:

  • ${RELEASE_NAME} is the name of the Helm release used for the deployment.
  • ${NAMESPACE} is the Kubernetes namespace where the EraSearch deployment is located.

This will issue a rebalance call every five minutes (or as otherwise set by the schedule option).
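
One way to fill in the ${RELEASE_NAME} and ${NAMESPACE} placeholders before applying the manifest is sketched below; the file name rebalance-cronjob.yaml and the era-deployment values are assumptions:

```shell
# Hypothetical values -- substitute your own.
export RELEASE_NAME=era-deployment
export NAMESPACE=era-deployment

# The rebalance target the CronJob will call, with placeholders filled in:
TARGET="${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200"
echo "$TARGET"

# Render the manifest (saved as rebalance-cronjob.yaml) and apply it,
# assuming envsubst from the gettext package is available:
# envsubst < rebalance-cronjob.yaml | kubectl apply -f -
```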

Scaling down

The scale-down procedure works by divesting individual Cache Service nodes, redistributing their assigned roots across the remaining Cache Service nodes. As with the scale-up procedure described above, this ensures that the remaining nodes store a roughly equal amount of data.

To divest its data, a Cache Service node:

  • Sets itself into read-only mode, preventing any new data from being written to it. The node can still service queries; however, writes are redirected to the other available nodes.
  • Equally distributes ownership of its roots across the remaining nodes.
  • Disables its readiness endpoint, eventually draining the node of all network traffic and removing it from any automated service discovery mechanisms.

To divest a Cache Service node of its data, open a shell in any of the Cache Service nodes (the StatefulSet pods) and run the following command:

/usr/local/bin/divest ${URL_OF_NODE_BEING_REMOVED}

Where:

  • ${URL_OF_NODE_BEING_REMOVED} is the full worker URL of the Cache Service node that will be removed from the deployment. The worker URL is of the format: http://${RELEASE_NAME}-quarry-N.${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200
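
Constructing the worker URL by hand could look like this, with era-deployment as a hypothetical release and namespace name, and node 7 as the node being removed:

```shell
# Hypothetical values -- substitute your own.
RELEASE_NAME=era-deployment
NAMESPACE=era-deployment
NODE_INDEX=7   # the StatefulSet ordinal of the node being removed

# Build the worker URL in the format described above:
URL_OF_NODE_BEING_REMOVED="http://${RELEASE_NAME}-quarry-${NODE_INDEX}.${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200"
echo "$URL_OF_NODE_BEING_REMOVED"

# From inside any Cache Service pod:
# /usr/local/bin/divest "$URL_OF_NODE_BEING_REMOVED"
```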

Tip

You can retrieve a list of the available worker URLs by running the following command from any available Cache Service node:

curl -sL localhost:9200/_eradb/admin/workers/v1 | jq .

Danger

The scale down procedure requires that the divest command be run on nodes before they are decommissioned. When decommissioning multiple nodes at once, run the divest command serially, one node at a time.

If you have any questions or concerns, please reach out to [email protected].

Once the command completes, the node can be safely removed from the deployment.
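
When removing several nodes, for example before scaling replicaCount from 8 down to 6, the serial divest described in the warning above could be sketched as a loop over the highest-numbered StatefulSet ordinals (era-deployment is a hypothetical release and namespace name):

```shell
# Hypothetical values -- substitute your own.
RELEASE_NAME=era-deployment
NAMESPACE=era-deployment

# Divest the two highest-numbered nodes, one at a time:
for NODE_INDEX in 7 6; do
  URL="http://${RELEASE_NAME}-quarry-${NODE_INDEX}.${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200"
  echo "divesting $URL"
  # From inside any Cache Service pod; wait for each divest to finish
  # before starting the next:
  # /usr/local/bin/divest "$URL"
done
```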

To scale down an EraSearch deployment, decrease the value of the quarry.replicaCount variable within your Helm values.yaml file:

quarry:
  replicaCount: 6 # previously set to 8

Then upgrade the EraSearch deployment using the helm upgrade command:

$ helm upgrade ${RELEASE_NAME} ./charts/eradb-X.Y.Z.tgz -n ${NAMESPACE} --values values-eradb.yaml

Once completed, the deployment has been successfully scaled down.


Last update: November 28, 2022