Scaling up and down
Acquisition notice
In October 2022, ServiceNow acquired Era Software. The documentation on this site is no longer maintained and is intended for existing Era Software users only.
To get the latest information about ServiceNow's observability solutions, visit their website and documentation.
Starting with EraSearch v1.24.0, the rebalance and divest commands were added to the EraSearch Cache Service container images. These commands allow data to be rebalanced across a deployment when Cache Service nodes are added or removed. By rebalancing, the system can maintain a healthy level of disk utilization across all Cache Service nodes, reducing the chance of "hot spots" and ensuring optimal performance.
This page outlines how to rebalance data when either scaling up (adding new Cache Service nodes) or scaling down (removing Cache Service nodes).
Scaling up
The scale-up procedure works by inspecting the roots registered across the system and re-allocating their ownership across all available Cache Service nodes, so that every node stores a roughly equal amount of data. Once the roots have been re-allocated:
- The original owner of the root removes its local copy.
- The new owner of the root rehydrates it from object storage at query time and stores it locally.
To scale up an EraSearch deployment, start by increasing the value of the quarry.replicaCount variable within your Helm values.yaml file:
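For example, assuming the deployment currently runs three Cache Service nodes and you want to scale to five (both counts are hypothetical):

```yaml
quarry:
  # Increase from the current node count to the desired number of
  # Cache Service nodes (example value only).
  replicaCount: 5
```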
Then upgrade the EraSearch deployment using the helm upgrade command:
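A minimal sketch of the upgrade, assuming the chart reference era/erasearch used at install time; the release name, namespace, and values file path are placeholders for your own:

```shell
# Substitute your own release name, chart reference, namespace, and values file.
helm upgrade ${RELEASE_NAME} era/erasearch \
  --namespace ${NAMESPACE} \
  --values values.yaml
```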
Once all of the new Cache Service nodes are marked as healthy, open a shell in any of the Cache Service nodes (the StatefulSet pods) and run the following command to rebalance the deployment:
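A minimal sketch, assuming the command takes the Cache Service headless service URL and port as its argument (as in the CronJob example later on this page):

```shell
# The headless service URL below mirrors the CronJob example; substitute your
# own release name and namespace.
rebalance ${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200
```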
Where:

- QUARRY_HEADLESS_SERVICE_URLS is the URL of the headless service created for the Cache Service (set automatically when using Kubernetes-based service discovery). This value can be set manually if needed, and is typically of the format: ${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local
While running, information related to the rebalance procedure will be printed to the terminal output:
INFO [rebalance] - Namespace kubernetes-cert-manager balanced
INFO [rebalance] - Namespace kubernetes-frontend-prod on worker http://era-deployment-quarry-0.era-deployment-quarry-headless.era-deployment.svc.cluster.local:9200 contains 357 level 2 roots
INFO [rebalance] - Namespace kubernetes-frontend-prod balanced
Once the command completes, the deployment has been successfully rebalanced.
Rebalancing on a cron
To automatically rebalance a deployment on a regular interval, create a Kubernetes CronJob object similar to the one shown below.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: erasearch-rebalance
  namespace: ${NAMESPACE}
spec:
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - rebalance ${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200
              image: us.gcr.io/eradb-registry/quarry:1.24.0
              imagePullPolicy: IfNotPresent
              name: quarry
          imagePullSecrets:
            - name: eradb-registry
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 30
  schedule: "*/5 * * * *"
Where:

- ${RELEASE_NAME} is the name of the Helm release used for the deployment.
- ${NAMESPACE} is the Kubernetes namespace where the EraSearch deployment is located.
This will issue a rebalance call every five minutes (or as otherwise set by the schedule option).
Scaling down
The scale-down procedure works by divesting individual Cache Service nodes of their assigned roots, redistributing those roots across the remaining Cache Service nodes. As with the scale-up procedure described above, this ensures that the remaining nodes each store a roughly equal amount of data.
In order for a Cache Service node to divest its data, it:
- Sets itself into read-only mode, preventing any new data from being written to it. The node can still serve queries; however, writes are redirected to the other available nodes.
- Equally distributes ownership of its roots across the remaining nodes.
- Disables its readiness endpoint, eventually draining the node of all network traffic and removing it from any automated service discovery mechanisms.
To divest a Cache Service node of its data, open a shell in any of the Cache Service nodes (the StatefulSet pods) and run the following command:
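A minimal sketch, assuming divest takes the worker URL of the node being removed as its argument, analogous to the rebalance command:

```shell
# ${URL_OF_NODE_BEING_REMOVED} is described below; passing it as a positional
# argument is an assumption based on how rebalance is invoked.
divest ${URL_OF_NODE_BEING_REMOVED}
```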
Where:

- ${URL_OF_NODE_BEING_REMOVED} is the full worker URL of the Cache Service node that will be removed from the deployment. The worker URL is of the format: http://${RELEASE_NAME}-quarry-N.${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local:9200
Tip
You can retrieve a list of the available worker URLs by running the following command from any available Cache Service node:
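One way to do this, assuming the container image includes nslookup and the deployment relies on the Kubernetes headless service for discovery, is to query the service's SRV records; each returned pod hostname plus port 9200 is a worker URL:

```shell
# Assumption: each SRV target of the headless service corresponds to one Cache
# Service pod; the worker URL is http://<target>:9200.
nslookup -type=SRV ${RELEASE_NAME}-quarry-headless.${NAMESPACE}.svc.cluster.local
```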
Danger
The scale-down procedure requires that the divest command be run on nodes before they are decommissioned. When decommissioning multiple nodes, run the divest command serially (one node at a time) for each node to be removed.
If you have any questions or concerns, please reach out to [email protected].
Once the command completes, the node can be safely removed from the deployment.
To scale down an EraSearch deployment, decrease the value of the quarry.replicaCount variable within your Helm values.yaml file:
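For example, assuming the hypothetical counts from the scale-up example, going from five Cache Service nodes back to three:

```yaml
quarry:
  # Decrease to the desired number of Cache Service nodes (example value only).
  replicaCount: 3
```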
Then upgrade the EraSearch deployment using the helm upgrade command:
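As with scaling up, a sketch of the upgrade command (release name, chart reference, and namespace are placeholders):

```shell
helm upgrade ${RELEASE_NAME} era/erasearch \
  --namespace ${NAMESPACE} \
  --values values.yaml
```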
Once completed, the deployment has been successfully scaled down.