EffectiveHorizontalPodAutoscaler (EHPA) helps you manage application scaling in an easy way.
It is compatible with HorizontalPodAutoscaler and extends it with additional features.
EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling.
With this capability, users can forecast an incoming traffic peak and scale up their application ahead of time. They can also learn when the peak will end and scale the application down gracefully.
Besides that, EffectiveHorizontalPodAutoscaler defines several scale strategies to support different scaling scenarios.
A sample EffectiveHorizontalPodAutoscaler YAML looks like this:
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef: #(1)
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1 #(2)
  maxReplicas: 10 #(3)
  scaleStrategy: Auto #(4)
  metrics: #(5)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  prediction: #(6)
    predictionWindowSeconds: 3600 #(7)
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
Most online applications follow regular patterns, so we can predict the trend for the coming hours or days. DSP is a time series prediction algorithm that is suitable for predicting application metrics.
The following shows a sample EffectiveHorizontalPodAutoscaler yaml with prediction enabled.
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
spec:
  prediction:
    predictionWindowSeconds: 3600
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
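To get a feel for these settings, the sketch below converts the duration strings from the sample into sample counts. The helper parse_duration_seconds is hypothetical, and it assumes durations are written as a number followed by a single s, m, h, or d suffix. How the DSP algorithm consumes these samples internally is not covered here.

```python
import re

# Seconds per unit for the simple "<number><unit>" strings used in the sample.
# Assumption: only the suffixes s, m, h and d appear.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration_seconds(text: str) -> int:
    """Parse a duration such as "60s" or "3d" into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


sample_interval = parse_duration_seconds("60s")  # sampleInterval
history_length = parse_duration_seconds("3d")    # historyLength
prediction_window = 3600                         # predictionWindowSeconds

# Number of historical samples covered by historyLength.
history_samples = history_length // sample_interval
# Number of future samples covered by the prediction window.
predicted_samples = prediction_window // sample_interval

print(history_samples, predicted_samples)
```

With the values from the sample, three days of history at a 60-second interval correspond to 4320 samples, and the one-hour prediction window covers 60 samples.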
When a user defines spec.metrics in EffectiveHorizontalPodAutoscaler and prediction is enabled, EffectiveHPAController converts it to a new metric and configures the underlying HorizontalPodAutoscaler.
This is a source EffectiveHorizontalPodAutoscaler yaml for metric definition.
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
It is converted to the following metrics in the underlying HorizontalPodAutoscaler.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
spec:
  metrics:
    - pods:
        metric:
          name: crane_pod_cpu_usage
          selector:
            matchLabels:
              autoscaling.crane.io/effective-hpa-uid: f9b92249-eab9-4671-afe0-17925e5987b8
        target:
          type: AverageValue
          averageValue: 100m
      type: Pods
    - resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
      type: Resource
In this sample, the resource metric defined by the user is converted into two metrics: a prediction metric and an origin metric.
The prediction metric is exposed as a Pods metric, which does not support targetAverageUtilization, so it is converted to targetAverageValue based on the target pod's CPU request.
HorizontalPodAutoscaler calculates a replica count for each metric and proposes new replicas based on it. The largest proposal is picked as the new scale.
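The arithmetic behind the conversion and the "largest proposal wins" rule can be sketched as follows. This is a simplified illustration, not the controller's code: it uses the standard HPA ratio rule, ceil(currentReplicas * currentValue / targetValue), and ignores details such as the HPA tolerance and pod readiness. The per-pod CPU request of 200m, the current replica count, and the observed and predicted values are all assumptions chosen so that a 50% utilization target maps to the 100m shown in the sample.

```python
import math


def utilization_to_average_value_milli(target_utilization_pct: int,
                                       cpu_request_milli: int) -> int:
    """Convert a utilization target into an absolute per-pod value in millicores."""
    return cpu_request_milli * target_utilization_pct // 100


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float) -> int:
    """Standard HPA ratio rule: ceil(currentReplicas * currentValue / targetValue)."""
    return math.ceil(current_replicas * current_value / target_value)


cpu_request_milli = 200        # assumed per-pod CPU request
current_replicas = 4           # assumed current replica count

# 50% of a 200m request gives the 100m averageValue target.
target_avg_milli = utilization_to_average_value_milli(50, cpu_request_milli)

predicted_avg_milli = 180      # assumed predicted per-pod CPU usage
observed_utilization_pct = 60  # assumed observed average CPU utilization

# One proposal per metric: prediction metric and origin metric.
from_prediction = desired_replicas(current_replicas, predicted_avg_milli, target_avg_milli)
from_origin = desired_replicas(current_replicas, observed_utilization_pct, 50)

# The largest proposal is picked as the new scale.
new_scale = max(from_prediction, from_origin)
print(target_avg_milli, from_prediction, from_origin, new_scale)
```

With these assumed numbers the prediction metric proposes more replicas than the origin metric, so the workload is scaled ahead of the observed load.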
The prediction and scaling process has six steps:
Below is the process flow.
Prometheus Adapter is a popular custom metric adapter, through which user-defined metrics configurations are accessible.
Please see this doc to learn more.
Let's look at one use case of EffectiveHorizontalPodAutoscaler in a production cluster.
We profiled the load history of one production application and replayed it in a staging environment. For the same application, we used both EffectiveHorizontalPodAutoscaler and HorizontalPodAutoscaler to manage scaling and compared the results.
The red line in the chart below shows that the actual total CPU usage is high at around 8am, 12pm, and 8pm and low at midnight. The green line shows the predicted CPU usage trend.
Below is the comparison between EffectiveHorizontalPodAutoscaler and HorizontalPodAutoscaler. The red line is the replica count generated by HorizontalPodAutoscaler, and the green line is the result from EffectiveHorizontalPodAutoscaler.
We can see significant improvements with EffectiveHorizontalPodAutoscaler:
EffectiveHorizontalPodAutoscaler provides two strategies for scaling: Auto and Preview. Users can change the strategy at runtime, and the change takes effect on the fly.
The Auto strategy scales automatically based on metrics. It is the default strategy. With this strategy, EffectiveHorizontalPodAutoscaler creates and controls a HorizontalPodAutoscaler instance in the backend. We don't recommend configuring the underlying HorizontalPodAutoscaler explicitly, because the configuration will be overridden by EffectiveHPAController. If the user deletes the EffectiveHorizontalPodAutoscaler, the HorizontalPodAutoscaler is cleaned up too.
The Preview strategy means that EffectiveHorizontalPodAutoscaler won't change the target's replicas automatically, so users can preview the calculated replicas and control the target's replicas themselves. Users can switch from the default strategy to this one by setting spec.scaleStrategy to Preview. The change takes effect immediately. During the switch, EffectiveHPAController disables the HorizontalPodAutoscaler if it exists and scales the target to the value of spec.specificReplicas. If spec.specificReplicas is not set when the strategy changes to Preview, it simply stops scaling.
A sample Preview configuration looks like this:
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
spec:
  scaleStrategy: Preview   # ScaleStrategy indicates the strategy for scaling the target; the value can be "Auto" or "Preview".
  specificReplicas: 5      # SpecificReplicas specifies the target replicas.
status:
  expectReplicas: 4        # expectReplicas is the calculated replicas based on prediction metrics or spec.specificReplicas.
  currentReplicas: 4       # currentReplicas is the actual replicas of the target.
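The difference between the two strategies can be summarized in a few lines. The function below is a hypothetical sketch of the behavior described above, not the EffectiveHPAController implementation; the function and parameter names are assumptions made for illustration.

```python
from typing import Optional


def replicas_to_apply(scale_strategy: str,
                      specific_replicas: Optional[int],
                      current_replicas: int,
                      calculated_replicas: int) -> int:
    """Return the replica count the target ends up with (illustrative only)."""
    if scale_strategy == "Auto":
        # Auto: the underlying HPA scales the target to the calculated replicas.
        return calculated_replicas
    if scale_strategy == "Preview":
        # Preview: scale to spec.specificReplicas when it is set ...
        if specific_replicas is not None:
            return specific_replicas
        # ... otherwise stop scaling and keep the current replicas.
        return current_replicas
    raise ValueError(f"unknown scale strategy: {scale_strategy!r}")


print(replicas_to_apply("Preview", 5, 4, 7))
print(replicas_to_apply("Preview", None, 4, 7))
print(replicas_to_apply("Auto", None, 4, 7))
```

In Preview mode the calculated replicas remain visible in the status, but they never drive the target directly.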
EffectiveHorizontalPodAutoscaler supports cron-based autoscaling.
Besides scaling on monitoring metrics, workload traffic sometimes differs between holidays and weekdays, and a simple prediction algorithm may not work well. You can make up for this by configuring a weekend cron with a larger number of replicas.
Some non-web applications do not need to work on weekends, and you may want to reduce their replicas to 1. You can also configure a cron for this to reduce the cost of your service.
The following are the main cron fields in the EHPA spec:
Current cron autoscaling capabilities from some manufacturers and communities have some shortcomings.
The following figure shows the comparison between the current EHPA cron autoscaling implementation and other cron capabilities.
To address the above issues, the cron autoscaling implemented by EHPA is designed to be compatible with HPA: cron acts as an HPA metric and works on the workload object together with other metrics. Configuring cron is also very simple. When cron is configured on its own, the workload is not scaled outside the active time range.
Cron can work by itself, assuming you have no other metrics configured.
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: php-apache-local
spec:
  # ScaleTargetRef is the reference to the workload that should be scaled.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1        # MinReplicas is the lower limit of replicas that the autoscaler can scale the target down to.
  maxReplicas: 100      # MaxReplicas is the upper limit of replicas that the autoscaler can scale the target up to.
  scaleStrategy: Auto   # ScaleStrategy indicates the strategy for scaling the target; the value can be "Auto" or "Preview".
  # It is better to set crons that fill one complete time period, such as one day or one week.
  # Below is a one-day cron schedule.
  #(targetReplicas)
  #80                       ----------     ----------        ----------
  #                         |        |     |        |        |        |
  #10     -------------------        -------        ----------        -------------
  #(time) 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23
  # The Local timezone means the timezone of the server (or container) in which craned is running. For example, if craned was started in the UTC timezone, it is UTC; if it was started in Asia/Shanghai, it is Asia/Shanghai.
  crons:
    - name: "cron1"
      timezone: "Local"
      description: "scale down"
      start: "0 0 ? * *"
      end: "0 6 ? * *"
      targetReplicas: 10
    - name: "cron2"
      timezone: "Local"
      description: "scale up"
      start: "0 6 ? * *"
      end: "0 9 ? * *"
      targetReplicas: 80
    - name: "cron3"
      timezone: "Local"
      description: "scale down"
      start: "00 9 ? * *"
      end: "00 11 ? * *"
      targetReplicas: 10
    - name: "cron4"
      timezone: "Local"
      description: "scale up"
      start: "00 11 ? * *"
      end: "00 14 ? * *"
      targetReplicas: 80
    - name: "cron5"
      timezone: "Local"
      description: "scale down"
      start: "00 14 ? * *"
      end: "00 17 ? * *"
      targetReplicas: 10
    - name: "cron6"
      timezone: "Local"
      description: "scale up"
      start: "00 17 ? * *"
      end: "00 20 ? * *"
      targetReplicas: 80
    - name: "cron7"
      timezone: "Local"
      description: "scale down"
      start: "00 20 ? * *"
      end: "00 00 ? * *"
      targetReplicas: 10
CronSpec has the following fields.
The timezone defaults to UTC. You can set it to Local, which means the timezone of the container in which the Crane service runs. A named timezone such as America/Los_Angeles is also OK.
The configuration above means that each day the workload keeps the following replicas by hour.
#80                       ----------     ----------        ----------
#                         |        |     |        |        |        |
#10     -------------------        -------        ----------        -------------
#(time) 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23
Remember not to set a start time that is later than the end time. For example, consider the following setting:
crons:
  - name: "cron2"
    timezone: "Local"
    description: "scale up"
    start: "0 9 ? * *"
    end: "0 6 ? * *"
    targetReplicas: 80
The above is not valid because the start will always be later than the end. The HPA controller will always get the workload's current desired replicas as the scale target, which means it keeps the original replicas.
The cron-driven scaling process has six steps:
TargetReplicas specified in the CronSpec.
When using EHPA, users can configure only the cron metric so that the EHPA works as a cron HPA.
Multiple crons of one EHPA are transformed into one external metric. HPA fetches this external cron metric and calculates the target replicas during reconciliation. When there are multiple metrics, HPA selects the largest proposed replicas to scale the workload.
EffectiveHorizontalPodAutoscaler is compatible with HorizontalPodAutoscaler, which is built into Kubernetes. So if you configure metrics for HPA such as CPU or memory, the HPA scales by the real-time metrics it observes.
With EHPA, users can configure CronMetric, PredictionMetric, and OriginalMetric at the same time.
We highly recommend that you configure metrics of all dimensions. They represent the cron replicas, the predicted replicas (prior), and the observed replicas (posterior).
This is a powerful feature, because HPA always picks the largest replicas calculated from all metric dimensions, which guarantees your workload's QoS. For example, when you configure all three types of autoscaling and the replicas calculated from the observed metric are the largest, HPA uses that value even if the replicas calculated from the prediction metric are smaller for some unexpected reason. So you don't need to worry about QoS.
When the metrics adapter handles a request for the external cron metric, it performs the following steps.
graph LR
A[Start] --> B{Active Cron?};
B -->|Yes| C(largest targetReplicas) --> F;
B -->|No| D{Work together with other metrics?};
D -->|Yes| G(minimum replicas) --> F;
D -->|No| H(current replicas) --> F;
F[Result workload replicas];
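If there is no active cron now, there are two cases: when cron works together with other metrics, the minimum replicas are used; when cron is the only metric, the current replicas are used.
If there are active crons, we use the largest targetReplicas specified in the cron specs. Basically, there should not be more than one active cron in the same time period; having more than one is not a best practice.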
HPA gets the cron external metric value and then computes the replicas by itself.
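The decision flow in the graph above can be written as a small function. This is a sketch of the flowchart only, not the metrics adapter's actual code; the function name and parameters are assumptions made for illustration, and determining which crons are active at a given time is left out.

```python
from typing import List


def cron_metric_replicas(active_targets: List[int],
                         has_other_metrics: bool,
                         min_replicas: int,
                         current_replicas: int) -> int:
    """Replica value reported for the external cron metric (illustrative only).

    active_targets holds the targetReplicas of every cron that is active now.
    """
    if active_targets:
        # Active cron(s): use the largest targetReplicas.
        return max(active_targets)
    if has_other_metrics:
        # No active cron, other metrics configured: report the minimum
        # replicas so the other metrics decide the final scale.
        return min_replicas
    # No active cron and cron is the only metric: keep the current replicas.
    return current_replicas


print(cron_metric_replicas([10, 80], True, 1, 20))
print(cron_metric_replicas([], True, 1, 20))
print(cron_metric_replicas([], False, 1, 20))
```

Reporting the minimum replicas when other metrics exist keeps the cron metric from winning the "largest proposal" comparison outside its active window.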
Suppose you need to keep the workload replicas at the minimum at midnight, so you configure a cron. You also need the HPA to scale on the real-time metrics observed by the metrics server. Finally, you configure a prediction-driven metric to scale up early and scale down late based on prediction.
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: php-apache-multi-dimensions
spec:
  # ScaleTargetRef is the reference to the workload that should be scaled.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1        # MinReplicas is the lower limit of replicas that the autoscaler can scale the target down to.
  maxReplicas: 100      # MaxReplicas is the upper limit of replicas that the autoscaler can scale the target up to.
  scaleStrategy: Auto   # ScaleStrategy indicates the strategy for scaling the target; the value can be "Auto" or "Preview".
  # Metrics contains the specifications used to calculate the desired replica count.
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  # Prediction defines the configuration for predicting resources.
  # If unspecified, prediction is disabled by default.
  prediction:
    predictionWindowSeconds: 3600   # PredictionWindowSeconds is the time window for predicting future metrics.
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
  crons:
    - name: "cron1"
      description: "scale up"
      start: "0 0 ? * 6"
      end: "00 23 ? * 0"
      targetReplicas: 100
EffectiveHorizontalPodAutoscaler is designed to be compatible with the Kubernetes-native HorizontalPodAutoscaler. We don't reinvent the autoscaling part; instead, we take advantage of the extension points of HorizontalPodAutoscaler and build a high-level autoscaling CRD. EffectiveHorizontalPodAutoscaler supports all capabilities of HorizontalPodAutoscaler, such as metricSpec and behavior.
EffectiveHorizontalPodAutoscaler will continue to support new features from HorizontalPodAutoscaler.
This is a sample YAML of EffectiveHorizontalPodAutoscaler.Status:
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
status:
  conditions:
    - lastTransitionTime: "2021-11-30T08:18:59Z"
      message: the HPA controller was able to get the target's current scale
      reason: SucceededGetScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2021-11-30T08:18:59Z"
      message: Effective HPA is ready
      reason: EffectiveHorizontalPodAutoscalerReady
      status: "True"
      type: Ready
  currentReplicas: 1
  expectReplicas: 0
When checking the status for EffectiveHorizontalPodAutoscaler, you may see this error:
- lastTransitionTime: "2022-05-15T14:05:43Z"
  message: 'the HPA was unable to compute the replica count: unable to get metric
    crane_pod_cpu_usage: unable to fetch metrics from custom metrics API: TimeSeriesPrediction
    is not ready. '
  reason: FailedGetPodsMetric
  status: "False"
  type: ScalingActive
Reason: Not every workload's CPU metric is predictable. If prediction fails for your workload, the error above is shown.
Solution: see the DSP section to learn more about this algorithm.