Load balancing and scaling Vector on Kubernetes
Observe a single Vector pod hit its CPU ceiling and eliminate it by scaling horizontally behind an L7 load balancer, with the HPA finding its own equilibrium.
This guide walks through observing a single Vector pod hit its CPU ceiling while parsing Apache Common Log format, then eliminating that ceiling by manually scaling horizontally behind Nginx. Finally, it sets up automatic scaling and lets the Kubernetes Horizontal Pod Autoscaler (HPA) find its own equilibrium.
All steps are reproducible using the manifests and Helm values in this repository.
Background
Vector’s remap transform is CPU-bound in this setup because of its
parse_regex! call: for every incoming log line it executes a compiled Rust
regex, allocates capture-group values, and writes a structured event
downstream. A single Vector pod limited to 1 vCPU will saturate that core
under sustained parallel HTTP load.
When saturation is reached, Vector applies backpressure rather than dropping events. The HTTP source stops accepting new requests; Nginx stalls the load generator’s connections.
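To make the per-line work concrete, here is a minimal sketch that applies the same pattern used in this guide's Vector configuration to one log line. It uses Python's `re` module purely for illustration (Vector runs the pattern through Rust's regex engine), and the sample line and the `parse_line` helper are hypothetical, not taken from the benchmark.

```python
import re

# Same pattern as the guide's parse_regex! call, compiled with Python's `re`
# module for illustration only; Vector itself uses Rust's regex engine.
APACHE_COMMON = re.compile(
    r'^(?P<host>[\w\.]+) - (?P<user>[\w\-]+) \[(?P<timestamp>[^\]]*)\] '
    r'"(?P<method>[\w]+) (?P<path>[^"]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
)


def parse_line(line: str) -> dict:
    """Return the named capture groups for one log line, or raise on a mismatch."""
    match = APACHE_COMMON.match(line)
    if match is None:
        raise ValueError(f"line does not match: {line!r}")
    return match.groupdict()


# Hypothetical sample line, not taken from the benchmark traffic.
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
event = parse_line(sample)
print(event["method"], event["status"], event["bytes_out"])
```

Every captured field becomes a separately allocated value, which is the per-event cost that adds up at hundreds of thousands of lines per second. Note that the `path` group of this pattern also captures the protocol token, since it matches everything up to the closing quote.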
Test environment
The benchmark was measured on a K3s single-node cluster on an EC2 c5.4xlarge (16 vCPU, 32 GiB RAM). A single-node cluster was chosen so that latency and network overhead are not a factor and collected metrics are precise.
- Load generator: lading, generating apache_common log lines at a configurable byte rate. It maintains persistent parallel connections and is capable of sustained high-throughput HTTP load.
- Load level: 65 MiB/s is used across all tests to get comparable throughput measurements.
- Vector pod resources: 1 vCPU / 1 GiB, with requests == limits (Guaranteed QoS), so CPU throttling, not memory pressure or scheduling variance, is the only bottleneck under test.
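As a rough illustration of why requests == limits matters, the sketch below classifies a single-container pod along the lines of the Kubernetes QoS rules. It is a simplification under stated assumptions: `qos_class` is a hypothetical helper, it handles one container only, and it compares quantity strings literally (so "1" and "1000m" would not be treated as equal).

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Classify a single-container pod following the Kubernetes QoS rules (simplified)."""
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed needs CPU and memory limits, each with an equal request.
    # A missing request defaults to the limit, as Kubernetes does.
    guaranteed = all(
        res in limits and requests.get(res, limits[res]) == limits[res]
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


# Resource settings as they appear in this guide's manifests.
vector = qos_class({"cpu": "1000m", "memory": "1Gi"}, {"cpu": "1000m", "memory": "1Gi"})
consumer = qos_class({"cpu": "500m", "memory": "64Mi"}, {"cpu": "2000m", "memory": "256Mi"})
print(vector, consumer)
```

With the values used in this guide, the Vector pods come out as Guaranteed and the consumer as Burstable.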
Architecture
1 × lading pod (100 parallel connections, 65 MiB/s)
│ HTTP POST → ingress-nginx ClusterIP :80
▼
Nginx ingress controller (L7 round-robin per request)
│
▼ (1, 3, or 8 pods depending on phase)
Vector pod(s) (1 vCPU / 1 GiB each)
┌──────────────────────────────────────┐
│ source: http_server :9000 │
│ transform: parse_regex! (apache_clf) │
│ sink: socket TCP → consumer │
└──────────────────────────────────────┘
│ TCP → consumer Service
▼
consumer pod (socat -u, drains to /dev/null)
Why HTTP + L7 load balancing?
A plain TCP connection has no request boundary: once a client is connected to a pod, a Kubernetes ClusterIP Service (which load-balances at L4) has no opportunity to redistribute that traffic to a newly scaled-up pod. HTTP defines a request boundary, so an L7 load balancer like Nginx can dispatch each request independently, letting new pods pick up load as soon as they’re Ready.
A similar setup using HAProxy in TCP mode has the same problem: it load-balances at the connection level, so a single producer’s connection stays pinned to one consumer for its lifetime, and can leave some consumers starved of data entirely.
This is why we install an Nginx ingress in front of Vector instead of exposing it through a plain ClusterIP Service.
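The difference between the two balancing modes can be sketched with a toy simulation. This is not how kube-proxy, HAProxy, or Nginx are implemented; the function names, pod names, and request counts are hypothetical, and both balancers are reduced to plain round-robin.

```python
from collections import Counter
from itertools import cycle


def l4_pinned(connections: int, requests_per_conn: int, pods_at_connect: list) -> Counter:
    """Each connection is assigned to a pod once, at connect time, and stays there."""
    assignment = cycle(pods_at_connect)
    served = Counter()
    for _ in range(connections):
        pod = next(assignment)
        served[pod] += requests_per_conn
    return served


def l7_per_request(total_requests: int, pods_now: list) -> Counter:
    """Each request is dispatched independently, round-robin over the current pods."""
    served = Counter()
    for _, pod in zip(range(total_requests), cycle(pods_now)):
        served[pod] += 1
    return served


# 100 connections were opened while only vector-0 existed; two pods are added later.
before_scale = ["vector-0"]
after_scale = ["vector-0", "vector-1", "vector-2"]

l4 = l4_pinned(100, 30, before_scale)
l7 = l7_per_request(3000, after_scale)
print(l4)  # every request stays on vector-0
print(l7)  # requests spread evenly over the three pods
```

In the pinned case the two new pods never appear in the tally, which is the starvation described above; in the per-request case each pod serves a third of the traffic.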
Prerequisites
- kubectl configured against a target cluster
- helm ≥ 3.0
- Cluster nodes with at least 1 allocatable CPU per Vector pod
- grpcurl for metric collection
How the metrics are collected
Each Vector pod exposes ObservabilityService on port 8686 (gRPC). The
measurement approach used for every phase below is: port-forward to a pod,
take two GetComponents samples 30 s apart, and diff receivedBytesTotal on
the in source component to get a per-pod throughput rate. Per-pod CPU is
read via kubectl top pods and averaged across all Vector pods.
For example, against a single pod:
kubectl port-forward -n vector-perf pod/<pod-name> 18686:8686 &
grpcurl -plaintext -d '{}' localhost:18686 \
vector.observability.v1.ObservabilityService/GetComponents > t0.json
sleep 30
grpcurl -plaintext -d '{}' localhost:18686 \
vector.observability.v1.ObservabilityService/GetComponents > t30.json
Diffing receivedBytesTotal for the in component between t0.json and
t30.json, then dividing by 30 s, gives that pod’s throughput.
See Replicating these results below for a link to the script that automates this.
Setup
Create the namespace and the consumer that drains everything Vector forwards to it:
namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: vector-perf
consumer.yaml
---
# Drain all incoming TCP data as fast as possible; the consumer's only job is
# to keep accepting connections so Vector's sink does not back up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer
  namespace: vector-perf
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      containers:
        - name: consumer
          image: alpine/socat:latest
          command:
            - socat
            - -u
            - TCP4-LISTEN:9000,fork,reuseaddr
            - /dev/null
          ports:
            - containerPort: 9000
              name: tcp-in
          resources:
            requests:
              cpu: "500m"
              memory: "64Mi"
            limits:
              cpu: "2000m"
              memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: consumer
  namespace: vector-perf
spec:
  selector:
    app: consumer
  ports:
    - port: 9000
      targetPort: 9000
      protocol: TCP
      name: tcp-in
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/consumer.yaml
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
-n ingress-nginx --create-namespace \
--set controller.service.type=ClusterIP \
--set controller.replicaCount=1 \
--wait --timeout=3m
helm repo add vectordotdev https://helm.vector.dev
helm repo update
Phase 1 — Single pod
Vector is installed with the shared base Helm values, which configure the
http_server source, the parse_regex! transform, and the socket sink to
the consumer:
values.yaml
# Helm values for Vector, used across all phases of the k8s-autoscaling guide:
#
#   helm upgrade --install vector vectordotdev/vector \
#     --namespace vector-perf \
#     -f scenarios/base/values.yaml \
#     --set replicas=<n>
#
# Only the initial install (replicas=1) goes through Helm. After that, every
# phase scales the Deployment directly via `kubectl scale` / `kubectl
# autoscale`, so the live replica count drifts from this file; a later
# `helm upgrade` without --set replicas would scale it back down to 1.
role: Stateless-Aggregator
resources:
  requests:
    cpu: "1000m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: "0.0.0.0:8686"
  sources:
    in:
      type: http_server
      address: "0.0.0.0:9000"
      framing:
        method: newline_delimited
      decoding:
        codec: bytes
  transforms:
    apache_parser:
      inputs: ["in"]
      type: remap
      source: |
        . = parse_regex!(.message, r'^(?P<host>[\w\.]+) - (?P<user>[\w\-]+) \[(?P<timestamp>[^\]]*)\] "(?P<method>[\w]+) (?P<path>[^"]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$')
  sinks:
    out:
      inputs: ["apache_parser"]
      type: socket
      mode: tcp
      address: "consumer:9000"
      encoding:
        codec: text
containerPorts:
  - name: http-in
    containerPort: 9000
    protocol: TCP
  - name: api
    containerPort: 8686
    protocol: TCP
service:
  enabled: true
  type: ClusterIP
  ports:
    - port: 9000
      protocol: TCP
      targetPort: 9000
      name: http-in
    - port: 8686
      protocol: TCP
      targetPort: 8686
      name: api
helm upgrade --install vector vectordotdev/vector --namespace vector-perf -f scenarios/base/values.yaml --set replicas=1
kubectl apply -f manifests/ingress.yaml
kubectl apply -f manifests/producer.yaml
The ingress routes HTTP POSTs to the Vector service at the request level (L7), which is what lets the HPA find equilibrium in Phase 4:
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vector-http
  namespace: vector-perf
  annotations:
    # Allow arbitrarily large POST bodies (100 MiB chunks from producers)
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # Stream request body directly to the upstream without buffering so
    # backpressure propagates back to the producer curl client
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vector
                port:
                  number: 9000
The producer is lading, configured to
generate apache_common log lines at 65 MiB/s across 100 parallel connections:
producer.yaml
---
# Lading load generator: generates apache_common log lines and POSTs them via
# HTTP to the Nginx ingress controller with high parallelism.
#
# Each pod opens `parallel_connections` concurrent HTTP connections to Nginx,
# which round-robins each request to a Vector pod at the HTTP layer (L7).
# Unlike the previous sequential-curl producer, lading keeps all connections
# busy simultaneously, so it saturates as many Vector pods as the cluster has.
#
# bytes_per_second is set above what 1–3 Vector pods can handle (bottlenecked in
# Phases 1–2) but below what 8 pods can handle, so Phase 3 has headroom and the
# HPA can find a natural equilibrium in Phase 4 (~5 pods at 70% CPU).
apiVersion: v1
kind: ConfigMap
metadata:
  name: lading-config
  namespace: vector-perf
data:
  lading.yaml: |
    generator:
      - http:
          seed: [2, 3, 5, 7, 11, 13, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
                 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137]
          target_uri: "http://ingress-nginx-controller.ingress-nginx:80/"
          bytes_per_second: "65 MiB"
          parallel_connections: 100
          method:
            post:
              variant: "apache_common"
              maximum_prebuild_cache_size_bytes: "256 MiB"
          headers: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: producer
  namespace: vector-perf
spec:
  replicas: 1
  selector:
    matchLabels:
      app: producer
  template:
    metadata:
      labels:
        app: producer
    spec:
      containers:
        - name: producer
          image: ghcr.io/datadog/lading:0.32.0
          args:
            - "--config-path"
            - "/config/lading.yaml"
            - "--no-target"
            - "--capture-path"
            - "/tmp/captures"
            - "--experiment-duration-infinite"
          resources:
            requests:
              cpu: "200m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          configMap:
            name: lading-config
65 MiB/s is expected to overwhelm a single pod’s regex-parsing capacity, so Vector should back-pressure lading down to whatever it can actually process.
| Metric | Value |
|---|---|
| Throughput | 19.04 MiB/s |
| Events/s | 149,710 ev/s |
| Pod CPU | 1000m (100 %) |
| Bottleneck | Vector CPU |
The pod is pinned at its 1000m CPU limit and throughput tops out at 19.04 MiB/s, confirming the expected CPU ceiling. That per-pod figure is the baseline the next two phases are measured against.
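The Phase 1 figures also give a back-of-the-envelope cost per event. The short calculation below is a sketch that assumes the pod consumes exactly one core and that the measured byte and event rates describe the same traffic; the CPU time it derives covers all of the pod's work (HTTP decoding, regex parsing, and the socket sink), not the regex alone.

```python
MIB = 1024 * 1024

# Phase 1 measurements from the table above.
throughput_mib_s = 19.04
events_per_s = 149_710
cpu_cores = 1.0  # assumption: the pod is pinned at its 1000m limit

avg_event_bytes = throughput_mib_s * MIB / events_per_s
cpu_us_per_event = cpu_cores / events_per_s * 1e6

print(f"average event size: {avg_event_bytes:.0f} bytes")
print(f"CPU time per event: {cpu_us_per_event:.1f} µs")
```

That works out to roughly 133 bytes per log line and a little under 7 µs of CPU per event.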
Phase 2 — Three pods
kubectl scale deployment vector -n vector-perf --replicas=3
kubectl rollout status deployment/vector -n vector-perf
65 MiB/s > 3 × 19 MiB/s = 57 MiB/s combined capacity. All three pods are still fully saturated. Adding pods increases throughput, but combined capacity is still below the offered load.
| Metric | Value |
|---|---|
| Throughput | 55.38 MiB/s |
| Events/s | 435,543 ev/s |
| Pod CPU | ~1000m (99 %) |
| Scaling vs Phase 1 | 2.91× |
| Bottleneck | Vector CPU |
Phase 3 — Eight pods
kubectl scale deployment vector -n vector-perf --replicas=8
kubectl rollout status deployment/vector -n vector-perf
8 × 19 MiB/s = 152 MiB/s combined capacity ≫ 65 MiB/s. Vector is no longer the bottleneck; nearly all of the offered 65 MiB/s flows through, and pods have ample headroom.
| Metric | Value |
|---|---|
| Throughput | 62.32 MiB/s |
| Events/s | 490,288 ev/s |
| Pod CPU | ~480m (48 %) |
| Bottleneck | None, spare capacity |
The bottleneck has been eliminated. Each pod handles ~7.8 MiB/s at ~48 % CPU, leaving over half of each pod’s capacity unused. With L7 per-request routing, load is distributed evenly across all 8 pods.
Comparison
All phases: 65 MiB/s lading (100 parallel connections, Nginx L7 ingress), pods limited to 1 vCPU / 1 GiB.
| | Phase 1 (1 pod) | Phase 2 (3 pods) | Phase 3 (8 pods) |
|---|---|---|---|
| Throughput | 19.04 MiB/s | 55.38 MiB/s | 62.32 MiB/s |
| Events/s | 149,710 | 435,543 | 490,288 |
| CPU per pod | 1000m (100 %) | ~1000m (99 %) | ~480m (48 %) |
| Bottleneck | Vector CPU | Vector CPU | None |
| Scaling vs Phase 1 | 1× | 2.91× | 3.27× |
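The throughput ceiling is reached somewhere between 3 and 8 pods: dividing the offered load by the per-pod ceiling gives 65 / 19 ≈ 3.4 pods. Phase 4 is consistent with this: the HPA converges at 6 pods, above the crossover because it targets 70 % CPU rather than full saturation.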
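The crossover arithmetic can be written out as a small sizing helper. This is a sketch under a linear-scaling assumption (every pod delivers the Phase 1 ceiling regardless of replica count); `pods_for` and its `target_utilization` parameter are hypothetical names introduced here.

```python
import math

offered_mib_s = 65.0   # lading's configured byte rate
per_pod_mib_s = 19.04  # Phase 1 single-pod ceiling


def pods_for(offered: float, per_pod: float, target_utilization: float = 1.0) -> int:
    """Smallest pod count whose capacity, derated to the target utilization, covers the load."""
    return math.ceil(offered / (per_pod * target_utilization))


crossover = offered_mib_s / per_pod_mib_s
at_saturation = pods_for(offered_mib_s, per_pod_mib_s)
at_70_percent = pods_for(offered_mib_s, per_pod_mib_s, 0.70)

print(f"crossover: {crossover:.2f} pods")
print(f"pods needed at 100 % CPU: {at_saturation}")
print(f"pods needed at 70 % CPU: {at_70_percent}")
```

Under this model 4 pods cover the load at full saturation and 5 pods cover it at a 70 % CPU target. The measured run in Phase 4 settled at 6 pods, so the linear model should be treated as a lower bound rather than a prediction.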
Phase 4 — HPA finds equilibrium
With horizontal scaling working and the bottleneck removed, the HPA can now find the minimum pod count that keeps CPU below the 70 % target.
# Reset to 1 pod
kubectl scale deployment vector -n vector-perf --replicas=1
# Create HPA (70 % CPU target, 1–8 replicas)
kubectl autoscale deployment vector -n vector-perf \
--cpu-percent=70 --min=1 --max=8
Phase 4 results
Scale-up timeline (no manual intervention):
| Time | Replicas | Avg CPU | Event |
|---|---|---|---|
| t=0 s | 1 | 100 % | load starts |
| t=61 s | 2 | 100 % | HPA scales 1→2 |
| t=91 s | 3 | 99 % | HPA scales 2→3 |
| t=137 s | 5 | 95 % | HPA scales 3→5 |
| t=167 s | 6 | 72 % | HPA scales 5→6 |
| t=182 s | 6 | 63 % | Stable, equilibrium |
Time to equilibrium: 182 s (~3 min), 4 scale events, 0 manual cycling.
Throughput at equilibrium: 62.76 MiB/s, 493,392 ev/s, 6 pods, 63 % avg CPU.
The HPA settled at 6 pods: at 5 pods CPU reached 83 % (above the 70 % target), triggering a final scale-up. At 6 pods CPU stabilised at 63 %, within the ±10 % tolerance band (63–77 %).
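The scaling decisions follow from the replica calculation documented for the Horizontal Pod Autoscaler: desired = ceil(current × currentUtilization / targetUtilization), skipped when the ratio is within the tolerance. The sketch below implements only that core rule; `desired_replicas` is a hypothetical helper, and it deliberately omits stabilization windows, scale-up rate policies, and the handling of unready pods or missing metrics.

```python
import math


def desired_replicas(current: int, cpu_pct: float, target_pct: float = 70.0,
                     tolerance: float = 0.1, min_replicas: int = 1,
                     max_replicas: int = 8) -> int:
    """Core HPA rule: scale by the usage ratio unless it is within the tolerance."""
    ratio = cpu_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        return current  # inside the tolerance band: no change
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


# CPU readings quoted in this section, paired with the replica count at the time.
steps = [(1, 100), (2, 100), (3, 99), (5, 83), (6, 63)]
for replicas, cpu in steps:
    print(f"{replicas} pods at {cpu} % -> {desired_replicas(replicas, cpu)} pods")
```

Fed the CPU readings quoted in this section, the rule reproduces the 1→2→3→5→6 sequence and then holds at 6, since 63 % sits at the lower edge of the 63–77 % band.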
Results summary
| | Phase 1 (1 pod) | Phase 2 (3 pods) | Phase 3 (8 pods) | Phase 4 (HPA) |
|---|---|---|---|---|
| Throughput | 19.04 MiB/s | 55.38 MiB/s | 62.32 MiB/s | 62.76 MiB/s |
| Events/s | 149,710 | 435,543 | 490,288 | 493,392 |
| CPU per pod | 1000m (100 %) | ~1000m (99 %) | ~480m (48 %) | ~630m (63 %) |
| Bottleneck | Vector CPU | Vector CPU | None | None |
| Scaling vs Phase 1 | 1× | 2.91× | 3.27× | 3.30× |
| Pod count | manual (1) | manual (3) | manual (8) | auto (6) |
Phase 4 matches Phase 3’s throughput with 2 fewer pods and no manual scaling. The HPA settled at 6 pods, more than the theoretical 3.4-pod crossover because it keeps CPU near its 70 % target rather than at saturation, but without leaving ~50 % headroom idle on every pod as Phase 3 does.
Key takeaways
A single pod caps at its CPU limit. At 65 MiB/s load, 1 pod can absorb only ~19 MiB/s. Back-pressure prevents any event loss.
L7 per-request routing distributes load uniformly. Because Nginx dispatches each HTTP request independently, every pod, old or newly Ready, receives a share of traffic proportional to the current replica count, with no idle pods.
Adding pods beyond the saturation point removes the bottleneck entirely. Phase 3 (8 pods) delivers 62.32 MiB/s of the 65 MiB/s offered, with each pod at ~48 % CPU. The bottleneck crossover is at ~3.4 pods for this load level.
HPA finds the right pod count automatically. With HTTP + L7 routing, every new pod starts receiving traffic immediately after becoming Ready. HPA converged at 6 pods in 182 s with zero manual intervention.
Replicating these results
The terraform/
directory provisions the K3s single-node cluster (EC2 c5.4xlarge) the
benchmark above was measured on, if you don’t already have a cluster to test
against.
Once the Setup steps are complete and Phase 1’s producer and ingress
are deployed, run-experiment.sh runs all four phases end to end: scaling the
deployment, waiting for each rollout, measuring throughput, and creating the
HPA for Phase 4, then prints a single results table.
run-experiment.sh
#!/usr/bin/env bash
# run-experiment.sh — run all 4 Vector scaling phases and print a results table.
#
# Usage:
# KUBECONFIG=/path/to/kubeconfig ./scripts/run-experiment.sh
#
# Requirements: kubectl, grpcurl, python3
# The script assumes namespace, consumer, ingress-nginx, and ingress are already deployed.
set -euo pipefail
NAMESPACE=vector-perf
PRODUCER_MANIFEST=manifests/producer.yaml
TMPDIR_WORK=/tmp/vec-experiment-$$
mkdir -p "$TMPDIR_WORK"
trap 'rm -rf "$TMPDIR_WORK"; pkill -f "kubectl port-forward.*vector-perf.*pod/" 2>/dev/null || true' EXIT
# ── helpers ───────────────────────────────────────────────────────────────────
log() { echo "==> $*" >&2; }
# K3s kubeconfig uses client-certificate auth — no AWS credentials needed.
kube() { kubectl "$@"; }
wait_rollout() {
kube rollout status deployment/vector -n "$NAMESPACE" --timeout=120s >&2
}
delete_hpa() {
kube delete hpa vector -n "$NAMESPACE" 2>/dev/null || true
}
pick_pod() {
kube get pods -n "$NAMESPACE" -l app.kubernetes.io/name=vector \
--field-selector=status.phase=Running \
-o jsonpath='{.items[0].metadata.name}'
}
# Average CPU % across all Vector pods via kubectl top. Outputs e.g. "97%".
avg_cpu_pct() {
kube top pods -n "$NAMESPACE" -l app.kubernetes.io/name=vector \
--no-headers 2>/dev/null \
| awk '{gsub("m","",$2); sum+=$2; n++} END {
if (n>0) printf "%d%%", int(sum/n/10)
else print "?"
}'
}
# Measure throughput from one pod over a 30-second window.
# Prints "<MiB/s> <ev/s>" to stdout; callers redirect it to $TMPDIR_WORK/measure.txt
measure_pod() {
local pod=$1 port=$2
kube port-forward -n "$NAMESPACE" "pod/$pod" "${port}:8686" > "$TMPDIR_WORK/pf.log" 2>&1 &
local pf_pid=$!
# Wait up to 10 s for the gRPC health check to pass.
local i=0
while ! grpcurl -plaintext "localhost:${port}" grpc.health.v1.Health/Check >/dev/null 2>&1; do
if ! kill -0 "$pf_pid" 2>/dev/null; then
log "ERROR: port-forward to pod/${pod}:8686 → ${port} died. Output:"
cat "$TMPDIR_WORK/pf.log" >&2
exit 1
fi
i=$(( i + 1 ))
if [[ "$i" -ge 20 ]]; then
log "ERROR: gRPC health check on port ${port} not ready after 10 s."
cat "$TMPDIR_WORK/pf.log" >&2
exit 1
fi
sleep 0.5
done
if ! grpcurl -plaintext -d '{}' "localhost:${port}" \
vector.observability.v1.ObservabilityService/GetComponents \
> "$TMPDIR_WORK/t0.json" 2>&1; then
log "ERROR: grpcurl failed on port ${port} (pod=${pod}). Output:"
cat "$TMPDIR_WORK/t0.json" >&2
exit 1
fi
sleep 30
if ! grpcurl -plaintext -d '{}' "localhost:${port}" \
vector.observability.v1.ObservabilityService/GetComponents \
> "$TMPDIR_WORK/t30.json" 2>&1; then
log "ERROR: grpcurl failed on port ${port} (pod=${pod}). Output:"
cat "$TMPDIR_WORK/t30.json" >&2
exit 1
fi
# Under `set -e` a failed kill (port-forward already gone) would abort the run.
kill "$pf_pid" 2>/dev/null || true
wait "$pf_pid" 2>/dev/null || true
python3 - "$TMPDIR_WORK/t0.json" "$TMPDIR_WORK/t30.json" <<'PYEOF'
import json, sys

def get_bytes_events(path):
    try:
        d = json.load(open(path))
    except Exception:
        return 0, 0
    for c in d.get('components', []):
        if c.get('componentId') == 'in':
            m = c.get('metrics', {})
            return int(m.get('receivedBytesTotal', 0)), int(m.get('receivedEventsTotal', 0))
    return 0, 0

b1, e1 = get_bytes_events(sys.argv[1])
b2, e2 = get_bytes_events(sys.argv[2])
mibps = (b2 - b1) / 30 / 1048576
eps = (e2 - e1) / 30
print(f"{mibps:.2f} {eps:.0f}")
PYEOF
}
# ── phase runners ─────────────────────────────────────────────────────────────
# Each function writes key=value lines to $TMPDIR_WORK/phaseN.txt
run_static_phase() {
local phase=$1 replicas=$2 out="$TMPDIR_WORK/phase${1}.txt"
log "Phase $phase: scaling Vector to $replicas pod(s)..."
delete_hpa
kube scale deployment vector -n "$NAMESPACE" --replicas="$replicas" >/dev/null 2>&1
wait_rollout
local pod
pod=$(pick_pod)
log "Phase $phase: measuring $pod (20 s warmup + 30 s window)..."
sleep 20
local port=$((18680 + replicas))
measure_pod "$pod" "$port" > "$TMPDIR_WORK/measure.txt"
local mibps_per_pod eps_per_pod
read -r mibps_per_pod eps_per_pod < "$TMPDIR_WORK/measure.txt"
local total_mibps total_eps cpu
total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $replicas:.2f}')")
total_eps=$(python3 -c "print(f'{float(\"$eps_per_pod\") * $replicas:.0f}')")
cpu=$(avg_cpu_pct)
{
echo "PHASE${phase}_MIBPS=${total_mibps}"
echo "PHASE${phase}_EPS=${total_eps}"
echo "PHASE${phase}_CPU=${cpu}"
echo "PHASE${phase}_PODS=${replicas}"
} > "$out"
}
run_hpa_phase() {
local out="$TMPDIR_WORK/phase4.txt"
log "Phase 4: resetting to 1 pod and creating HPA (70% target, max 8)..."
delete_hpa
kube scale deployment vector -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
kube autoscale deployment vector -n "$NAMESPACE" \
--cpu-percent=70 --min=1 --max=8 >/dev/null 2>&1
local start elapsed
local last_replicas=1 scale_events=0 stable_count=0 last_stable=0
local replicas="" cpu_avg=""
start=$(date +%s)
log "Phase 4: watching HPA..."
while true; do
elapsed=$(( $(date +%s) - start ))
replicas=$(kube get hpa vector -n "$NAMESPACE" \
-o jsonpath='{.status.currentReplicas}' 2>/dev/null || echo "")
cpu_avg=$(kube get hpa vector -n "$NAMESPACE" \
-o jsonpath='{.status.currentMetrics[0].resource.current.averageUtilization}' \
2>/dev/null || echo "")
if [[ -n "$replicas" && "$replicas" != "$last_replicas" ]]; then
scale_events=$(( scale_events + 1 ))
log "[${elapsed}s] SCALE ${last_replicas}→${replicas} cpu=${cpu_avg}%"
last_replicas=$replicas
else
log "[${elapsed}s] replicas=${replicas:-?} cpu=${cpu_avg:-?}%"
fi
if [[ "$replicas" == "$last_stable" ]]; then
stable_count=$(( stable_count + 1 ))
else
last_stable=$replicas
stable_count=1
fi
# Stable = same replica count for 75 s AND cpu within HPA tolerance band (63–77%)
if [[ "$stable_count" -ge 5 && "$elapsed" -gt 120 && -n "$cpu_avg" ]]; then
if [[ "$cpu_avg" -ge 63 && "$cpu_avg" -le 77 ]]; then
log "Equilibrium: $replicas pods, ${cpu_avg}% CPU, ${elapsed}s elapsed."
break
fi
fi
sleep 15
done
log "Phase 4: measuring equilibrium throughput..."
local pod
pod=$(pick_pod)
measure_pod "$pod" 28686 > "$TMPDIR_WORK/measure.txt"
local mibps_per_pod eps_per_pod total_mibps total_eps
read -r mibps_per_pod eps_per_pod < "$TMPDIR_WORK/measure.txt"
total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $last_replicas:.2f}')")
total_eps=$(python3 -c "print(f'{float(\"$eps_per_pod\") * $last_replicas:.0f}')")
{
echo "PHASE4_MIBPS=${total_mibps}"
echo "PHASE4_EPS=${total_eps}"
echo "PHASE4_PODS=${last_replicas}"
echo "PHASE4_CPU=${cpu_avg}%"
echo "PHASE4_SCALE_EVENTS=${scale_events}"
echo "PHASE4_ELAPSED=${elapsed}s"
} > "$out"
}
# ── main ──────────────────────────────────────────────────────────────────────
log "Cleaning up any leftover port-forwards from previous runs..."
pkill -f "kubectl port-forward.*vector-perf.*pod/" 2>/dev/null || true
sleep 1
log "Checking cluster connectivity..."
if ! kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
echo "ERROR: cannot reach the cluster. Is KUBECONFIG set correctly?" >&2
echo " KUBECONFIG=${KUBECONFIG:-<unset>}" >&2
exit 1
fi
log "Cluster reachable."
log "Applying producer manifest (lading, 65 MiB/s)..."
kube apply -f "$PRODUCER_MANIFEST" >/dev/null 2>&1
kube scale deployment producer -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
kube rollout restart deployment producer -n "$NAMESPACE" >/dev/null 2>&1
log "Waiting 20 s for lading to initialise..."
sleep 20
run_static_phase 1 1
run_static_phase 2 3
run_static_phase 3 8
run_hpa_phase
# Load all results
declare -A R
for f in "$TMPDIR_WORK"/phase*.txt; do
while IFS='=' read -r k v; do R[$k]=$v; done < "$f"
done
# ── results table ─────────────────────────────────────────────────────────────
echo ""
echo "┌──────────────┬──────────────┬──────────────┬──────────────┬─────────────┐"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"" "Phase 1" "Phase 2" "Phase 3" "Phase 4"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"" "1 pod" "3 pods" "8 pods" "HPA (auto)"
echo "├──────────────┼──────────────┼──────────────┼──────────────┼─────────────┤"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"Throughput" \
"${R[PHASE1_MIBPS]:-?} MiB/s" \
"${R[PHASE2_MIBPS]:-?} MiB/s" \
"${R[PHASE3_MIBPS]:-?} MiB/s" \
"${R[PHASE4_MIBPS]:-?} MiB/s"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"Events/s" \
"${R[PHASE1_EPS]:-?}" \
"${R[PHASE2_EPS]:-?}" \
"${R[PHASE3_EPS]:-?}" \
"${R[PHASE4_EPS]:-?}"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"Avg CPU/pod" \
"${R[PHASE1_CPU]:-?}" \
"${R[PHASE2_CPU]:-?}" \
"${R[PHASE3_CPU]:-?}" \
"${R[PHASE4_CPU]:-?}"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"Pods" \
"${R[PHASE1_PODS]:-?}" \
"${R[PHASE2_PODS]:-?}" \
"${R[PHASE3_PODS]:-?}" \
"${R[PHASE4_PODS]:-?}"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
"Bottleneck" \
"Vector CPU" "Vector CPU" "None" "— "
echo "└──────────────┴──────────────┴──────────────┴──────────────┴─────────────┘"
echo ""
echo "Phase 4: ${R[PHASE4_SCALE_EVENTS]:-?} scale events," \
"equilibrium in ${R[PHASE4_ELAPSED]:-?}," \
"0 manual producer restarts."
KUBECONFIG=/path/to/kubeconfig ./scripts/run-experiment.sh