Load balancing and scaling Vector on Kubernetes

Observe a single Vector pod hit its CPU ceiling and eliminate it by scaling horizontally behind an L7 load balancer, with the HPA finding its own equilibrium.

This guide walks through observing a single Vector pod hit its CPU ceiling while parsing Apache Common Log Format, then eliminating that ceiling by manually scaling horizontally behind Nginx. It then sets up automatic scaling with the Kubernetes Horizontal Pod Autoscaler (HPA) and lets it find its own equilibrium.

All steps are reproducible using the manifests and Helm values in this repository.

Background

The parse_regex! call in Vector’s remap transform is CPU-bound: for every incoming log line it executes a compiled Rust regex, allocates capture-group values, and writes a structured event downstream. A single Vector pod limited to 1 vCPU will saturate that core under sustained parallel HTTP load.
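To make the per-line work concrete, here is a minimal Python sketch that applies the same pattern used in this guide's Helm values to one log line. It is only an illustration: Vector runs the pattern through Rust's regex engine inside VRL, not Python's `re`, and the sample line and the `parse_line` helper are assumptions made for this example.

```python
import re

# Same pattern as the parse_regex! call in the guide's Helm values,
# compiled once and reused for every line.
APACHE_COMMON = re.compile(
    r'^(?P<host>[\w\.]+) - (?P<user>[\w\-]+) \[(?P<timestamp>[^\]]*)\] '
    r'"(?P<method>[\w]+) (?P<path>[^"]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
)


def parse_line(line: str) -> dict:
    """Return the named capture groups, or raise if the line does not match."""
    match = APACHE_COMMON.match(line)
    if match is None:
        raise ValueError(f"not an Apache common log line: {line!r}")
    return match.groupdict()


# Illustrative sample line (an assumption, not taken from the benchmark data).
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_line(sample)

# The "path" group captures everything up to the closing quote,
# so it also contains the protocol version.
print(event["method"], event["path"], event["status"], event["bytes_out"])
```

Every field is extracted and allocated as a separate string for each line, which is the kind of per-event cost that adds up at hundreds of thousands of events per second.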

When saturation is reached, Vector applies backpressure rather than dropping events. The HTTP source stops accepting new requests; Nginx stalls the load generator’s connections.

Test environment

The benchmark was measured on a K3s single-node cluster on an EC2 c5.4xlarge (16 vCPU, 32 GiB RAM). A single-node cluster was chosen so that latency and network overhead are not a factor and collected metrics are precise.

  • Load generator: lading, generating apache_common log lines at a configurable byte rate. It maintains persistent parallel connections and is capable of sustained high-throughput HTTP load.
  • Load level: 65 MiB/s is used across all tests to get comparable throughput measurements.
  • Vector pod resources: 1 vCPU / 1 GiB, with requests == limits (Guaranteed QoS), so CPU throttling, not memory pressure or scheduling variance, is the only bottleneck under test.

Architecture

1 × lading pod  (100 parallel connections, 65 MiB/s)
        │  HTTP POST → ingress-nginx ClusterIP :80
   Nginx ingress controller  (L7 round-robin per request)
        ▼ (1, 3, or 8 pods depending on phase)
   Vector pod(s)  (1 vCPU / 1 GiB each)
   ┌──────────────────────────────────────┐
   │ source:    http_server :9000         │
   │ transform: parse_regex! (apache_clf) │
   │ sink:      socket TCP → consumer     │
   └──────────────────────────────────────┘
        │  TCP → consumer Service
   consumer pod  (socat -u, drains to /dev/null)

Why HTTP + L7 load balancing?

A plain TCP connection has no request boundary: once a client is connected to a pod, a Kubernetes ClusterIP Service (which load-balances at L4) has no opportunity to redistribute that traffic to a newly scaled-up pod. HTTP defines a request boundary, so an L7 load balancer like Nginx can dispatch each request independently, letting new pods pick up load as soon as they’re Ready.

A similar setup using HAProxy in TCP mode has the same problem: it load-balances at the connection level, so a single producer’s connection stays pinned to one consumer for its lifetime, and can leave some consumers starved of data entirely.

This is why we install an Nginx ingress in front of Vector instead of exposing it through a plain ClusterIP Service.
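The difference between connection-level and request-level balancing can be sketched with a toy model. The Python below is a simplification under stated assumptions: every connection sends the same number of requests, balancing is plain round-robin, and the function names and pod names are hypothetical. It does not model Nginx or kube-proxy internals.

```python
from itertools import cycle


def l4_connection_pinning(connections, requests_per_conn, pods_at_connect, pods_now):
    """Each connection is pinned to a pod when it is opened.

    Pods that appear later (in pods_now but not in pods_at_connect)
    never receive traffic from the already-open connections.
    """
    load = {pod: 0 for pod in pods_now}
    pick = cycle(pods_at_connect)
    for _ in range(connections):
        load[next(pick)] += requests_per_conn
    return load


def l7_per_request(connections, requests_per_conn, pods_now):
    """Every request is dispatched independently over the current pods."""
    load = {pod: 0 for pod in pods_now}
    pick = cycle(pods_now)
    for _ in range(connections * requests_per_conn):
        load[next(pick)] += 1
    return load


# 100 persistent connections were opened while only one pod existed;
# the deployment is then scaled to three pods.
before = ["vector-0"]
after = ["vector-0", "vector-1", "vector-2"]

l4 = l4_connection_pinning(100, 30, before, after)
l7 = l7_per_request(100, 30, after)
print("L4:", l4)
print("L7:", l7)
```

In this model the L4 case leaves the two new pods with zero requests, while the L7 case splits the 3,000 requests evenly, 1,000 per pod.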

Prerequisites

  • kubectl configured against a target cluster
  • helm ≥ 3.0
  • Cluster nodes with at least 1 allocatable CPU per Vector pod
  • grpcurl for metric collection

How the metrics are collected

Each Vector pod exposes ObservabilityService on port 8686 (gRPC). The measurement approach used for every phase below is: port-forward to a pod, take two GetComponents samples 30 s apart, and diff receivedBytesTotal on the source component named "in" to get a per-pod throughput rate. Per-pod CPU is read via kubectl top pods and averaged across all Vector pods.

For example, against a single pod:

kubectl port-forward -n vector-perf pod/<pod-name> 18686:8686 &

grpcurl -plaintext -d '{}' localhost:18686 \
  vector.observability.v1.ObservabilityService/GetComponents > t0.json
sleep 30
grpcurl -plaintext -d '{}' localhost:18686 \
  vector.observability.v1.ObservabilityService/GetComponents > t30.json

Diffing receivedBytesTotal for the in component between t0.json and t30.json, then dividing by 30 s, gives that pod’s throughput.
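The diff itself is a few lines of Python. This sketch assumes the JSON layout that this guide's run-experiment.sh reads (a `components` list whose entries carry `componentId` and a `metrics` object). The helper names and the synthetic snapshot values are made up for illustration, not measurements.

```python
import json


def load_snapshot(path):
    """Read one saved GetComponents reply (e.g. t0.json) from disk."""
    with open(path) as f:
        return json.load(f)


def source_counters(snapshot, component_id="in"):
    """Return (received bytes, received events) for one component."""
    for component in snapshot.get("components", []):
        if component.get("componentId") == component_id:
            metrics = component.get("metrics", {})
            # Counters may arrive as numbers or as numeric strings; int() accepts both.
            return int(metrics["receivedBytesTotal"]), int(metrics["receivedEventsTotal"])
    raise KeyError(f"component {component_id!r} not found in snapshot")


def throughput(t0, t1, window_s=30.0):
    """Return (MiB/s, events/s) between two snapshots taken window_s apart."""
    b0, e0 = source_counters(t0)
    b1, e1 = source_counters(t1)
    return (b1 - b0) / window_s / 2**20, (e1 - e0) / window_s


# Synthetic snapshots for illustration only (not benchmark data).
t0 = {"components": [{"componentId": "in", "metrics": {
    "receivedBytesTotal": "1048576000", "receivedEventsTotal": "8000000"}}]}
t30 = {"components": [{"componentId": "in", "metrics": {
    "receivedBytesTotal": "1646264320", "receivedEventsTotal": "12500000"}}]}

mibps, eps = throughput(t0, t30)
print(f"{mibps:.2f} MiB/s, {eps:.0f} ev/s")
```

With real data, replace the synthetic dictionaries with `load_snapshot("t0.json")` and `load_snapshot("t30.json")`. Unlike a version that silently falls back to zero, this sketch raises if the component is missing, so a failed sample cannot be mistaken for zero throughput.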

See Replicating these results below for a link to the script that automates this.

Setup

Create the namespace and the consumer that drains everything Vector forwards to it:

namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: vector-perf

consumer.yaml
---
# Drain all incoming TCP data as fast as possible — the consumer's only job is
# to keep accepting connections so Vector's sink does not back up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer
  namespace: vector-perf
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      containers:
        - name: consumer
          image: alpine/socat:latest
          command:
            - socat
            - -u
            - TCP4-LISTEN:9000,fork,reuseaddr
            - /dev/null
          ports:
            - containerPort: 9000
              name: tcp-in
          resources:
            requests:
              cpu: "500m"
              memory: "64Mi"
            limits:
              cpu: "2000m"
              memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: consumer
  namespace: vector-perf
spec:
  selector:
    app: consumer
  ports:
    - port: 9000
      targetPort: 9000
      protocol: TCP
      name: tcp-in

kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/consumer.yaml

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  -n ingress-nginx --create-namespace \
  --set controller.service.type=ClusterIP \
  --set controller.replicaCount=1 \
  --wait --timeout=3m

helm repo add vectordotdev https://helm.vector.dev
helm repo update

Phase 1 — Single pod

Vector is installed with the shared base Helm values, which configure the http_server source, the parse_regex! transform, and the socket sink to the consumer:

values.yaml
# Helm values for Vector, used across all phases of the k8s-autoscaling guide:
#
#   helm upgrade --install vector vectordotdev/vector \
#     --namespace vector-perf \
#     -f scenarios/base/values.yaml \
#     --set replicas=<n>
#
# Only the initial install (replicas=1) goes through Helm. After that, every
# phase scales the Deployment directly via `kubectl scale` / `kubectl
# autoscale`, so the live replica count drifts from this file — a later
# `helm upgrade` without --set replicas would scale it back down to 1.

role: Stateless-Aggregator

resources:
  requests:
    cpu: "1000m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

customConfig:
  data_dir: /vector-data-dir

  api:
    enabled: true
    address: "0.0.0.0:8686"

  sources:
    in:
      type: http_server
      address: "0.0.0.0:9000"
      framing:
        method: newline_delimited
      decoding:
        codec: bytes

  transforms:
    apache_parser:
      inputs: ["in"]
      type: remap
      source: |
        . = parse_regex!(.message, r'^(?P<host>[\w\.]+) - (?P<user>[\w\-]+) \[(?P<timestamp>[^\]]*)\] "(?P<method>[\w]+) (?P<path>[^"]*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$')

  sinks:
    out:
      inputs: ["apache_parser"]
      type: socket
      mode: tcp
      address: "consumer:9000"
      encoding:
        codec: text

containerPorts:
  - name: http-in
    containerPort: 9000
    protocol: TCP
  - name: api
    containerPort: 8686
    protocol: TCP

service:
  enabled: true
  type: ClusterIP
  ports:
    - port: 9000
      protocol: TCP
      targetPort: 9000
      name: http-in
    - port: 8686
      protocol: TCP
      targetPort: 8686
      name: api

helm upgrade --install vector vectordotdev/vector --namespace vector-perf -f scenarios/base/values.yaml --set replicas=1

kubectl apply -f manifests/ingress.yaml
kubectl apply -f manifests/producer.yaml

The ingress routes HTTP POSTs to the Vector service at the request level (L7), which is what lets the HPA find equilibrium in Phase 4:

ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vector-http
  namespace: vector-perf
  annotations:
    # Allow arbitrarily large POST bodies (100 MiB chunks from producers)
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # Stream request body directly to the upstream without buffering so
    # backpressure propagates back to the producer curl client
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vector
                port:
                  number: 9000

The producer is lading, configured to generate apache_common log lines at 65 MiB/s across 100 parallel connections:

producer.yaml
---
# Lading load generator — generates apache_common log lines and POSTs them via
# HTTP to the Nginx ingress controller with high parallelism.
#
# Each pod opens `parallel_connections` concurrent HTTP connections to Nginx,
# which round-robins each request to a Vector pod at the HTTP layer (L7).
# Unlike the previous sequential-curl producer, lading keeps all connections
# busy simultaneously, so it saturates as many Vector pods as the cluster has.
#
# bytes_per_second is set above what 1–3 Vector pods can handle (bottlenecked in
# Phases 1–2) but below what 8 pods can handle, so Phase 3 has headroom and the
# HPA can find a natural equilibrium in Phase 4 (~5 pods at 70% CPU).
apiVersion: v1
kind: ConfigMap
metadata:
  name: lading-config
  namespace: vector-perf
data:
  lading.yaml: |
    generator:
      - http:
          seed: [2, 3, 5, 7, 11, 13, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
                 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137]
          target_uri: "http://ingress-nginx-controller.ingress-nginx:80/"
          bytes_per_second: "65 MiB"
          parallel_connections: 100
          method:
            post:
              variant: "apache_common"
              maximum_prebuild_cache_size_bytes: "256 MiB"
          headers: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: producer
  namespace: vector-perf
spec:
  replicas: 1
  selector:
    matchLabels:
      app: producer
  template:
    metadata:
      labels:
        app: producer
    spec:
      containers:
        - name: producer
          image: ghcr.io/datadog/lading:0.32.0
          args:
            - "--config-path"
            - "/config/lading.yaml"
            - "--no-target"
            - "--capture-path"
            - "/tmp/captures"
            - "--experiment-duration-infinite"
          resources:
            requests:
              cpu: "200m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          configMap:
            name: lading-config

65 MiB/s is expected to overwhelm a single pod’s regex-parsing capacity, so Vector should back-pressure lading down to whatever it can actually process.

| Metric | Value |
| --- | --- |
| Throughput | 19.04 MiB/s |
| Events/s | 149,710 ev/s |
| Pod CPU | 1000m (100 %) |
| Bottleneck | Vector CPU |

The pod is pinned at its 1000m CPU limit and throughput tops out at 19.04 MiB/s, confirming the expected CPU ceiling. That per-pod figure is the baseline the next two phases are measured against.

Phase 2 — Three pods

kubectl scale deployment vector -n vector-perf --replicas=3
kubectl rollout status deployment/vector -n vector-perf

65 MiB/s > 3 × 19 MiB/s = 57 MiB/s combined capacity, so all three pods are still fully saturated. Adding pods increases throughput, but the offered load still exceeds what three pods can process.

| Metric | Value |
| --- | --- |
| Throughput | 55.38 MiB/s |
| Events/s | 435,543 ev/s |
| Pod CPU | ~1000m (99 %) |
| Scaling vs Phase 1 | 2.91× |
| Bottleneck | Vector CPU |
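The capacity arithmetic used in each phase can be written as a small model: throughput is the smaller of the offered load and the combined per-pod ceiling. This is a rough sketch that assumes perfectly linear scaling and an even load split. The constant and function names are hypothetical; the numbers are the Phase 1 ceiling and the lading rate from this guide.

```python
OFFERED_MIBPS = 65.0           # lading's configured byte rate
PER_POD_CEILING_MIBPS = 19.04  # single-pod throughput measured in Phase 1


def predict(pods):
    """Return (expected throughput in MiB/s, expected CPU utilisation 0..1)."""
    capacity = pods * PER_POD_CEILING_MIBPS
    throughput = min(OFFERED_MIBPS, capacity)
    # Utilisation is the fraction of combined capacity actually in use.
    return throughput, throughput / capacity


# The three manually scaled pod counts used in this guide.
for pods in (1, 3, 8):
    throughput, utilisation = predict(pods)
    print(f"{pods} pod(s): ~{throughput:.2f} MiB/s at ~{utilisation:.0%} CPU")
```

For three pods the model predicts 57.12 MiB/s at full saturation, slightly above the 55.38 MiB/s measured, which suggests scaling is close to linear but not perfectly so.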

Phase 3 — Eight pods

kubectl scale deployment vector -n vector-perf --replicas=8
kubectl rollout status deployment/vector -n vector-perf

8 × 19 MiB/s = 152 MiB/s combined capacity ≫ 65 MiB/s. Vector is no longer the bottleneck: essentially all of the offered load flows through (62.32 MiB/s measured) and pods have ample headroom.

| Metric | Value |
| --- | --- |
| Throughput | 62.32 MiB/s |
| Events/s | 490,288 ev/s |
| Pod CPU | ~480m (48 %) |
| Bottleneck | None, spare capacity |

The bottleneck has been eliminated. Each pod handles ~7.8 MiB/s at ~48 % CPU, leaving over half of each pod’s capacity unused. With L7 per-request routing, load is distributed evenly across all 8 pods.

Comparison

All phases: 65 MiB/s lading (100 parallel connections, Nginx L7 ingress), pods limited to 1 vCPU / 1 GiB.

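| Metric | Phase 1 (1 pod) | Phase 2 (3 pods) | Phase 3 (8 pods) |
| --- | --- | --- | --- |
| Throughput | 19.04 MiB/s | 55.38 MiB/s | 62.32 MiB/s |
| Events/s | 149,710 | 435,543 | 490,288 |
| CPU per pod | 1000m (100 %) | ~1000m (99 %) | ~480m (48 %) |
| Bottleneck | Vector CPU | Vector CPU | None |
| Scaling vs Phase 1 | 1.00× (baseline) | 2.91× | 3.27× |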

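The throughput ceiling is reached somewhere between 3 and 8 pods: dividing the offered load by the per-pod ceiling gives 65 / 19 ≈ 3.4 pods. Phase 4 is consistent with this: the HPA, which targets 70 % CPU rather than full saturation, converges at 6 pods.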
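The crossover estimate is a single division, and the same formula gives a rough pod count for any CPU target. The sketch below assumes linear scaling and uses the rounded 19 MiB/s per-pod figure from the text. `pods_needed` is a hypothetical helper name.

```python
import math


def pods_needed(offered_mibps, per_pod_mibps, target_utilisation=1.0):
    """Fractional pod count at which pods run at target_utilisation."""
    return offered_mibps / (per_pod_mibps * target_utilisation)


saturation = pods_needed(65, 19)       # pods running flat out
at_target = pods_needed(65, 19, 0.70)  # pods running at a 70 % CPU target

print(f"saturation crossover: {saturation:.2f} pods -> {math.ceil(saturation)} whole pods")
print(f"70 % CPU target:      {at_target:.2f} pods -> {math.ceil(at_target)} whole pods")
print(f"predicted CPU at 5 pods: {65 / (5 * 19):.0%}")
```

The linear model suggests about 5 pods for a 70 % target, with roughly 68 % CPU at that count. Treat it as a lower bound: measured per-pod CPU under load can come in higher than this estimate, and that pushes the real equilibrium up.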

Phase 4 — HPA finds equilibrium

With horizontal scaling working and the bottleneck removed, the HPA can now find the minimum pod count that keeps CPU below the 70 % target.

# Reset to 1 pod
kubectl scale deployment vector -n vector-perf --replicas=1

# Create HPA (70 % CPU target, 1–8 replicas)
kubectl autoscale deployment vector -n vector-perf \
  --cpu-percent=70 --min=1 --max=8

Phase 4 results

Scale-up timeline (no manual intervention):

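| Time | Replicas | Avg CPU | Event |
| --- | --- | --- | --- |
| t=0 s | 1 | 100 % | load starts |
| t=61 s | 2 | 100 % | HPA scales 1→2 |
| t=91 s | 3 | 99 % | HPA scales 2→3 |
| t=137 s | 5 | 95 % | HPA scales 3→5 |
| t=167 s | 6 | 72 % | HPA scales 5→6 |
| t=182 s | 6 | 63 % | Stable, equilibrium |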

Time to equilibrium: 182 s (~3 min), 4 scale events, 0 manual cycling.

Throughput at equilibrium: 62.76 MiB/s, 493,392 ev/s, 6 pods, 63 % avg CPU.

The HPA settled at 6 pods: at 5 pods CPU reached 83 % (above the 70 % target), triggering a final scale-up. At 6 pods CPU stabilised at 63 %, within the ±10 % tolerance band (63–77 %).
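These scaling decisions follow the replica formula documented for the Kubernetes HPA: desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within the tolerance (0.1 by default). The Python below is a simplified sketch of that rule only. It ignores stabilisation windows, pod readiness handling, and missing metrics. The function name is hypothetical, and the CPU readings replayed through it are the ones quoted in this guide.

```python
import math


def desired_replicas(current, cpu_pct, target_pct=70, tolerance=0.10,
                     min_replicas=1, max_replicas=8):
    """Simplified HPA rule: scale by the ratio of observed to target CPU."""
    ratio = cpu_pct / target_pct
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(1.0 - ratio) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# (replicas, average CPU %) pairs taken from the Phase 4 description.
observations = [(1, 100), (2, 100), (3, 99), (5, 83), (6, 63)]
for replicas, cpu in observations:
    print(f"{replicas} pod(s) at {cpu} % -> {desired_replicas(replicas, cpu)} pod(s)")
```

Replaying the observations gives 1→2, 2→3, 3→5, 5→6, and then no change at 6 pods, because 63 / 70 = 0.9 sits on the edge of the tolerance band.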

Results summary

| Metric | Phase 1 (1 pod) | Phase 2 (3 pods) | Phase 3 (8 pods) | Phase 4 (HPA) |
| --- | --- | --- | --- | --- |
| Throughput | 19.04 MiB/s | 55.38 MiB/s | 62.32 MiB/s | 62.76 MiB/s |
| Events/s | 149,710 | 435,543 | 490,288 | 493,392 |
| CPU per pod | 1000m (100 %) | ~1000m (99 %) | ~480m (48 %) | ~630m (63 %) |
| Bottleneck | Vector CPU | Vector CPU | None | None |
| Scaling vs Phase 1 | 1.00× (baseline) | 2.91× | 3.27× | 3.29× |
| Pod count | manual (1) | manual (3) | manual (8) | auto (6) |

Phase 4 reaches Phase 3’s throughput with 2 fewer pods and no manual scaling: the HPA settled on 6 pods, above the theoretical 3.4-pod crossover because it targets 70 % CPU rather than full saturation, and kept CPU near that target instead of leaving ~50 % headroom idle on every pod.

Key takeaways

  1. A single pod caps at its CPU limit. At 65 MiB/s load, 1 pod can absorb only ~19 MiB/s. Back-pressure prevents any event loss.

  2. L7 per-request routing distributes load uniformly. Because Nginx dispatches each HTTP request independently, every pod, old or newly Ready, receives a roughly equal share of traffic (about 1/N of requests for N replicas), with no idle pods.

  3. Adding pods beyond the saturation point removes the bottleneck entirely. Phase 3 (8 pods) carries ~62 MiB/s of the 65 MiB/s offered load with each pod at ~48 % CPU. The bottleneck crossover is at ~3.4 pods for this load level.

  4. HPA finds the right pod count automatically. With HTTP + L7 routing, every new pod starts receiving traffic immediately after becoming Ready. HPA converged at 6 pods in 182 s with zero manual intervention.


Replicating these results

The terraform/ directory provisions the K3s single-node cluster (EC2 c5.4xlarge) the benchmark above was measured on, if you don’t already have a cluster to test against.

Once the Setup steps are complete and Phase 1’s producer and ingress are deployed, run-experiment.sh runs all four phases end to end: scaling the deployment, waiting for each rollout, measuring throughput, and creating the HPA for Phase 4, then prints a single results table.

run-experiment.sh
#!/usr/bin/env bash
# run-experiment.sh — run all 4 Vector scaling phases and print a results table.
#
# Usage:
#   KUBECONFIG=/path/to/kubeconfig ./scripts/run-experiment.sh
#
# Requirements: kubectl, grpcurl, python3
# The script assumes namespace, consumer, ingress-nginx, and ingress are already deployed.

set -euo pipefail

NAMESPACE=vector-perf
PRODUCER_MANIFEST=manifests/producer.yaml
TMPDIR_WORK=/tmp/vec-experiment-$$
mkdir -p "$TMPDIR_WORK"
trap 'rm -rf "$TMPDIR_WORK"; pkill -f "kubectl port-forward.*vector-perf.*pod/" 2>/dev/null || true' EXIT

# ── helpers ───────────────────────────────────────────────────────────────────
log() { echo "==> $*" >&2; }

# K3s kubeconfig uses client-certificate auth — no AWS credentials needed.
kube() { kubectl "$@"; }

wait_rollout() {
  kube rollout status deployment/vector -n "$NAMESPACE" --timeout=120s >&2
}

delete_hpa() {
  kube delete hpa vector -n "$NAMESPACE" 2>/dev/null || true
}

pick_pod() {
  kube get pods -n "$NAMESPACE" -l app.kubernetes.io/name=vector \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}'
}

# Average CPU % across all Vector pods via kubectl top. Outputs e.g. "97%".
avg_cpu_pct() {
  kube top pods -n "$NAMESPACE" -l app.kubernetes.io/name=vector \
    --no-headers 2>/dev/null \
    | awk '{gsub("m","",$2); sum+=$2; n++} END {
        if (n>0) printf "%d%%", int(sum/n/10)
        else     print "?"
      }'
}

# Measure throughput from one pod over a 30-second window.
# Writes "<MiB/s> <ev/s>" to $TMPDIR_WORK/measure.txt
measure_pod() {
  local pod=$1 port=$2

  kube port-forward -n "$NAMESPACE" "pod/$pod" "${port}:8686" > "$TMPDIR_WORK/pf.log" 2>&1 &
  local pf_pid=$!

  # Wait up to 10 s for the gRPC health check to pass.
  local i=0
  while ! grpcurl -plaintext "localhost:${port}" grpc.health.v1.Health/Check >/dev/null 2>&1; do
    if ! kill -0 "$pf_pid" 2>/dev/null; then
      log "ERROR: port-forward to pod/${pod}:8686 → ${port} died. Output:"
      cat "$TMPDIR_WORK/pf.log" >&2
      exit 1
    fi
    i=$(( i + 1 ))
    if [[ "$i" -ge 20 ]]; then
      log "ERROR: gRPC health check on port ${port} not ready after 10 s."
      cat "$TMPDIR_WORK/pf.log" >&2
      exit 1
    fi
    sleep 0.5
  done

  if ! grpcurl -plaintext -d '{}' "localhost:${port}" \
      vector.observability.v1.ObservabilityService/GetComponents \
      > "$TMPDIR_WORK/t0.json" 2>&1; then
    log "ERROR: grpcurl failed on port ${port} (pod=${pod}). Output:"
    cat "$TMPDIR_WORK/t0.json" >&2
    exit 1
  fi
  sleep 30
  if ! grpcurl -plaintext -d '{}' "localhost:${port}" \
      vector.observability.v1.ObservabilityService/GetComponents \
      > "$TMPDIR_WORK/t30.json" 2>&1; then
    log "ERROR: grpcurl failed on port ${port} (pod=${pod}). Output:"
    cat "$TMPDIR_WORK/t30.json" >&2
    exit 1
  fi

  # Under `set -e`, a failing kill (port-forward already gone) must not abort the run.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true

  python3 - "$TMPDIR_WORK/t0.json" "$TMPDIR_WORK/t30.json" <<'PYEOF'
import json, sys

def get_bytes_events(path):
    try:
        d = json.load(open(path))
    except Exception:
        return 0, 0
    for c in d.get('components', []):
        if c.get('componentId') == 'in':
            m = c.get('metrics', {})
            return int(m.get('receivedBytesTotal', 0)), int(m.get('receivedEventsTotal', 0))
    return 0, 0

b1, e1 = get_bytes_events(sys.argv[1])
b2, e2 = get_bytes_events(sys.argv[2])
mibps = (b2 - b1) / 30 / 1048576
eps   = (e2 - e1) / 30
print(f"{mibps:.2f} {eps:.0f}")
PYEOF
}

# ── phase runners ─────────────────────────────────────────────────────────────
# Each function writes key=value lines to $TMPDIR_WORK/phaseN.txt
run_static_phase() {
  local phase=$1 replicas=$2 out="$TMPDIR_WORK/phase${1}.txt"

  log "Phase $phase: scaling Vector to $replicas pod(s)..."
  delete_hpa
  kube scale deployment vector -n "$NAMESPACE" --replicas="$replicas" >/dev/null 2>&1
  wait_rollout

  local pod
  pod=$(pick_pod)
  log "Phase $phase: measuring $pod (20 s warmup + 30 s window)..."
  sleep 20

  local port=$((18680 + replicas))
  measure_pod "$pod" "$port" > "$TMPDIR_WORK/measure.txt"
  local mibps_per_pod eps_per_pod
  read -r mibps_per_pod eps_per_pod < "$TMPDIR_WORK/measure.txt"

  local total_mibps total_eps cpu
  total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $replicas:.2f}')")
  total_eps=$(python3    -c "print(f'{float(\"$eps_per_pod\")   * $replicas:.0f}')")
  cpu=$(avg_cpu_pct)

  {
    echo "PHASE${phase}_MIBPS=${total_mibps}"
    echo "PHASE${phase}_EPS=${total_eps}"
    echo "PHASE${phase}_CPU=${cpu}"
    echo "PHASE${phase}_PODS=${replicas}"
  } > "$out"
}

run_hpa_phase() {
  local out="$TMPDIR_WORK/phase4.txt"

  log "Phase 4: resetting to 1 pod and creating HPA (70% target, max 8)..."
  delete_hpa
  kube scale deployment vector -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
  kube autoscale deployment vector -n "$NAMESPACE" \
    --cpu-percent=70 --min=1 --max=8 >/dev/null 2>&1

  local start elapsed
  local last_replicas=1 scale_events=0 stable_count=0 last_stable=0
  local replicas="" cpu_avg=""
  start=$(date +%s)

  log "Phase 4: watching HPA..."
  while true; do
    elapsed=$(( $(date +%s) - start ))

    replicas=$(kube get hpa vector -n "$NAMESPACE" \
               -o jsonpath='{.status.currentReplicas}' 2>/dev/null || echo "")
    cpu_avg=$(kube get hpa vector -n "$NAMESPACE" \
               -o jsonpath='{.status.currentMetrics[0].resource.current.averageUtilization}' \
               2>/dev/null || echo "")

    if [[ -n "$replicas" && "$replicas" != "$last_replicas" ]]; then
      scale_events=$(( scale_events + 1 ))
      log "[${elapsed}s] SCALE ${last_replicas}→${replicas}  cpu=${cpu_avg}%"
      last_replicas=$replicas
    else
      log "[${elapsed}s] replicas=${replicas:-?}  cpu=${cpu_avg:-?}%"
    fi

    if [[ "$replicas" == "$last_stable" ]]; then
      stable_count=$(( stable_count + 1 ))
    else
      last_stable=$replicas
      stable_count=1
    fi

    # Stable = same replica count for 75 s AND cpu within HPA tolerance band (63–77%)
    if [[ "$stable_count" -ge 5 && "$elapsed" -gt 120 && -n "$cpu_avg" ]]; then
      if [[ "$cpu_avg" -ge 63 && "$cpu_avg" -le 77 ]]; then
        log "Equilibrium: $replicas pods, ${cpu_avg}% CPU, ${elapsed}s elapsed."
        break
      fi
    fi

    sleep 15
  done

  log "Phase 4: measuring equilibrium throughput..."
  local pod
  pod=$(pick_pod)
  measure_pod "$pod" 28686 > "$TMPDIR_WORK/measure.txt"
  local mibps_per_pod eps_per_pod total_mibps total_eps
  read -r mibps_per_pod eps_per_pod < "$TMPDIR_WORK/measure.txt"
  total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $last_replicas:.2f}')")
  total_eps=$(python3    -c "print(f'{float(\"$eps_per_pod\")   * $last_replicas:.0f}')")

  {
    echo "PHASE4_MIBPS=${total_mibps}"
    echo "PHASE4_EPS=${total_eps}"
    echo "PHASE4_PODS=${last_replicas}"
    echo "PHASE4_CPU=${cpu_avg}%"
    echo "PHASE4_SCALE_EVENTS=${scale_events}"
    echo "PHASE4_ELAPSED=${elapsed}s"
  } > "$out"
}

# ── main ──────────────────────────────────────────────────────────────────────
log "Cleaning up any leftover port-forwards from previous runs..."
pkill -f "kubectl port-forward.*vector-perf.*pod/" 2>/dev/null || true
sleep 1

log "Checking cluster connectivity..."
if ! kubectl cluster-info --request-timeout=5s >/dev/null 2>&1; then
  echo "ERROR: cannot reach the cluster. Is KUBECONFIG set correctly?" >&2
  echo "  KUBECONFIG=${KUBECONFIG:-<unset>}" >&2
  exit 1
fi
log "Cluster reachable."

log "Applying producer manifest (lading, 65 MiB/s)..."
kube apply -f "$PRODUCER_MANIFEST" >/dev/null 2>&1
kube scale deployment producer -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
kube rollout restart deployment producer -n "$NAMESPACE" >/dev/null 2>&1
log "Waiting 20 s for lading to initialise..."
sleep 20

run_static_phase 1 1
run_static_phase 2 3
run_static_phase 3 8
run_hpa_phase

# Load all results
declare -A R
for f in "$TMPDIR_WORK"/phase*.txt; do
  while IFS='=' read -r k v; do R[$k]=$v; done < "$f"
done

# ── results table ─────────────────────────────────────────────────────────────
echo ""
echo "┌──────────────┬──────────────┬──────────────┬──────────────┬─────────────┐"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "" "Phase 1" "Phase 2" "Phase 3" "Phase 4"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "" "1 pod" "3 pods" "8 pods" "HPA (auto)"
echo "├──────────────┼──────────────┼──────────────┼──────────────┼─────────────┤"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "Throughput" \
  "${R[PHASE1_MIBPS]:-?} MiB/s" \
  "${R[PHASE2_MIBPS]:-?} MiB/s" \
  "${R[PHASE3_MIBPS]:-?} MiB/s" \
  "${R[PHASE4_MIBPS]:-?} MiB/s"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "Events/s" \
  "${R[PHASE1_EPS]:-?}" \
  "${R[PHASE2_EPS]:-?}" \
  "${R[PHASE3_EPS]:-?}" \
  "${R[PHASE4_EPS]:-?}"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "Avg CPU/pod" \
  "${R[PHASE1_CPU]:-?}" \
  "${R[PHASE2_CPU]:-?}" \
  "${R[PHASE3_CPU]:-?}" \
  "${R[PHASE4_CPU]:-?}"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "Pods" \
  "${R[PHASE1_PODS]:-?}" \
  "${R[PHASE2_PODS]:-?}" \
  "${R[PHASE3_PODS]:-?}" \
  "${R[PHASE4_PODS]:-?}"
printf "│ %-12s │ %-12s │ %-12s │ %-12s │ %-11s │\n" \
  "Bottleneck" \
  "Vector CPU" "Vector CPU" "None" "— "
echo "└──────────────┴──────────────┴──────────────┴──────────────┴─────────────┘"
echo ""
echo "Phase 4: ${R[PHASE4_SCALE_EVENTS]:-?} scale events," \
     "equilibrium in ${R[PHASE4_ELAPSED]:-?}," \
     "0 manual producer restarts."

KUBECONFIG=/path/to/kubeconfig ./scripts/run-experiment.sh