Spreading the Load With Jitter

Each time an application's reconciliation interval elapses, the controller refreshes it. However, a refresh is not free:

For each application, the controller asks the argocd-repo-server to produce the desired manifests.
The repo-server fetches the repository and runs the config management tool for that source: helm template, kustomize build, or a plain read for raw manifests.
The repo-server then hands the rendered output back.

These operations are expensive, especially for large applications. The render step is the costly part, and it consumes CPU and memory on the argocd-repo-server.

The problem is timing, not the work itself. Without jitter, every application shares the same reconciliation tick, so they all refresh at the same moment. The repo-server gets the entire fleet's render requests in one burst instead of spread over the interval. With a handful of applications this is fine; the burst is small and clears quickly. With hundreds or thousands, the repo-server has far more render work than it can do at once. Requests queue, refreshes are delayed, and you see a sharp CPU and memory spike on the repo-server every reconciliation cycle, while it sits mostly idle in between. If the repo-server is memory-constrained, a large enough burst can get it OOM-killed.

This is worse when many applications share one repository, a common setup with a monorepo. The repo-server caches rendered manifests per commit, so if nothing changed it can often serve a refresh from cache cheaply. But a single push to that shared repository invalidates the cache for every application backed by it. The next reconciliation tick then forces a real render for all of them at once, turning a routine refresh cycle into a full fleet-wide regeneration.
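The monorepo effect can be illustrated with a toy model. This is only a sketch: the RenderCache class and its (application, commit) key are hypothetical simplifications, and the real repo-server cache key is more involved. It shows how repeated refreshes on an unchanged commit cost nothing, while one new commit forces a render for every application.

```python
class RenderCache:
    """Toy stand-in for a manifest cache keyed by (application, commit)."""

    def __init__(self):
        self._cache = {}
        self.renders = 0  # counts expensive renders (cache misses)

    def get_manifests(self, app, commit):
        key = (app, commit)
        if key not in self._cache:
            # Cache miss: this is where helm template / kustomize build would run.
            self.renders += 1
            self._cache[key] = f"rendered {app}@{commit}"
        return self._cache[key]


apps = [f"app-{i}" for i in range(100)]  # 100 applications in one monorepo
cache = RenderCache()

# First tick renders everything; the second tick on the same commit is all hits.
for app in apps:
    cache.get_manifests(app, "abc123")
for app in apps:
    cache.get_manifests(app, "abc123")
renders_before_push = cache.renders

# A single push moves the repository to a new commit: every lookup misses.
for app in apps:
    cache.get_manifests(app, "def456")
renders_after_push = cache.renders

print(renders_before_push, renders_after_push)
```

In this model the second tick adds no renders, but the tick after the push adds one render per application, all requested at the same moment when there is no jitter.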

Note: Did you know that soldiers are asked to break step when crossing a bridge? Marching in step transfers rhythmic energy to the structure, and if that rhythm matches the bridge's natural frequency, the bridge can shake itself apart. Jitter does the same thing for Argo CD: it breaks the shared rhythm so applications do not all refresh at once.

Jitter spreads these refreshes out. It adds a random delay, between 0 and the jitter value, to each application's refresh, so they land staggered across a window instead of all at once. With a reconciliation timeout of 3 minutes and a jitter of 1 minute, each application refreshes somewhere between 3 and 4 minutes after its previous refresh, and the delay is rolled independently for each application.
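The rule above can be written as a small function. This is a minimal sketch, not Argo CD's implementation: the function name next_refresh_delay is hypothetical, and a uniform distribution is an assumption, since the text only says the delay is random between 0 and the jitter value.

```python
import random


def next_refresh_delay(timeout_s, jitter_s, rng):
    """Seconds until the next refresh: the base timeout plus a random
    delay in [0, jitter_s]. A jitter of 0 disables the random part."""
    if jitter_s <= 0:
        return float(timeout_s)
    # Assumption: the delay is drawn uniformly; the source does not say.
    return timeout_s + rng.uniform(0, jitter_s)


rng = random.Random(42)  # seeded only so the sketch is repeatable

# 3-minute timeout, 1-minute jitter: every delay falls between 180s and 240s.
delays = [next_refresh_delay(180, 60, rng) for _ in range(1000)]
print(min(delays), max(delays))
```

Each call rolls a fresh delay, which is what keeps applications from drifting back into a shared rhythm.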

For example, say you have 6 applications and they all last reconciled at the same moment, time 0:00. The reconciliation timeout is 3 minutes and jitter is 1 minute (60s).

Without jitter, all 6 come due at 3:00 and refresh together: one burst of 6 render requests at the repo-server.

With jitter, each application gets its own random delay between 0 and 60 seconds, rolled independently:

Application	Base interval	Jitter (random 0-60s)	Refreshes at
todo-app	3:00	+12s	3:12
api	3:00	+47s	3:47
frontend	3:00	+3s	3:03
worker	3:00	+35s	3:35
cache	3:00	+30s	3:30
db	3:00	+21s	3:21

The six refreshes are now spread across the 3:00 to 4:00 window instead of all firing at 3:00, so the repo-server handles them one after another instead of all at once. Jitter only pushes a refresh later, never earlier: each application is refreshed no sooner than 3 minutes and no later than 4 minutes (timeout plus jitter) after its last reconciliation.
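The same comparison can be run for a larger fleet. The sketch below is a simplified simulation under stated assumptions: peak_burst is a hypothetical helper, all applications start from the same moment, delays are uniform, and load is measured as the number of refreshes that come due within the same 5-second slot.

```python
import random
from collections import Counter


def peak_burst(num_apps, timeout_s, jitter_s, bucket_s=5, seed=0):
    """Largest number of refreshes that come due in one bucket_s-wide slot."""
    rng = random.Random(seed)
    due_times = []
    for _ in range(num_apps):
        delay = rng.uniform(0, jitter_s) if jitter_s > 0 else 0.0
        due_times.append(timeout_s + delay)
    # Group due times into fixed-width slots and take the busiest one.
    buckets = Counter(int(t // bucket_s) for t in due_times)
    return max(buckets.values())


no_jitter = peak_burst(1000, timeout_s=180, jitter_s=0)
with_jitter = peak_burst(1000, timeout_s=180, jitter_s=60)
print(no_jitter, with_jitter)
```

Without jitter, all 1000 applications fall into the same slot. With 60 seconds of jitter the busiest slot holds only a small fraction of the fleet, because the same work is spread over a dozen slots.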

Figure: Spreading the load with jitter

Jitter is configured through the timeout.reconciliation.jitter key in the argocd-cm ConfigMap, expressed as a duration. It defaults to 60s and is disabled when set to 0. Keep it at or below roughly half the reconciliation timeout; beyond that, refresh timing becomes too unpredictable.
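The half-the-timeout guideline is easy to check before applying a change. The helper below is a hypothetical sketch, not part of Argo CD: parse_duration only understands a bare "0" or a single whole number followed by s, m, or h, which is an assumption about how the values are written.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value):
    """Convert a simple duration string such as '60s' or '3m' to seconds."""
    value = value.strip()
    if value == "0":
        return 0  # the value that disables jitter
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def jitter_within_guideline(timeout, jitter):
    """True if jitter is at most half of the reconciliation timeout."""
    return parse_duration(jitter) * 2 <= parse_duration(timeout)


print(jitter_within_guideline("180s", "60s"))  # 60s is a third of 180s
print(jitter_within_guideline("3m", "2m"))     # 2m is more than half of 3m
```

The guideline is a rule of thumb, so a failed check is a prompt to reconsider the values rather than a hard error.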

You can get the current value of the jitter using the following command:
