Issues / #101
Pin image tags in landconfigregistry and auto-replant on registry release
open
improvement
Project: landconfigregistry
Reporter:
1 May 2026 18:47
Description
Land deployments today rely on `:latest` (or worse, an unpinned manual `docker run`), so production silently drifts away from the codebase. Concrete example from issue #95: the agentclaudecode E2BIG fix landed in commit 50517f5 / tag v0.10.0 on 2026-04-02, yet the container on land-nimsforest-one kept running the v0.9.0 image for a full four weeks afterward. Nobody knew a replant was due, and the broken build pipeline (CI didn't even push a Docker image) hid the fact that the registry's `:latest` and the deployed image had diverged.
Fix structurally:
1. Add `image_tag:` to each container entry in landconfigregistry seed data, and render it into `/etc/land.yaml` so that `land plant` and `docker pull` resolve to a specific version, not `:latest`.
2. On a successful release CI run, have the release workflow (or a release bot) PATCH landconfigregistry to bump the tag for that service's role, then notify the affected land servers (the NATS subject `land.config.reload.<role>` already exists, per the landconfigregistry CLAUDE.md).
3. Land servers re-resolve their config and replant changed containers automatically (gated by a maintenance window flag if needed).
4. Hotfix override: a humans-only `land plant <name> --image=<explicit-ref>` that skips the registry pin for emergencies.
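The four steps above can be sketched from a land server's point of view. This is a minimal Python sketch, assuming the rendered `/etc/land.yaml` boils down to a list of container entries with `name`, `role`, `image` and `image_tag` fields. `ContainerSpec`, `resolve_image_ref`, `bump_tag`, `handle_reload`, the `agent`/`frontend` roles and the `registry.example` image names are all hypothetical. The real landconfigregistry PATCH endpoint and the NATS client are out of scope: `bump_tag` only models the effect of the PATCH, and `handle_reload` takes the reload subject as a plain string.

```python
"""Hypothetical sketch: pinned-tag resolution and role-scoped replant decisions."""
from dataclasses import dataclass, replace
from typing import List, Optional

RELOAD_PREFIX = "land.config.reload."


@dataclass(frozen=True)
class ContainerSpec:
    """One container entry as it might be rendered into /etc/land.yaml (assumed shape)."""
    name: str
    role: str
    image: str      # image repository without a tag
    image_tag: str  # pinned tag from landconfigregistry, e.g. "v0.10.0"


def resolve_image_ref(spec: ContainerSpec, override: Optional[str] = None) -> str:
    """Return the exact reference to pull; refuse an empty or ':latest' tag."""
    if override:
        # Hotfix path (step 4): an explicit ref supplied by a human bypasses the pin.
        return override
    if not spec.image_tag or spec.image_tag == "latest":
        raise ValueError(f"{spec.name}: image_tag must be pinned, not ':latest'")
    return f"{spec.image}:{spec.image_tag}"


def bump_tag(entries: List[ContainerSpec], role: str, new_tag: str) -> List[ContainerSpec]:
    """Model of the release bot's PATCH (step 2): retag every entry for `role`."""
    return [replace(e, image_tag=new_tag) if e.role == role else e for e in entries]


def handle_reload(subject: str, old: List[ContainerSpec], new: List[ContainerSpec],
                  maintenance_ok: bool = True) -> List[str]:
    """Return the names of containers to replant for one reload message (step 3)."""
    if not subject.startswith(RELOAD_PREFIX):
        raise ValueError(f"unexpected subject: {subject}")
    role = subject[len(RELOAD_PREFIX):]
    if not maintenance_ok:
        return []  # maintenance-window gate is closed: defer the replant
    previous = {e.name: e for e in old}
    changed = []
    for e in new:
        if e.role != role:
            continue  # the message targets a different role
        prev = previous.get(e.name)
        # Replant containers that are new or whose pinned image changed.
        if prev is None or (prev.image, prev.image_tag) != (e.image, e.image_tag):
            changed.append(e.name)
    return sorted(changed)


if __name__ == "__main__":
    old = [ContainerSpec("agentclaudecode", "agent", "registry.example/agentclaudecode", "v0.9.0"),
           ContainerSpec("web", "frontend", "registry.example/web", "v2.1.0")]
    new = bump_tag(old, "agent", "v0.10.0")
    print(resolve_image_ref(new[0]))                            # registry.example/agentclaudecode:v0.10.0
    print(handle_reload("land.config.reload.agent", old, new))  # ['agentclaudecode']
```

The diff compares `(image, image_tag)` pairs instead of resolved refs, so a server whose old config still carried `:latest` is not blocked from picking up its first pinned tag. Only containers for the role named in the subject are considered, which keeps a release of one service from replanting unrelated containers on the same land.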
With this in place, the agentclaudecode story would have been: tag pushed → CI builds image → CI bumps landconfigregistry → land replants → done, with no human in the loop and no four-week drift.
Relates: #95, #98 (the rendering bug found while doing the manual deploy).