fix(agent): generate systemd-safe bwrap ExecStart #11

Merged
gmackie merged 1 commit from fix/agent-systemd-unit-quoting into main 2026-06-08 09:12:08 +00:00
Owner

Problem (P1)

Sandboxed (bwrap/netns) deploys failed at the systemctl restart step with 'Unit … has a bad unit file setting'. systemd's command lexer rejected the generated ExecStart as 'Unbalanced quoting' and dropped the directive, so the unit had no ExecStart and never started.

Root cause: the agent emitted an inline /bin/bash -c 'exec 11< <(printf "%s" "…") … bwrap --file 11 /etc/resolv.conf …' wrapper verbatim into ExecStart. The embedded multiline file content, unescaped ", and literal %s were parsed by systemd before bash ever saw them.

Fix

  • Drop the bash -c + FD-passing (exec NN< <(printf …) → bwrap --file NN) entirely; bind /etc/resolv.conf, /etc/hosts, /etc/passwd directly via --ro-bind-try so no multiline content reaches the unit file.
  • Route every argv element through systemdExecArg (doubles % → %%, escapes quotes/backslashes/whitespace).
  • Use an absolute /usr/bin/nsenter; resolve the real uid/gid via usermgr.AppUserIDs; pick a working resolver via HostResolvConfPath; rewrite loopback URLs in env for the sandbox netns; add an iptables INPUT rule for host access.
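
For readers unfamiliar with the escaping step, here is a minimal sketch of what per-argument escaping for ExecStart might look like. It is not the actual systemdExecArg from this PR: escapeExecArg and execStartLine are hypothetical names, the argv in main is illustrative only, and the sketch covers just the rules named above (doubling %, escaping quotes, backslashes, and whitespace). It does not handle $ expansion.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeExecArg is a hypothetical stand-in for systemdExecArg.
// It doubles '%' so systemd does not treat it as a specifier, and
// wraps the argument in double quotes (with C-style escapes) when it
// contains characters the systemd command lexer treats specially.
func escapeExecArg(arg string) string {
	var b strings.Builder
	needQuotes := arg == "" // an empty argument must be written as ""
	for _, r := range arg {
		switch r {
		case '%':
			b.WriteString("%%")
		case '\\':
			b.WriteString(`\\`)
			needQuotes = true
		case '"':
			b.WriteString(`\"`)
			needQuotes = true
		case '\n':
			b.WriteString(`\n`) // keep the unit file entry on one line
			needQuotes = true
		case '\t':
			b.WriteString(`\t`)
			needQuotes = true
		case ' ', '\'':
			b.WriteRune(r) // literal inside double quotes
			needQuotes = true
		default:
			b.WriteRune(r)
		}
	}
	if needQuotes {
		return `"` + b.String() + `"`
	}
	return b.String()
}

// execStartLine joins already-split argv elements into one ExecStart
// line, escaping each element independently (hypothetical helper).
func execStartLine(argv []string) string {
	parts := make([]string, len(argv))
	for i, a := range argv {
		parts[i] = escapeExecArg(a)
	}
	return "ExecStart=" + strings.Join(parts, " ")
}

func main() {
	// Illustrative argv only; the real agent builds a longer command.
	argv := []string{
		"bwrap",
		"--ro-bind-try", "/etc/resolv.conf", "/etc/resolv.conf",
		"--setenv", "GREETING", `say "100%"`,
	}
	fmt.Println(execStartLine(argv))
}
```

Escaping each element separately, rather than escaping a pre-joined shell string, is what keeps systemd's lexer and the eventual argv in agreement.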

Verification

  • go test ./internal/sandbox/... ./internal/deploy/... ./internal/usermgr/... → ok
  • go build ./... → clean
  • TestBwrapExecLine now asserts the exec line embeds no printf/multiline content.
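
The kind of assertion described for TestBwrapExecLine could be sketched roughly as follows. This is not the test from the PR: checkExecLine is a hypothetical helper, and the sample lines in main are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkExecLine is a hypothetical helper that rejects an exec line
// which spans multiple lines or still embeds a printf wrapper.
func checkExecLine(line string) error {
	if strings.ContainsAny(line, "\r\n") {
		return errors.New("exec line contains a line break")
	}
	if strings.Contains(line, "printf") {
		return errors.New("exec line still embeds printf")
	}
	return nil
}

func main() {
	// Illustrative inputs: one direct bind, one old-style wrapper.
	good := "bwrap --ro-bind-try /etc/hosts /etc/hosts"
	bad := "/bin/bash -c 'exec 11< <(printf \"%s\" \"a\nb\")'"
	fmt.Println(checkExecLine(good) == nil)
	fmt.Println(checkExecLine(bad) != nil)
}
```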

Fixes the P1 in docs/field-reports/2026-06-08-agent-systemd-unit-quoting-bug.md.

Note: the agent binary is deployed to nodes manually (no auto-update), so nodes still run the broken agent until rebuilt + redeployed after merge.

fix(agent): generate systemd-safe bwrap ExecStart
Some checks failed
AI Code Review / review (pull_request) Failing after 8s
CI / ci (pull_request) Successful in 14m12s
4ca0264e95
Replace the inline 'bash -c' FD-passing wrapper (exec NN< <(printf ...)
fed to bwrap --file NN) with direct --ro-bind-try binds of
/etc/resolv.conf, /etc/hosts, and /etc/passwd. The old wrapper embedded
multiline file content and unescaped quotes/%s directly into ExecStart,
which systemd's own command lexer rejected as 'Unbalanced quoting',
dropping the directive and failing every sandboxed deploy with 'bad unit
file setting'.

Route every argv element through systemdExecArg (doubles % -> %%,
escapes quotes/backslashes/whitespace), use an absolute /usr/bin/nsenter,
resolve real uid/gid via usermgr.AppUserIDs, pick a working resolver via
HostResolvConfPath, rewrite loopback URLs in env for the sandbox netns,
and allow host access with an iptables INPUT rule.

Fixes the P1 in docs/field-reports/2026-06-08-agent-systemd-unit-quoting-bug.md.
gmackie deleted branch fix/agent-systemd-unit-quoting 2026-06-08 09:12:08 +00:00
Reference
gmackie/ForgeGraph!11