OS APT Maintenance Role
Install a host-local systemd timer/service that runs apt maintenance and records durable JSON state for monitoring.
What It Does
This role deploys:
- `/usr/local/sbin/os-apt-maintenance`: a root-owned runner for `apt-get update`, `dist-upgrade`, `autoremove`, and `autoclean`.
- `os-apt-maintenance.service`: a root `oneshot` systemd service.
- `os-apt-maintenance.timer`: a persistent, jittered systemd timer.
- `/var/lib/os-apt-maintenance/state.json`: durable run state updated atomically even when apt fails.
- Optional `os-apt-maintenance-endpoint.service`: an authenticated HTTP endpoint for Nyxmon `json-metrics` checks.
The role is intentionally limited to OS package maintenance. It does not replace application dependency upgrades, product upgrades, or FastDeploy ad-hoc apt runners.
Safety Defaults
- Automatic reboot is disabled by default.
- The timer uses `Persistent=true` so missed runs catch up after downtime.
- The timer includes `RandomizedDelaySec` to avoid synchronized apt runs.
- The runner uses a non-blocking lock file to prevent overlapping runs.
- A failed run still writes `state.json` and preserves the previous `last_success_at`.
- Package config prompts use `--force-confold`, so unattended runs keep existing local config files.
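The non-blocking lock mentioned above can be sketched in Python as follows. This is an illustration only: the role's runner may be implemented differently, and the `maintenance_lock` helper, the `AlreadyRunning` exception, and the lock path in the usage note are hypothetical names, not part of the role.

```python
import fcntl
import os
from contextlib import contextmanager


class AlreadyRunning(RuntimeError):
    """Raised when another process already holds the maintenance lock."""


@contextmanager
def maintenance_lock(path):
    """Hold an exclusive advisory lock on `path`, failing fast if it is taken."""
    # Open (or create) the lock file; the file itself carries no data.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB makes flock() fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise AlreadyRunning(path) from exc
        yield fd
    finally:
        # Closing the descriptor releases the lock if it was acquired.
        os.close(fd)
```

A runner would wrap the whole apt sequence in `with maintenance_lock("/run/lock/example.lock"):` (path chosen for illustration) and treat `AlreadyRunning` as "skip this run" rather than as an apt failure.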
Variables
| Variable | Default | Description |
|---|---|---|
| | | Enable role deployment. |
| `os_apt_maintenance_host_id` | | Stable host identifier written to state JSON. |
| | | State directory. |
| | | Durable state JSON path. |
| | | Run |
| | | Run |
| | | Run |
| | | Run |
| `os_apt_maintenance_auto_reboot` | | Reboot automatically after a successful run if |
| `os_apt_maintenance_timer_on_calendar` | | Timer schedule. |
| `os_apt_maintenance_timer_randomized_delay_sec` | | Timer jitter. |
| | | Catch up missed timer runs after downtime. |
| | | Run the apt maintenance service during role deploy. |
| `os_apt_maintenance_run_on_first_deploy` | | Run the apt maintenance service when the state file did not exist before this deploy. |
| | | Monitoring threshold for last successful run, default 14 days. |
| `os_apt_maintenance_endpoint_enabled` | | Serve state JSON over authenticated HTTP. |
| `os_apt_maintenance_endpoint_user` | | Local system user that serves the endpoint and reads state. |
| `os_apt_maintenance_endpoint_bind` | `{{ tailscale_ip \| default('127.0.0.1') }}` | |
| | | Endpoint port. |
| | | Endpoint path. |
| `os_apt_maintenance_endpoint_auth_user` | | Basic auth username. Required when endpoint is enabled. |
| `os_apt_maintenance_endpoint_auth_password` | | Basic auth password. Required when endpoint is enabled. |
Example
- name: Deploy OS apt maintenance
  hosts: ubuntu_hosts
  become: true
  roles:
    - role: local.ops_library.os_apt_maintenance
      vars:
        os_apt_maintenance_host_id: "{{ inventory_hostname }}"
        os_apt_maintenance_timer_on_calendar: "Sun *-*-* 04:00:00"
        os_apt_maintenance_timer_randomized_delay_sec: "4h"
        os_apt_maintenance_auto_reboot: false
        os_apt_maintenance_endpoint_enabled: true
        os_apt_maintenance_endpoint_bind: "{{ tailscale_ip }}"
        os_apt_maintenance_endpoint_auth_user: nyxmon
        os_apt_maintenance_endpoint_auth_password: "{{ nyxmon_storage_metrics_password }}"
os_apt_maintenance_endpoint_auth_user is the HTTP Basic auth user, not the local system user.
The endpoint service runs as os_apt_maintenance_endpoint_user, which defaults to the shared
metrics account used by other monitoring endpoint roles.
In ops-control, production currently overrides the public role defaults to a weekly Sunday
timer and a wider jitter window:
os_apt_maintenance_timer_on_calendar: "Sun *-*-* 04:00:00"
os_apt_maintenance_timer_randomized_delay_sec: "4h"
os_apt_maintenance_run_on_first_deploy: true
State JSON Contract
The runner writes a stable JSON object with these important fields:
{
  "schema_version": 1,
  "host_id": "macmini",
  "generated_at": "2026-05-05T04:00:00Z",
  "last_run_started_at": "2026-05-05T04:00:00Z",
  "last_run_finished_at": "2026-05-05T04:03:15Z",
  "last_success_at": "2026-05-05T04:03:15Z",
  "last_exit_code": 0,
  "last_status": "success",
  "last_error": null,
  "reboot_required": false,
  "auto_reboot_enabled": false,
  "steps": {
    "update_cache": {"attempted": true, "status": "success", "duration_seconds": 8.12},
    "dist_upgrade": {"attempted": true, "status": "success", "duration_seconds": 92.7},
    "autoremove": {"attempted": true, "status": "success", "duration_seconds": 5.4},
    "autoclean": {"attempted": true, "status": "success", "duration_seconds": 1.3}
  }
}
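How a state file like this can be updated atomically while a failed run keeps the earlier `last_success_at` is sketched below. It is a minimal illustration, not the role's actual runner: the `write_state` function is hypothetical, and the `"failed"` status value in the usage note is an assumption, since the example above only shows `"success"`.

```python
import json
import os
import tempfile


def write_state(path, new_state):
    """Atomically replace the state file; keep the old last_success_at on failure."""
    previous = {}
    try:
        with open(path, encoding="utf-8") as fh:
            previous = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        pass  # First run or unreadable file: nothing to preserve.

    state = dict(new_state)
    if state.get("last_status") != "success":
        # A non-successful run must not overwrite the last known success time.
        state["last_success_at"] = previous.get("last_success_at")

    # Write to a temp file in the same directory so os.replace() stays on
    # one filesystem and readers never observe a half-written file.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2, sort_keys=True)
            fh.write("\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return state
```

Calling `write_state(path, {"last_status": "failed", "last_exit_code": 100})` after an earlier successful write leaves the earlier `last_success_at` in place, which is what a freshness check on that field relies on.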
When the HTTP endpoint is enabled, it adds request-time meta and summary fields. Nyxmon should prefer:
- `$.summary.last_run_ok == true`
- `$.summary.last_success_fresh == true`
- `$.meta.state_file_fresh == true`
- `$.reboot_required == false` as warning or critical, depending on operator policy
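For testing a payload by hand, the four checks above can be evaluated with a few lines of Python. This is only a sketch: the `evaluate_checks` and `failing_checks` helpers and the check names they return are hypothetical, and the snippet does not reflect how Nyxmon itself configures or runs `json-metrics` checks.

```python
def evaluate_checks(payload):
    """Return one boolean per recommended check for an endpoint response."""
    summary = payload.get("summary", {})
    meta = payload.get("meta", {})
    return {
        "last_run_ok": summary.get("last_run_ok") is True,
        "last_success_fresh": summary.get("last_success_fresh") is True,
        "state_file_fresh": meta.get("state_file_fresh") is True,
        # Severity of this one (warning or critical) is operator policy.
        "no_reboot_required": payload.get("reboot_required") is False,
    }


def failing_checks(payload):
    """List the names of checks that did not pass, sorted for stable output."""
    return sorted(name for name, ok in evaluate_checks(payload).items() if not ok)
```

A missing field counts as a failure here, so a truncated or malformed response is reported instead of silently passing.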
The endpoint reports $.reboot_required from the live /var/run/reboot-required
marker at request time, so a successful operator reboot clears the monitoring
warning immediately even if the durable state file was last written before the
reboot. The previous state-file value is exposed as
$.meta.state_reboot_required for debugging.
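The request-time override described above could look roughly like this. It is a sketch under assumptions: the `with_live_reboot_flag` function is hypothetical, and the real endpoint may build its response differently.

```python
import os

REBOOT_MARKER = "/var/run/reboot-required"


def with_live_reboot_flag(state, marker_path=REBOOT_MARKER):
    """Overlay the live reboot marker on top of the durable state."""
    response = dict(state)
    meta = dict(response.get("meta", {}))
    # Keep the value from the state file for debugging.
    meta["state_reboot_required"] = state.get("reboot_required")
    response["meta"] = meta
    # The top-level field reflects the marker file at request time.
    response["reboot_required"] = os.path.exists(marker_path)
    return response
```

The `marker_path` parameter exists only so the function can be exercised against a temporary file in tests.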
During an active run, $.summary.currently_running is true. last_run_ok remains true while
the previous successful run is still fresh, so monitoring does not page during normal apt work.
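One way to express that behavior is sketched below. The exact rule the endpoint applies is not spelled out here, so the logic is an assumption: `build_summary` is a hypothetical helper, the caller is assumed to know whether a run is active, and the 14-day window is taken from the monitoring threshold default in the variables table.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse timestamps in the state file's format, e.g. 2026-05-05T04:03:15Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def build_summary(state, now, currently_running, max_age=timedelta(days=14)):
    """Derive summary fields from durable state plus request-time facts."""
    last_success = state.get("last_success_at")
    fresh = last_success is not None and now - parse_ts(last_success) <= max_age
    if currently_running:
        # Assumed rule: an in-progress run is fine while the last success is fresh.
        ok = fresh
    else:
        ok = state.get("last_status") == "success"
    return {
        "currently_running": currently_running,
        "last_success_fresh": fresh,
        "last_run_ok": ok,
    }
```

With this rule, a long `dist-upgrade` does not flip `last_run_ok` to false, while a host whose last success is older than the threshold still alerts even if a run happens to be in progress.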
Validation
systemctl status os-apt-maintenance.timer
systemctl cat os-apt-maintenance.service os-apt-maintenance.timer
cat /var/lib/os-apt-maintenance/state.json | jq .
journalctl -u os-apt-maintenance.service -n 100 --no-pager
# Endpoint, when enabled
curl -sS -o /dev/null -w '%{http_code}\n' http://<TAILSCALE_IP>:9106/.well-known/os-apt-maintenance
curl -sS -u "nyxmon:<password>" http://<TAILSCALE_IP>:9106/.well-known/os-apt-maintenance | jq .
Relationship To FastDeploy
apt_upgrade_register remains the FastDeploy manual/API path for operator-triggered apt upgrades. This role owns unattended host-local cadence and monitoring state. Both paths may coexist on the same host.
Testing
cd /path/to/ops-library
just test-role os_apt_maintenance
just lint-role os_apt_maintenance