Changelog
All notable changes to the ops-library collection will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Breaking Changes
Python 3.14+ required - Dropped support for Python 3.8–3.13
Supports Python 3.14 (N-2 policy currently aligns with the latest stable release)
All roles and testing infrastructure now require Python 3.14+
Update your systems before upgrading to this version
ansible-core 2.20+ required - Dropped support for Ansible 2.9-2.14
ansible-core 2.20 is the minimum version compatible with Python 3.14+
Update your Ansible installation before upgrading
Added
voxhelm_deploy can now put transcription jobs into remote_pull mode with validated worker-token and shared S3 artifact settings, while voxhelm_remote_worker_deploy installs a pinned public-PyPI voxhelm[diarization] worker on macOS and runs it under launchd.
voxhelm_ingress_deploy now blocks /v1/internal by default at the Traefik edge, with an explicit separate allowlist for deliberately private worker routes.
tailscale_metrics_endpoint role to expose authenticated Tailscale login state and node-key expiry JSON for Nyxmon monitoring.
voxhelm_deploy now supports production pyannote speaker diarization wiring, including optional uv sync --extra diarization installation, protected Hugging Face token env rendering, and validation when the backend is enabled.
nyxmon_storage_exporter now caches successful ZFS pool samples and reuses them during quiet-hours pool skips, keeping capacity JSON paths stable for monitoring while marking cached values explicitly.
os_apt_maintenance endpoint responses now expose $.meta.state_reboot_required so operators can inspect the reboot-required value from the durable state file separately from the live marker.
os_apt_maintenance role for host-local apt update/dist-upgrade/autoremove/autoclean timers with durable JSON state and an optional authenticated Nyxmon endpoint.
wagtail_deploy now supports a stable wagtail_db_worker_id and passes it to Django Tasks db_worker --worker-id, allowing each deployed site to run a distinct database-backed task worker
wagtail_deploy now includes a redirect-www Traefik middleware that strips the www. prefix via regex redirect (302), applied unconditionally to the HTTPS router
headless_mode role to persist hosts on a non-graphical systemd target and disable running display-manager services without requiring a reboot
paperless_deploy can now promote existing Paperless users to active staff superusers during deploy via paperless_existing_superusers
Takahe lifecycle roles: takahe_shared, takahe_deploy, takahe_backup, takahe_restore, and takahe_remove with systemd services, nginx caching/accel proxy, Traefik routing, and PostgreSQL provisioning
Mastodon lifecycle roles: mastodon_shared, mastodon_deploy, mastodon_backup, mastodon_restore, mastodon_maintenance, and mastodon_remove with rbenv+nvm runtimes, systemd services, Traefik routing, and backup/restore tooling
open_webui_deploy and open_webui_remove roles to run Open WebUI via Docker Compose with Traefik routing, persistent storage, and optional basic auth
open_webui_venv_deploy and open_webui_venv_remove roles for a uv-managed venv deployment with systemd, Traefik routing, and persistent data
zfs_syncoid_replication role for scheduled syncoid replication with alert hooks and optional spindown script
zfs_usb_replication role for USB-attached ZFS replication with device detection and optional alerts
minio_offsite_replication role to pull MinIO archives from a remote host into offsite storage via systemd timer, rsync/SSH, and alert hooks
mail_offsite_replication role to pull maildir + staged DB/config artifacts from a remote host into offsite ZFS storage with post-sync snapshots, status markers, and alert hooks
encrypted_volume_prepare role to verify, unlock, and mount LUKS data volumes with keyfile support, UUID validation, crypttab/fstab wiring, and a validate-only dry run
nyxmon_backup role for SQLite-safe snapshots with metadata, manifests, and automatic archive fetches
nyxmon_restore role with staging validation, safety snapshots, rollback support, and service verification
ollama_install role to install and run Ollama on macOS via Homebrew with launchd management
ollama_remove role to unload launchd, remove the plist, and optionally remove data/logs, service user, and Homebrew package
docker_install role to install Docker Engine + Docker Compose v2 (plugin) on Ubuntu via the official Docker apt repository
shell_basics_deploy role to install fish, modern CLI tools (btop, bmon, sysstat/iotop, tealdeer, eza), set shell/editor defaults, and keep chezmoi current via upstream installer
snappymail_deploy role to install SnappyMail from upstream archives (PHP-FPM + nginx), wire IMAP/SMTP defaults, persist data under /mnt/cryptdata/snappymail, and expose via Traefik
ReadTheDocs integration with Sphinx and MyST parser
Browsable documentation at https://ops-library.readthedocs.io/
Furo theme for modern, clean appearance
Automated role documentation from individual READMEs
Just commands for documentation workflow (docs-build, docs-watch, etc.)
Documentation validation script (validate_docs.py)
Migrated to uv for Python dependency management
Faster dependency resolution and installation
Simplified justfile commands using uv run
Removed manual venv activation requirements
homeassistant_deploy, homeassistant_backup, and homeassistant_remove roles to cover the full lifecycle alongside the existing restore workflow
homeassistant_restore role to validate archives, create safety snapshots, restore files, and roll back on failure
FastDeploy backup & restore workflow:
fastdeploy_backup role with metadata-rich snapshots, disk-space validation, and archive support
fastdeploy_restore role with safety snapshots, permission fixes, health-check retries, and rollback automation
Paperless-ngx suite:
paperless_deploy, paperless_backup, paperless_restore, paperless_postgres, and paperless_remove roles for deployment, disaster recovery, and safe removal
redis_install role to provision standalone Redis instances with optional authentication, persistence, and memory tuning
postgres_install role to install PostgreSQL with manageable config, databases, users, and extensions
minio_deploy role to provision MinIO with dual-router Traefik exposure, security hardening, and optional client bootstrapping
minio_remove role to destructively remove MinIO with confirmation, optional data preservation, and Traefik cleanup
Dynamic DNS support in dns_deploy, adding an opt-in LiveDNS updater with dedicated service accounts, timers, and IPv4/IPv6 support
UniFi lifecycle roles: unifi_deploy, unifi_backup, unifi_restore, and unifi_remove (Mongo-auth aware, Traefik/HA integration, Justfile wiring, docs)
Navidrome lifecycle roles: navidrome_deploy, navidrome_backup, navidrome_restore, and navidrome_remove (systemd binary install, Traefik basic auth, rescan timer, backup/restore tooling)
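The redirect-www entry above describes a regex redirect that strips the www. prefix. A minimal Python sketch of that matching logic follows. The pattern and the function name strip_www are assumptions for illustration only; the changelog does not show the regex the role actually renders into the Traefik middleware.

```python
import re
from typing import Optional

# Hypothetical pattern: the role's real Traefik regex is not shown in this changelog.
WWW_PATTERN = re.compile(r"^https?://www\.(.+)$")


def strip_www(url: str) -> Optional[str]:
    """Return the HTTPS redirect target for a www. URL, or None if no redirect applies."""
    match = WWW_PATTERN.match(url)
    if match is None:
        return None
    # Keep host, path, and query string; drop only the leading "www." label.
    return f"https://{match.group(1)}"


if __name__ == "__main__":
    print(strip_www("https://www.example.com/feed?page=2"))
    print(strip_www("https://example.com/"))
```

A request whose host does not start with www. yields None here, which corresponds to the middleware leaving the request untouched.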
Changed
voxhelm_remote_worker_deploy now defaults to caffeinate -ims so macOS remote workers stay awake during long jobs while allowing display sleep.
tailscale_metrics_endpoint now defaults node-key expiry alerts to warning inside 3 days and critical inside 1 day.
zed role scrub timers can optionally wait for completion and run a post-scrub spindown hook
Unit tests for OpenClaw metrics collector canary behavior and schema invariants (tests/unit/test_openclaw_metrics_collector.py)
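The tailscale_metrics_endpoint defaults above (warning inside 3 days, critical inside 1 day) can be sketched as a small classifier. This is an illustration under assumptions: the function name, the severity strings, and the inclusive boundary handling are hypothetical and not taken from the role.

```python
from datetime import datetime, timedelta, timezone

# Default windows as stated in the changelog entry.
WARNING_WINDOW = timedelta(days=3)
CRITICAL_WINDOW = timedelta(days=1)


def expiry_severity(expiry: datetime, now: datetime) -> str:
    """Classify a node-key expiry time relative to `now` (hypothetical helper)."""
    remaining = expiry - now
    # Assumption: a key expiring exactly on a boundary counts as inside the window.
    if remaining <= CRITICAL_WINDOW:
        return "critical"
    if remaining <= WARNING_WINDOW:
        return "warning"
    return "ok"


if __name__ == "__main__":
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for hours in (12, 48, 240):
        print(hours, expiry_severity(now + timedelta(hours=hours), now))
```

An already expired key has a negative remaining time and is therefore classified as critical in this sketch.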
Fixed
openclaw_deploy synthetic canaries now use fresh per-attempt session ids derived from the configured canary prefix and clean up generated canary session files after a bounded retention window, preventing reused canary history from causing context overflow, malformed markers, and retry lock contention.
openclaw_deploy metrics collector now treats parseable nonzero health --json output as collected health data, so transient Telegram probe failures do not set collector_ok=false.
nyxmon_storage_exporter now parses in-progress and paused ZFS scrub timestamps without confusing the weekday Mon for a completed-scrub on marker, avoiding false scrub-age warnings while a pool is actively scrubbing.
Deploy roles now build stat assertion labels and error messages from the original loop item instead of registered result invocation metadata, restoring compatibility with newer ansible-core controllers.
Collection metadata now declares the documented ansible-core 2.20+ runtime requirement.
wagtail_deploy rsync deployments now exclude the managed .env file and collected /staticfiles directory, preventing failed deploys from clobbering runtime secrets or deleting WhiteNoise assets before collectstatic runs.
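The nyxmon_storage_exporter fix above concerns telling a completed-scrub "on" marker apart from the "on" inside the weekday "Mon". One way to avoid that confusion is to anchor each pattern on the words around the timestamp, as in the sketch below. The sample scan lines and all names are assumptions modelled on typical zpool status output; the exporter's actual parser is not shown in this changelog.

```python
import re
from datetime import datetime

# Timestamp as printed in the assumed scan lines, e.g. "Mon Mar 11 00:24:01 2024".
TIMESTAMP = r"(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"

# Anchor on the surrounding words so the "on" inside "Mon" can never match.
ACTIVE = re.compile(r"\bscrub (?P<state>in progress|paused) since " + TIMESTAMP)
COMPLETED = re.compile(r"\bwith \d+ errors on " + TIMESTAMP)


def parse_scan_line(line: str):
    """Return (state, timestamp) for a scan line, or ("unknown", None)."""
    match = ACTIVE.search(line)
    if match:
        state = "in_progress" if match.group("state") == "in progress" else "paused"
        return state, _parse_ts(match.group("ts"))
    match = COMPLETED.search(line)
    if match:
        return "completed", _parse_ts(match.group("ts"))
    return "unknown", None


def _parse_ts(text: str) -> datetime:
    # Assumes English day and month names (C locale).
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


if __name__ == "__main__":
    print(parse_scan_line("scan: scrub in progress since Mon Mar 11 00:24:01 2024"))
    print(parse_scan_line(
        "scan: scrub repaired 0B in 00:10:11 with 0 errors on Sun Mar 10 00:34:12 2024"
    ))
```

Checking the in-progress and paused forms first means an active scrub is never reported with a completed-scrub age.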
Changed
os_apt_maintenance endpoint responses now derive $.reboot_required from the live /var/run/reboot-required marker so monitoring clears immediately after a successful reboot.
mastodon_backup now excludes Mastodon’s refetchable public/system/cache subtree from local media backups by default and records the media exclude list in backup manifests.
mastodon_backup now runs pg_dump as the backup owner by default so password-authenticated dumps can write into root-owned backup directories.
openclaw_deploy now uses a shallow single-tag/branch source checkout so upstream branch namespace conflicts do not block tag-pinned deployments.
openclaw_deploy now renders the managed slash-skill session manifest without invalid inline Jinja comments.
openclaw_deploy now normalizes legacy Telegram streaming aliases in persisted gateway configs before restarting newer OpenClaw releases.
openclaw_deploy metrics collector now recognizes the current OpenClaw Telegram health shape (running/connected) when deriving telegram_probe_ok.
openclaw_deploy documentation now uses upstream stable v2026.5.7 in examples and validation hints.
paperless_deploy now defaults to Paperless-ngx 2.20.15 and supports checksum verification for known upstream release archives.
paperless_deploy now restarts Paperless services before health checks when a release symlink or package install changes, preventing upgraded deployments from leaving old worker processes serving the previous release.
homeassistant_deploy now supports Home Assistant 2026.5 on Python 3.14, installs host-specific integration requirements before startup, removes legacy MET weather YAML when requested, and isolates the Matter Server in its own virtualenv to avoid Matter package namespace collisions.
unifi_deploy now reconciles the Home Assistant UniFi admin when it already exists, including password hash drift and missing readonly site privileges.
dns_deploy now supports Unbound cache prefetch, stale-TTL reset, optional RFC 8767 timeout tuning, recursion queue sizing, and disables Ubuntu’s legacy resolvconf helper when the role manages /etc/resolv.conf
dns_deploy blocklist refreshes now tolerate individual download failures, understand both hosts-style and AdGuard-style lists, and document the limits of serve-expired during WAN reconnects
netplan_config now rejects interfaces that combine dhcp4: true with a manual IPv4 default route, documents that DHCP-backed hosts should use DHCP-managed default routes, and offers an optional post-apply networkctl reconfigure recovery pass for networkd hosts stuck in a failed link state
Closed out the refactor documentation pass so top-level docs and role READMEs describe the landed deploy/restore helper boundaries as complete work and frame remaining items as normal follow-up maintenance instead of pending refactor waves
dns_deploy now exposes optional Unbound serve-expired controls and documents forward_first guidance for the root zone so resolver failover behavior is explicit
nyxmon_restore now mirrors the Home Assistant structure (validate/prepare/restore/verify/cleanup), keeps cleanup in a top-level block/always flow, adds restore-phase block/rescue rollback, conditional restores, handler flush, and health checks
nyxmon_deploy systemd service now launches Granian instead of Gunicorn to match the upstream project
ollama_install stops any Homebrew-managed Ollama service by default, stops conflicting user-level ollama serve processes, and ensures the launchd service is running
Updated README.md with prominent link to ReadTheDocs
Updated repository URLs to https://github.com/ephes/ops-library
Modernized Python tooling: uv replaces traditional pip/venv workflow
Removed docs-setup command (auto-handled by uv)
fastdeploy_deploy now depends on postgres_install for database provisioning (removing the legacy inline PostgreSQL tasks)
uv_install detects alternate uv installations, relinks to newer binaries automatically, and enables uv_update_existing by default to keep hosts current
fastdeploy_deploy implements Traefik’s dual-router pattern with IP-based allow lists, bcrypt-hashed basic auth, security headers, and compression middleware
Paperless roles now support Python 3.14 and include an optional ocrmypdf patch to keep OCR workflows unblocked
paperless_deploy no longer installs default-libmysqlclient-dev, avoiding apt conflicts with MariaDB development packages on Ubuntu 24.04 when using the PostgreSQL backend
redis_install enables config validation by default to catch syntax and runtime issues before service restarts
nyxmon_deploy and homelab_deploy switched from Granian to Gunicorn and gained configurable Python version management (defaulting to 3.13)
nyxmon_deploy now enforces the same dual-router authentication policy as other public services, including validation and hashed credentials
nyxmon_deploy now flushes handlers and smoke-validates the live monitoring worker’s OpsGate submit and approval URLs so stale approval-link wiring fails during deploy
DNS deployment/removal flows hardened with improved resolver management, legacy unbound_only port detection, and safer variable validation
snappymail_deploy now writes managed domain configs as .json, removes conflicting legacy .ini files, and supports snappymail_remove_domains cleanup for stale domain overrides
open_webui_deploy documentation now calls out the studio.tailde2ec.ts.net hostname, Traefik config path/basic auth wiring, and ops-control preflight bypass flag
open_webui_remove now defaults to non-destructive options and supports removing compose/env files separately from the site directory
zfs_usb_replication gained optional syncoid identifiers, force-export, and spindown hooks to prevent snapshot collisions and park disks after USB runs
openclaw_deploy synthetic canary collection now sets explicit collector TimeoutStartSec=600, keeps dedicated canary session-id routing, and preserves stable canary metadata keys (agent, timeout_seconds, session_id) in payload defaults
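The two os_apt_maintenance entries in this release separate the live reboot marker ($.reboot_required) from the value recorded in the durable state file ($.meta.state_reboot_required). A minimal sketch of that split is shown below. The function name, the state-file key reboot_required, and the response layout beyond those two JSON paths are assumptions, not the role's actual implementation.

```python
import json
from pathlib import Path

# Marker path as named in the changelog entry.
DEFAULT_MARKER = Path("/var/run/reboot-required")


def build_response(state_file: Path, marker: Path = DEFAULT_MARKER) -> dict:
    """Combine the live marker with the value stored in the durable state file."""
    # Hypothetical state layout: a JSON object with a "reboot_required" key.
    state = json.loads(state_file.read_text())
    return {
        # Live value: clears as soon as the marker disappears after a reboot.
        "reboot_required": marker.exists(),
        "meta": {
            # Stored value: reflects the state file as last written.
            "state_reboot_required": bool(state.get("reboot_required", False)),
        },
    }
```

Keeping both values lets monitoring alert on the live marker while operators can still see what the last maintenance run recorded.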
Fixed
backup_metrics_endpoint and openclaw_deploy collector timers now schedule from timer activation and collector completion, preventing post-reboot or post-restart active (elapsed) timers with no next run.
mail_spam_deploy now configures the Rspamd APT repository with a scoped signed-by keyring and removes the legacy global apt-key entry, avoiding apt-key deprecation warnings on Ubuntu 24.04.
mastodon_backup now restarts Mastodon services after failed backup payload capture, preventing pg_dump or media-copy failures from leaving services stopped.
mastodon_restore now makes the staged database dump path traversable by the restore OS user before running pg_restore, while keeping the default peer-auth restore user.
wagtail_deploy now protects the top-level /cache directory from rsync deletion and recreates wagtail_cache_dir after source deployment, preventing Django file-based cache failures like the python-podcast feed incident
mastodon_deploy now resolves the concrete Node version path from nvm version instead of guessing an nvm directory name from .nvmrc, fixing deploys where values like 24.10 install under v24.10.0 and otherwise break yarn during asset precompile
mastodon_deploy now clears Rails cache after source, runtime, dependency, migration, or asset-build changes so stale cached instance metadata does not survive Mastodon upgrades in Redis after the services restart
mastodon_deploy now restarts the web, Sidekiq, and streaming services when source, runtime, dependency, migration, or asset-build tasks change, so upgrades and recovery reruns do not leave long-running processes serving the previous release until a manual restart
logyard_vector_deploy now disables the Vector Loki sink startup health check by default and validates staged config with --skip-healthchecks, preventing transient Logyard/Loki 5xx responses from blocking Vector service startup after package upgrades or restarts.
dns_deploy now points its default AdGuard DNS filter source at the maintained upstream URL, avoiding daily blocklist refresh failures from the retired GitHub raw path
sanoid now renders dataset use_template values using the bare template name expected by Sanoid instead of the literal section header, restoring per-dataset retention and pruning behavior for roles like Fractal Time Machine backups
Home Assistant presence automations now include the default file to prevent missing automation imports after deployment
dns_remove cleans up DDNS units reliably and no longer crashes on undefined variables during selective removal
unifi_restore now re-imports MongoDB dumps, honors host/port overrides, and ships with sane defaults so UniFi logins and controller state survive a remove/deploy/restore cycle
unifi_deploy gracefully skips the Home Assistant integration on the very first bootstrap when the UniFi “default” site does not exist yet, avoiding infinite waits on greenfield installs
open_webui_deploy now validates the bind host and host port range to catch invalid settings earlier
zfs_usb_replication now creates /etc/exports.d before mount and auto-sets canmount=off on existing recursive+readonly targets to avoid mountpoint creation failures on subsequent runs
2.0.0 - 2025-10-09
Breaking Changes
REMOVED: python_app_systemd role - Legacy manifest-driven deployment (use dedicated *_deploy roles instead)
REMOVED: python_app_django role - Legacy manifest-driven Django deployment (use dedicated *_deploy roles instead)
Added
homelab_deploy role - Django/Granian deployment with dual router Traefik authentication
homelab_remove role - Safe removal with data preservation options
traefik_deploy role - Install and harden Traefik with Let’s Encrypt automation, architecture auto-detection, and smoke tests
traefik_remove role - Safe Traefik uninstallation with confirmation gates and preservation toggles
dns_deploy and dns_remove roles - Manage Pi-hole/Unbound (later Unbound-only) DNS stacks with split-DNS views and clean removal
Dual router authentication pattern for Traefik (internal: no auth, external: basic auth)
Comprehensive Traefik security documentation
Broken venv detection and auto-removal in Python deployment tasks
Build ignore patterns in galaxy.yml for faster collection builds
Comprehensive documentation structure with README.md and ARCHITECTURE.md
CLAUDE.md for AI assistant context
Standardized role README template
Changed
Streamlined role documentation for consistency
Fixed systemd service template to remove ProtectHome for services in /home
Improved validation.yml to handle undefined variables gracefully in homelab_remove
Removed legacy role documentation pages
Updated role index to reflect removal
Added migration guidance for users of removed roles
Updated uv_install examples to use modern deployment pattern
nyxmon_deploy gained rsync support for additional source directories and smarter uv-based dependency management (pyproject validation, lock cleanup, mode-aware sync commands)
Fixed
Template evaluation crashes in homelab_remove when home directory doesn’t exist
Undefined variable errors in removal validation when database/media checks are skipped
Permission issues with Python virtual environments on redeployment
Migration Guide
If you were using python_app_systemd or python_app_django:
Migrate to dedicated roles: fastdeploy_deploy, nyxmon_deploy, homelab_deploy, etc.
Follow the role development guide to create custom deployment roles if needed
The old services.d/ manifest workflow is no longer supported
1.0.0 - 2024-09-22
Added
Initial release of ops-library collection
Core service deployment roles:
fastdeploy_deploy - Deploy FastDeploy platform
nyxmon_deploy - Deploy Nyxmon monitoring service
fastdeploy_remove - Remove FastDeploy service
nyxmon_remove - Remove Nyxmon service
Service registration roles:
apt_upgrade_register - Register apt upgrade tasks with FastDeploy
fastdeploy_register_service - Generic service registration helper
fastdeploy_self_deploy - FastDeploy self-deployment registration
Bootstrap roles:
ansible_install - Install Ansible and dependencies
uv_install - Install uv for Python environment management
sops_dependencies - Install SOPS/age prerequisites
Testing infrastructure:
test_dummy - Example service for testing deployment patterns
Legacy compatibility roles:
python_app_django - Django application deployment (deprecated)
python_app_systemd - Systemd service management (deprecated)
Security
Strict validation of secrets to prevent “CHANGEME” placeholder values
SOPS/age encryption support for secrets management
Sudoers configuration for privilege separation
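The placeholder check listed above can be pictured as a scan over secret values before deployment. The sketch below is an assumption-laden illustration: the function name, the flat dictionary input, and the case-insensitive substring match are hypothetical, and the roles themselves perform this validation inside Ansible rather than in Python.

```python
PLACEHOLDER = "CHANGEME"


def find_placeholder_secrets(secrets: dict) -> list:
    """Return the sorted names of secrets that still contain the placeholder."""
    # Assumption: any value containing the placeholder, in any letter case, is rejected.
    return sorted(
        name for name, value in secrets.items()
        if PLACEHOLDER in str(value).upper()
    )


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    offenders = find_placeholder_secrets({
        "db_password": "CHANGEME",
        "api_token": "s3cr3t-value",
    })
    if offenders:
        raise SystemExit(f"Refusing to deploy, placeholder secrets: {offenders}")
```

Failing before any task runs keeps a forgotten placeholder from ever reaching a live service.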
Role Version History
fastdeploy_deploy
1.0.0 (2024-09-22): Initial release with rsync/git deployment support
nyxmon_deploy
1.0.0 (2024-09-22): Initial release with Telegram integration
apt_upgrade_register
1.0.0 (2024-09-22): Initial release with SSH key management