Register env, database config, scripts, and systemd service/timer results,
compute matrix_synapse_s3_storage_provider_restart_necessary, and wire it
into group_vars/matrix_servers instead of hardcoding restart_necessary: true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register image pull, env, and systemd service results, compute
matrix_goofys_restart_necessary, and wire it into group_vars/matrix_servers
instead of hardcoding restart_necessary: true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded restart_necessary: true with the computed
backup_borg_restart_necessary variable that the role already exposes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded restart_necessary: true with the computed variables
(jitsi_web_restart_necessary, jitsi_prosody_restart_necessary,
jitsi_jicofo_restart_necessary, jitsi_jvb_restart_necessary) that the
jitsi role already exposes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded restart_necessary: true with computed values for:
conduit, continuwuity, dendrite, element-call, media-repo,
appservice-kakaotalk, and wechat.
Each role now registers results from config, support files, systemd service,
and docker image pull tasks, then computes a restart_necessary variable
from their combined .changed state. group_vars/matrix_servers is updated
to reference these variables instead of hardcoding true.
For dendrite, the systemd service template was also separated out of the
combined support-files with_items loop so it can be independently tracked.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These roles had conditional restart logic (restart_necessary set_fact) but
the docker_image build task result was not registered or included in the
condition, so a changed image build would not trigger a service restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MAS now connects to the playbook-managed Postgres via a UNIX socket by
default (when available), matching the approach already used by Synapse.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add support for all server_notices settings documented by Synapse:
- room_avatar_url: optional avatar for the server notices room
- room_topic: optional topic for the server notices room
- auto_join: whether users are auto-joined instead of invited (default: false)
Signed-off-by: Norman Ziegner <n.ziegner@hzdr.de>
syn2mas reads Synapse's homeserver.yaml and reuses the database
connection details from there.
When Synapse is configured to reach the integrated Postgres over a UNIX socket,
the temporary syn2mas container was given the config file but not the socket mount,
so migrations could fail even though Synapse itself was configured correctly.
Wire the Synapse socket settings into MAS via playbook vars and mount
the same socket path into the syn2mas container, so migrations work in
socket-based deployments without coupling the MAS role directly to
Synapse role variables.
The client_reader route bucket had collapsed into one long alternation,
which made small worker-audit edits hard to review. Any endpoint change
rewrote the whole regex and obscured whether we were changing routing
policy or just maintaining the route list.
Refactor the variable into grouped regex entries with comments instead.
This keeps the current specialized-worker policy intact: nginx still
renders the client_reader locations in the same block, and the routes
still target the same upstream bucket. The goal here is to make future
doc/code audits, additions, and removals mechanical and reviewable.
This also matches MDAD's current worker model, where generic workers are
not mixed with the specialized room/sync/client/federation reader
routing buckets, so there is no need to derive this from the generic
worker map.
Refs:
- https://github.com/element-hq/synapse/blob/b99a58719b274fcbb327fd8d7649185792bfd12c/docs/workers.md#historical-apps
- https://github.com/element-hq/synapse/blob/b99a58719b274fcbb327fd8d7649185792bfd12c/docs/workers.md#synapseappgeneric_worker
Some client API endpoints (e.g. keys/upload) are backed by Synapse stream writers and
should not rely on broad worker regexes or route-order fallthrough for correctness.
When explicit per-stream routing is missing, requests may be captured by generic, room, or client_reader workers, instead of:
- going to the configured stream writer
- or to `main` when that stream writer is not enabled
This refactors synapse-reverse-proxy-companion's routing so that web-facing stream-backed endpoint families
are handled explicitly and early, with deterministic writer-or-main fallback.
Add first-class support for the missing `device_lists` stream writer,
generalize the same routing model to `push_rules`,
and remove stale broad-route ownership for device-list-sensitive endpoints.
Add matrix_synapse_ext_password_provider_ldap_tls_options_ca_certs_file
variable to allow specifying a custom CA certificate file for LDAP TLS
verification. Useful when Synapse is running in a container that does not
trust a private/internal CA by default.
Example usage:
matrix_synapse_ext_password_provider_ldap_tls_options_ca_certs_file: /etc/ssl/certs/my-ca.crt
When no identity server is configured, `matrix_client_element_default_is_url`
defaults to `~` (YAML null). The `| string | to_json` filter chain converts
this to the literal string `"None"`, causing Element Web to log errors:
- TypeError: URL constructor: None is not a valid URL
- Invalid base_url for m.identity_server
The well-known template (`.well-known/matrix/client.j2`) already handles
this correctly with a conditional guard (see PR #314). This applies the
same pattern to the Element Web `config.json.j2` template.
The companion role was tightly coupled to Synapse through shared tags, worker routing, and lifecycle ordering. Keeping them separate added coordination overhead without practical benefits, especially for parallelized execution.
This merges the role into matrix-synapse while keeping companion logic organized under dedicated reverse_proxy_companion task/template subdirectories.
Compatibility is preserved:
- matrix_synapse_reverse_proxy_companion_* variable names remain unchanged
- install/setup companion-specific tags remain available
Cross-role/global wiring is now in group_vars (matrix-synapse section), while role defaults provide sensible standalone defaults and self-wiring for Synapse-owned values.
Only queue matrix-goofys.service for restart when Synapse is enabled. Goofys is installed from the Synapse role, so non-Synapse homeserver configurations should not try to restart this unit. This mirrors the fix for issue https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/4959.
Only queue matrix-synapse-s3-storage-provider-migrate.timer for restart when Synapse is actually enabled. This prevents setup/install failures when a Synapse-only extension flag is set while using another homeserver implementation, as reported in https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/4959.
The startup issue came from a timing dependency around coturn TLS certs:
- `matrix-coturn.service` depends on
`matrix-traefik-certs-dumper-wait-for-domain@<matrix-fqdn>.service`
- That waiter succeeds only after Traefik has obtained and dumped a cert for
the Matrix hostname (typically driven by homeserver labels/routes becoming
active)
- If coturn is started too early, it can block/fail waiting for cert files
that are not yet present
Historically, coturn priority was mode-dependent:
- `one-by-one`: coturn at 1500 (delayed after homeserver)
- other modes: coturn at 900 (before homeserver)
This could still trigger undesirable startup ordering and confusing behavior
in non-`one-by-one` modes, especially during initial bootstrap/restart flows
where cert availability lags service startup.
This change makes ordering explicit and consistent:
1. Introduce `matrix_homeserver_systemd_service_manager_priority` (default 1000)
in `roles/custom/matrix-base/defaults/main.yml`.
2. Use that variable for the homeserver service entry in
`group_vars/matrix_servers`.
3. Set coturn priority relative to homeserver priority in all modes:
`matrix_homeserver_systemd_service_manager_priority + 500`.
4. Update inline documentation comments in `group_vars/matrix_servers` to
match the new behavior and rationale.
Result:
- Homeserver/coturn ordering is deterministic and mode-agnostic.
- Coturn is intentionally started later than the homeserver by default,
reducing first-start certificate wait/fail races.
- Priority intent is now centralized and configurable via a dedicated
homeserver priority variable.
- Coturn may still be stated earlier, because the homeserver typically
has a `Wants` "dependency" on it, but that's alright
Users reported that /.well-known/matrix/* stopped being served after the image bump to static-web-server v2.41.0.
Regression introduced by commit 32aeaca28b in PR #4951: https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/4951
Root cause: upstream changed hidden-file handling defaults, so paths under /.well-known were treated as hidden and no longer served by default.
Fix by explicitly configuring SERVER_IGNORE_HIDDEN_FILES=false in the matrix-static-files role and rendering it as a JSON boolean in the env template, making behavior stable across upstream default changes.
These three roles have multiple variable prefixes each:
- kakaotalk: matrix_appservice_kakaotalk + matrix_appservice_kakaotalk_node
- telegram: matrix_mautrix_telegram + matrix_mautrix_telegram_lottieconverter
- synapse: matrix_synapse + matrix_synapse_customized + matrix_synapse_rust_synapse_compress_state
For each: renamed _docker_image* to _container_image* (and _docker_src*,
_docker_repo* where applicable), added deprecation entries in
validate_config.yml, updated group_vars references, and moved
deprecation tasks to the front of validate_config.yml.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous depName (forgejo.ellis.link/continuwuation/-/packages/container/continuwuity/)
was a Forgejo web UI path, not the Docker image name. Renovate's docker datasource
needs the image name as used in `docker pull`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add matrix_continuwuity_version with container_image_tag inheriting from it.
Rename all _docker_image* variables to _container_image* with deprecation notices.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add change-tracking and restart_necessary computation for:
- matrix-authentication-service (custom role in this repo)
- container-socket-proxy, traefik-certs-dumper, postgres, exim-relay,
cinny, livekit-server (external roles, bumped in requirements.yml)
Wire all 7 services in group_vars to use their _restart_necessary variable
instead of hardcoded true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track config/image/systemd changes via register: directives and compute
a _restart_necessary variable for each service role, allowing the
systemd_service_manager to skip unnecessary restarts during install-* runs.
Covers 22 service roles: alertmanager-receiver, appservice-draupnir-for-all,
bridge-mautrix-wsproxy (+ syncproxy), cactus-comments, cactus-comments-client,
corporal, element-admin, ldap-registration-proxy, livekit-jwt-service, matrixto,
pantalaimon, prometheus-nginxlog-exporter, rageshake, registration, static-files,
sygnal, synapse-admin, synapse-auto-compressor, synapse-reverse-proxy-companion,
synapse-usage-exporter, and user-verification-service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For each of the 34 roles (3 clients, 9 bots, 22 bridges), this commit:
- Adds `_restart_necessary: false` default variable
- Adds `register:` directives to config/image/systemd tasks
- Computes `_restart_necessary` via set_fact (OR of all .changed results)
- Wires `(_restart_necessary | bool)` in group_vars/matrix_servers
This allows the systemd service manager to skip unnecessary restarts
when running install-* tags and nothing actually changed.
Service roles and complex multi-service roles will follow separately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These variables track whether a database migration necessitates a service
restart. The new name avoids confusion with the conditional restart
feature introduced in af193043/9accc848/4a8df138, where
devture_systemd_service_manager handles restarting services whose
configuration or image changed. The old _requires_restart name was
ambiguous — it could be mistaken for the systemd_service_manager
mechanism — so _migration_requires_restart makes the purpose explicit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Override devture_systemd_service_manager_conditional_restart_enabled in
group_vars based on ansible_run_tags: disabled when setup-* tags are used,
enabled otherwise. This replaces the --extra-vars hack in the justfile and
ensures consistent behavior for both `just` and raw `ansible-playbook` users.
- Revert justfile setup-all to its original form (no --extra-vars needed).
- Update docs/just.md to reflect tag-agnostic behavior.
- Add CHANGELOG.md entry documenting the conditional restart feature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Traefik's service list entry now uses the `traefik_restart_necessary`
variable (computed by the Traefik role) instead of hardcoded `true`,
so it is only restarted when its config, systemd unit, or image changed.
- `just setup-all` now passes
`devture_systemd_service_manager_conditional_restart_enabled=false`
to force unconditional restarts, matching its "full setup" semantics.
- Document the conditional restart behavior in docs/just.md.
Some benchmarks follow for `just install-service traefik -l matrix.example.com`
when Traefik settings did not change and a restart is not really necessary:
- Before:
- total time: 56 seconds 🐌
- Traefik restarted: yes ❌
- Services that depend on Traefik restarted: yes; all of them restarted ❌
- After:
- total time: 27 seconds ⚡
- Traefik restarted: no ✅
- Services that depend on Traefik restarted: no; none restarted ✅
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matrix_synapse_systemd_service_post_start_delay_seconds is assigned a string value, and setup fails while creating the service file. It is impossible to compare str and int.
After Synapse's systemd health check passes, Traefik still needs
providers.providersThrottleDuration to register routes. Derive the
post-start delay from this setting (+1s for healthcheck polling gap)
instead of using a hardcoded value. Defaults to 0 when no Traefik
reverse proxy is used.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch the systemd ExecStartPost health check from docker exec + curl
to polling docker inspect for container health status. This piggybacks
on the container image's built-in HEALTHCHECK instead of duplicating it.
Also add a configurable container health interval (5s for Traefik setups,
15s otherwise) to speed up startup readiness detection without affecting
non-Traefik deployments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When both services restart simultaneously (e.g. in all-at-once mode),
Traefik may momentarily truncate or reinitialize acme.json, causing
the certs dumper to read an empty file and panic. By adding
Requires/After on the Traefik service, the certs dumper only starts
after Traefik is fully ready and acme.json is stable.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, we had a 10-second magical delay.
Now we first do a healthcheck to figure out when it really is up.
Then, we do the same 10-second magical delay to account for the time it
may take for a reverse-proxy (like Traefik) to pick up Synapse's routes.
Addons typically access the homeserver via Traefik, but requests
ultimately lead to the homeserver and it'd better be up or Traefik would
serve a "404 Not Found" error.
This is an attempt (one of many pieces) to make services more reliable,
especially when `devture_systemd_service_manager_service_restart_mode: all-at-once` is used
(which is the default).
Commit 593b3157b ("Fix systemd service Wants for mjolnir and draupnir")
accidentally swapped the variable loops: `systemd_wanted_services_list`
ended up generating `Requires=`/`After=` directives and
`systemd_required_services_list` ended up generating `Wants=` directives —
the opposite of what the variable names mean and how every other
bot/bridge service template in the playbook works.
This caused these bots to only `Wants=` (not `Requires=`/`After=`) their
dependencies like matrix-traefik.service, so systemd didn't guarantee
ordering. During all-at-once restarts, the bots would start before traefik
was ready, fail with DNS resolution errors, and crash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migrate service now declares Requires/After on matrix-synapse.service,
ensuring Synapse (and its transitive dependencies like Postgres and Docker)
are running before the migration triggers.
When DB credentials change (derived from matrix_synapse_macaroon_secret_key),
a running Synapse container may fail to connect to its database and stop
serving requests. This causes register_new_matrix_user to fail with
"Connection refused" when the matrix-user-creator role tries to register users.
This extends the retry logic from 44b43a51b (which handled HMAC failures)
to also handle Connection refused errors: restart Synapse (picking up the
new config with updated credentials), wait for it to start, and retry.
Caused by c21a80d232
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These IDs were incorrectly auto-derived from matrix_homeserver_generic_secret_key,
which is meant for secrets that are OK to change. Datastore IDs are static
identifiers that must never change after first use.
The playbook now requires users to explicitly set matrix_media_repo_datastore_file_id
(and matrix_media_repo_datastore_s3_id when S3 is enabled) in vars.yml, with
validation that fails early if they are missing.
This was the last usage of passlib, which is now removed from prerequisites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the registration_shared_secret changes (derived from
matrix_synapse_macaroon_secret_key), a running Synapse container still
has the old secret in its config. This causes register_new_matrix_user
to fail with "HMAC incorrect" when the matrix-user-creator role tries
to register users.
This mirrors the approach from 2a581cce (which added similar retry
logic for the Matrix Authentication Service on database auth failure):
if the initial registration attempt fails with an HMAC error, restart
Synapse (picking up the new config with the updated secret), wait for
it to start, and retry.
Caused by c21a80d232
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the Postgres role updates database passwords (e.g., due to a
change in the secret derivation method), the Matrix Authentication
Service container may still be running with old configuration that
references the previous password. This causes mas-cli to fail with
"password authentication failed" when the matrix-user-creator role
tries to register users.
Rather than adding config-change detection or eager restarts to the
MAS role, this adds targeted retry logic: if the initial registration
attempt fails with a database authentication error, restart the MAS
service (which picks up the new config with the updated password),
wait for it to start, and retry. The restart usually only triggers
once per run since subsequent user registrations succeed after the restart.
Related to c21a80d232
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace password_hash('sha512', rounds=655555) with hash('sha512')
for all 114 secret derivations in group_vars/matrix_servers.
The old method (655k rounds of SHA-512) was designed for protecting
low-entropy human passwords in /etc/shadow. For deriving secrets
from a high-entropy secret key, a single hash round is equally
secure - the security comes from the key's entropy, not the
computational cost. SHA-512 remains preimage-resistant regardless
of rounds.
This yields a major performance improvement: evaluating
postgres_managed_databases (which references multiple derived
database passwords) dropped from ~10.7s to ~0.6s on a fast mini
PC. The Postgres role evaluates this variable multiple times, and
other roles reference derived passwords too, so the cumulative
savings across a full playbook run are substantial.
All derived service passwords (database passwords, appservice
tokens, etc.) will change on the next run. The main/superuser
database password is not affected (it's hardcoded in inventory
variables). All services receive their new passwords in the same
run, so this should be seamless.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fetch ansible-role-ddclient from MASH project
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Replace `matrix_dynamic_dns` with `ddclient`
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Set `matrix-dynamic-dns` to `ddclient_identifier`
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Remove `ddclient_container_network` in favor of the role's configuration
On the role the value of `ddclient_container_network` is set to `ddclient_identifier`, which is set to `matrix-dynamic-dns` on the playbook.
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Replace `matrix-dynamic-dns` with `ddclient` on matrix_servers
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Replace `ddclient_docker_image_*` with `ddclient_container_image_*`
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Update `ddclient_container_image_*`
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Move `ddclient_base_path` to matrix_servers
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Move `ddclient_web_*` to matrix_servers
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Remove `matrix-dynamic-dns` directory
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Update configuring-playbook-dynamic-dns.md
Reuse https://app.radicle.xyz/nodes/seed.radicle.garden/rad%3Az2SXkaceJw3YmS89T1xGysnFSjWsw/tree/75e264f53862ece4931d7970fea856242ff57034/docs/services/ddclient.md
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
* Fix a typo
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
---------
Signed-off-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
Co-authored-by: Suguru Hirahara <did:key:z6MkvVZk1A3KBApWJXv2Ju4H14ErDfRGxh8zxdXSZ4vACDg5>
The cli-non-interactive script passes arguments directly to psql, which
interprets positional arguments as database names, not SQL commands.
Without the -c flag, commands like:
/matrix/postgres/bin/cli-non-interactive 'DROP DATABASE foo;'
fail with: FATAL: database "DROP DATABASE foo;" does not exist
The correct syntax requires -c to pass a command:
/matrix/postgres/bin/cli-non-interactive -c 'DROP DATABASE foo;'
This mistake was originally introduced in c399992542
when the matrix-bridge-mautrix-hangouts role was removed. That commit's
uninstallation docs were then used as a template and the error propagated
to subsequent removal documentation for other bridges and components.