diff --git a/README.md b/README.md new file mode 100644 index 0000000..50c1c6f --- /dev/null +++ b/README.md @@ -0,0 +1,236 @@ +# mailcow + Nginx Proxy Manager Certificate Sync + +Documentation for the self-hosted mailcow mail server and the automated +certificate pipeline that feeds it Let's Encrypt certs issued by +Nginx Proxy Manager (NPM). + +--- + +## Table of Contents + +- [Overview](#overview) +- [Why this setup](#why-this-setup) +- [Hosts](#hosts) +- [Architecture](#architecture) +- [Components](#components) + - [Push script (NPM host)](#1-push-script-npm-host) + - [SSH transport](#2-ssh-transport) + - [Deploy script (mailcow host)](#3-deploy-script-mailcow-host) + - [mailcow configuration](#4-mailcow-configuration) + - [Scheduling](#5-scheduling-systemd-timers) +- [Verification](#verification) +- [Gotchas](#gotchas) +- [Monitoring](#monitoring) +- [Migration / rebuild notes](#migration--rebuild-notes) + +--- + +## Overview + +NPM is the single source of truth for the Let's Encrypt certificate covering +`mail.wittenberger.us`. mailcow consumes that certificate for **all** mail +protocols (SMTP, IMAP, POP3, ManageSieve) plus its web UI, instead of running +its own ACME client. + +Because NPM and mailcow run on **separate hosts**, the certificate is +distributed via a two-host **push → deploy** chain over SSH, each side driven by +its own `systemd` timer. + +## Why this setup + +- Centralizes all Let's Encrypt issuance/renewal in NPM (single place to manage + and audit certs). +- mailcow's internal ACME is disabled (`SKIP_LETS_ENCRYPT=y`), so there is no + second ACME client competing for the same hostname. +- The mail protocols and the web UI all present the same valid cert. 
+ +## Hosts + +| Role | Host | Address | +| ------------------- | ------------- | -------------- | +| Cert source (NPM) | NGX-Homepage | — | +| mailcow (consumer) | mailcow | 10.10.14.229 | + +## Architecture + +``` +[NGX-Homepage] [mailcow host] + NPM npm-5 cert /home/certsync/incoming/ (staging) + | | + | push-mailcow-cert.sh | deploy-staged-cert.sh + | (rsync -azL over SSH) ───────────────────►| validate → copy → reload + | | + └─ systemd: mailcow-cert-push.timer └─ systemd: mailcow-cert-deploy.timer + 03:00 / 15:00 03:15 / 15:15 +``` + +The deploy timer runs ~15 minutes after the push so the file is staged before +deployment. + +## Components + +### 1. Push script (NPM host) + +**Path:** `/root/push-mailcow-cert.sh` on **NGX-Homepage** (runs as root) + +- Source cert: `/etc/nginx/letsencrypt/live/npm-5/` + - NPM names its cert directories by internal ID (`npm-N`), not by hostname. + Identify the correct one by matching subject/SAN: + ```bash + for d in /etc/nginx/letsencrypt/live/npm-*; do + echo "=== $d ===" + openssl x509 -noout -subject -ext subjectAltName -in "$d/cert.pem" 2>/dev/null + done + ``` +- Compares the source cert's SHA-256 fingerprint against a local state file + (`/var/lib/mailcow-cert-push/last_fp`) and **pushes only when it changes**. +- Transfers `fullchain.pem` and `privkey.pem` with `rsync -azL` to the mailcow + staging dir. + +> **`-L` is required.** Let's Encrypt's `live/` directory contains symlinks into +> `archive/`. Without `-L` (`--copy-links`), rsync copies the symlinks, which +> dangle on the destination. `-L` follows them and copies the real files. + +### 2. SSH transport + +- Dedicated user **`certsync`** on the mailcow host; staging dir + `/home/certsync/incoming`. +- Dedicated ed25519 key **`mailcow_certsync`** (private key on NPM host, public + key in `certsync`'s `authorized_keys`). 
+- The `authorized_keys` entry is **restricted** with a forced command and + `restrict` so the key can only perform the rsync receive — no shell: + ``` + command="rsync --server -logDtpre.iLsfxCIvu . /home/certsync/incoming/",restrict ssh-ed25519 AAAA... mailcow-cert-push + ``` +- `rsync` must be installed on **both** hosts. + +### 3. Deploy script (mailcow host) + +**Path:** `/opt/mailcow-dockerized/deploy-staged-cert.sh` on **mailcow** (runs as root) + +- **Cert/key match check (algorithm-agnostic).** Compares the public key derived + from the cert against the one derived from the key: + ```bash + openssl x509 -in fullchain.pem -noout -pubkey | openssl md5 + openssl pkey -in privkey.pem -pubout | openssl md5 + ``` + This works for RSA, ECDSA, and Ed25519. (A `-modulus` based check is RSA-only + and fails on the ECDSA / EC-384 cert used here.) +- Confirms the cert covers `mail.wittenberger.us`. +- SHA-256 change-detection — deploys and reloads **only on change**. +- Installs into `data/assets/ssl/cert.pem` (`0644`) and `key.pem` (`0600`). +- Reloads `postfix-mailcow`, `dovecot-mailcow`, `nginx-mailcow`. + +### 4. mailcow configuration + +In `/opt/mailcow-dockerized/mailcow.conf`: + +| Setting | Value | Meaning | +| ---------------------- | ----- | ----------------------------------------- | +| `SKIP_LETS_ENCRYPT` | `y` | mailcow's internal ACME client disabled. | +| `ENABLE_SSL_SNI` | `y` | Per-domain SNI certs (see gotcha below). | + +mailcow's "bring your own certificate" mode reads +`data/assets/ssl/cert.pem` and `data/assets/ssl/key.pem`. **Do not symlink** — +the files must be real copies. + +### 5. 
Scheduling (systemd timers) + +| Host | Units | Schedule | +| ------------- | -------------------------------------- | --------------- | +| NGX-Homepage | `mailcow-cert-push.{service,timer}` | 03:00 / 15:00 | +| mailcow | `mailcow-cert-deploy.{service,timer}` | 03:15 / 15:15 | + +Both timers use `Persistent=true` so a host that was powered off catches up on +next boot. + +**Push timer** (`/etc/systemd/system/mailcow-cert-push.timer`): +```ini +[Timer] +OnCalendar=*-*-* 03,15:00:00 +Persistent=true +RandomizedDelaySec=300 +``` + +**Deploy timer** (`/etc/systemd/system/mailcow-cert-deploy.timer`): +```ini +[Timer] +OnCalendar=*-*-* 03,15:15:00 +Persistent=true +RandomizedDelaySec=180 +``` + +## Verification + +```bash +# Timers: confirm active, last/next run +systemctl list-timers '*cert*' --no-pager + +# Served cert ON THE WIRE — the real source of truth (not the file on disk) +openssl s_client -connect mail.wittenberger.us:993 -servername mail.wittenberger.us \ + </dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256 -enddate -subject + +# Deployed file on the mailcow host +openssl x509 -noout -fingerprint -sha256 -enddate \ + -in /opt/mailcow-dockerized/data/assets/ssl/cert.pem + +# Source cert on NGX-Homepage +openssl x509 -noout -fingerprint -sha256 -enddate \ + -in /etc/nginx/letsencrypt/live/npm-5/cert.pem +``` + +When healthy, all three SHA-256 fingerprints match. + +Manual dry run (tests the exact path the timers use): +```bash +# NGX-Homepage +sudo systemctl start mailcow-cert-push.service +journalctl -u mailcow-cert-push.service --no-pager -n 20 + +# mailcow +sudo systemctl start mailcow-cert-deploy.service +journalctl -u mailcow-cert-deploy.service --no-pager -n 20 +``` +With an unchanged cert these report "nothing to do" — which confirms +change-detection is working. + +## Gotchas + +1. **Symlinks** — use `rsync -azL`; without `-L` the cert lands as a dangling + symlink and the deploy reports "no staged cert." +2.
**ECDSA vs RSA** — validate with public-key comparison, not `-modulus` + (modulus is RSA-only; the cert here is EC-384). +3. **SNI subdir** — with `ENABLE_SSL_SNI=y`, mailcow may serve a per-domain cert + from `data/assets/ssl/mail.wittenberger.us/` ahead of the top-level + `cert.pem`. Always verify the served cert with `openssl s_client`, not just + the file on disk. +4. **Reload vs restart** — container `reload` picks up the new cert on the + current version. If a future version doesn't, restart instead: + ```bash + docker compose restart postfix-mailcow dovecot-mailcow nginx-mailcow + ``` +5. **Egress firewall** — the mailcow host runs default-deny outbound; outbound + SSH (port 22) to the relevant host must be explicitly allowed. + +## Monitoring + +The chain fails **silently** — a script erroring on a timer only logs to +journald. Recommended safeguards: + +- A Wazuh rule watching `mailcow-cert-deploy.service` for non-zero exit / `ERROR`. +- A periodic check that the served cert on `:993` is not within *N* days of + expiry, alerting if a renewal hasn't propagated. + +## Migration / rebuild notes + +The cert-sync **host-level** components are part of host provisioning, **not** +mailcow data: + +- `certsync` user + restricted SSH key +- `deploy-staged-cert.sh` +- both `systemd` units + +These are **not** carried by mailcow backup/restore or cold-standby sync. If the +mailcow host is rebuilt or replaced (including an IP-reuse cutover to a new VM), +recreate these on the new host and confirm both timers are active before relying +on automated renewal.
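Review note: the periodic expiry check recommended under Monitoring is described but not shown. The following is a minimal sketch of one way it could look, not the author's implementation. The function names (`cert_valid_for_days`, `served_cert`, `check_served_cert`) and the 14-day threshold in the usage note are assumptions made for illustration. It relies only on `openssl x509 -checkend`, which exits non-zero when the certificate expires within the given number of seconds.

```shell
#!/usr/bin/env bash
# Sketch only: function names and thresholds are hypothetical, not part of the README.

# Read a PEM certificate on stdin.
# Return 0 if it is still valid <days> days from now, non-zero if it
# expires sooner or if no certificate could be parsed.
cert_valid_for_days() {
  local days=$1
  openssl x509 -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1
}

# Print the TLS handshake output (including the leaf certificate in PEM form)
# for host:port. stdin is /dev/null so s_client exits right after the handshake.
served_cert() {
  local host=$1 port=$2
  openssl s_client -connect "${host}:${port}" -servername "${host}" \
    </dev/null 2>/dev/null
}

# Combine both: check the certificate actually served on the wire.
# The ERROR prefix is an assumption, chosen so a log-watching rule can match it.
check_served_cert() {
  local host=$1 port=$2 days=$3
  if served_cert "$host" "$port" | cert_valid_for_days "$days"; then
    echo "OK: ${host}:${port} certificate is valid for more than ${days} days"
  else
    echo "ERROR: ${host}:${port} certificate expires within ${days} days or could not be read" >&2
    return 1
  fi
}
```

A timer-driven unit could call `check_served_cert mail.wittenberger.us 993 14` and rely on the non-zero exit status for alerting. A failed connection is deliberately treated the same as an expiring certificate, since either one means the served cert cannot be confirmed healthy.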