# mailcow + Nginx Proxy Manager Certificate Sync
Documentation for the self-hosted mailcow mail server and the automated
certificate pipeline that feeds it Let's Encrypt certs issued by
Nginx Proxy Manager (NPM).

---
## Table of Contents
- [Overview](#overview)
- [Why this setup](#why-this-setup)
- [Hosts](#hosts)
- [Architecture](#architecture)
- [Components](#components)
  - [Push script (NPM host)](#1-push-script-npm-host)
  - [SSH transport](#2-ssh-transport)
  - [Deploy script (mailcow host)](#3-deploy-script-mailcow-host)
  - [mailcow configuration](#4-mailcow-configuration)
  - [Scheduling](#5-scheduling-systemd-timers)
- [Verification](#verification)
- [Gotchas](#gotchas)
- [Monitoring](#monitoring)
- [Migration / rebuild notes](#migration--rebuild-notes)
---
## Overview
NPM is the single source of truth for the Let's Encrypt certificate covering
`mail.wittenberger.us`. mailcow consumes that certificate for **all** mail
protocols (SMTP, IMAP, POP3, ManageSieve) plus its web UI, instead of running
its own ACME client.

Because NPM and mailcow run on **separate hosts**, the certificate is
distributed via a two-host **push → deploy** chain over SSH, each side driven by
its own `systemd` timer.
## Why this setup
- Centralizes all Let's Encrypt issuance/renewal in NPM (single place to manage
and audit certs).
- mailcow's internal ACME is disabled (`SKIP_LETS_ENCRYPT=y`), so there is no
second ACME client competing for the same hostname.
- The mail protocols and the web UI all present the same valid cert.
## Hosts
| Role | Host | Address |
| ------------------- | ------------- | -------------- |
| Cert source (NPM) | NGX-Homepage | — |
| mailcow (consumer) | mailcow | 10.10.14.229 |
## Architecture
```
[NGX-Homepage]                              [mailcow host]
NPM npm-5 cert                              /home/certsync/incoming/ (staging)
  |                                           |
  | push-mailcow-cert.sh                      | deploy-staged-cert.sh
  | (rsync -azL over SSH) ───────────────────►| validate → copy → reload
  |                                           |
  └─ systemd: mailcow-cert-push.timer         └─ systemd: mailcow-cert-deploy.timer
     03:00 / 15:00                               03:15 / 15:15
```
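The deploy timer is scheduled 15 minutes after the push timer so that the files
are staged before deployment runs. Allowing for the randomized delays configured
under Scheduling, the actual gap is roughly 10 to 18 minutes.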
## Components
### 1. Push script (NPM host)
**Path:** `/root/push-mailcow-cert.sh` on **NGX-Homepage** (runs as root)
- Source cert: `/etc/nginx/letsencrypt/live/npm-5/`
- NPM names its cert directories by internal ID (`npm-N`), not by hostname.
Identify the correct one by matching subject/SAN:
```bash
for d in /etc/nginx/letsencrypt/live/npm-*; do
  echo "=== $d ==="
  openssl x509 -noout -subject -ext subjectAltName -in "$d/cert.pem" 2>/dev/null
done
```
- Compares the source cert's SHA-256 fingerprint against a local state file
(`/var/lib/mailcow-cert-push/last_fp`) and **pushes only when it changes**.
- Transfers `fullchain.pem` and `privkey.pem` with `rsync -azL` to the mailcow
staging dir.
> **`-L` is required.** Let's Encrypt's `live/` directory contains symlinks into
> `archive/`. Without `-L` (`--copy-links`), rsync copies the symlinks, which
> dangle on the destination. `-L` follows them and copies the real files.
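
The change-detection step described above could look roughly like the following. This is a minimal sketch, not the actual `push-mailcow-cert.sh`: the function names are hypothetical, and only the state-file path and the cert directory come from this document.

```bash
# Hypothetical helpers sketching the push script's change detection.

# Print the SHA-256 fingerprint of the first certificate in a PEM file.
cert_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 -in "$1" | cut -d= -f2
}

# Succeed (exit 0) when the fingerprint differs from the recorded one,
# or when nothing has been recorded yet.
fingerprint_changed() {
    local fp="$1" state_file="$2"
    [ -f "$state_file" ] || return 0
    [ "$(cat "$state_file")" != "$fp" ]
}

# Record the fingerprint; call this only after the transfer has succeeded.
record_fingerprint() {
    local fp="$1" state_file="$2"
    mkdir -p "$(dirname "$state_file")"
    printf '%s\n' "$fp" > "$state_file"
}
```

A caller would compute the fingerprint of `/etc/nginx/letsencrypt/live/npm-5/cert.pem`, run the `rsync -azL` transfer only when `fingerprint_changed` succeeds against `/var/lib/mailcow-cert-push/last_fp`, and call `record_fingerprint` afterwards. Recording last means a failed transfer is retried on the next timer run.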
### 2. SSH transport
- Dedicated user **`certsync`** on the mailcow host; staging dir
`/home/certsync/incoming`.
- Dedicated ed25519 key **`mailcow_certsync`** (private key on NPM host, public
key in `certsync`'s `authorized_keys`).
- The `authorized_keys` entry is **restricted** with a forced command and
`restrict` so the key can only perform the rsync receive — no shell:
```
command="rsync --server -logDtpre.iLsfxCIvu . /home/certsync/incoming/",restrict ssh-ed25519 AAAA... mailcow-cert-push
```
- `rsync` must be installed on **both** hosts.
### 3. Deploy script (mailcow host)
**Path:** `/opt/mailcow-dockerized/deploy-staged-cert.sh` on **mailcow** (runs as root)
- **Cert/key match check (algorithm-agnostic).** Compares the public key derived
from the cert against the one derived from the key:
```bash
openssl x509 -in fullchain.pem -noout -pubkey | openssl md5
openssl pkey -in privkey.pem -pubout | openssl md5
```
This works for RSA, ECDSA, and Ed25519. (A `-modulus` based check is RSA-only
and fails on the ECDSA / EC-384 cert used here.)
- Confirms the cert covers `mail.wittenberger.us`.
- SHA-256 change-detection — deploys and reloads **only on change**.
- Installs into `data/assets/ssl/cert.pem` (`0644`) and `key.pem` (`0600`).
- Reloads `postfix-mailcow`, `dovecot-mailcow`, `nginx-mailcow`.
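
The two validation checks in the list above can be sketched as small functions. This is an illustration under assumptions, not the contents of `deploy-staged-cert.sh`: the function names are hypothetical, and the sketch compares the PEM public keys directly instead of their MD5 digests so that an unreadable file cannot produce two matching empty hashes.

```bash
# Hypothetical helpers sketching the deploy script's validation steps.

# Succeed when the certificate and the private key share the same public key.
# Algorithm-agnostic: works for RSA, ECDSA, and Ed25519.
cert_matches_key() {
    local cert="$1" key="$2" cert_pub key_pub
    cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
    key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
    [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Succeed when the certificate is valid for the given hostname.
# `-checkhost` prints "does match" or "does NOT match", so match on the text.
cert_covers_host() {
    local cert="$1" host="$2"
    openssl x509 -in "$cert" -noout -checkhost "$host" 2>/dev/null \
        | grep -q "does match certificate"
}
```

In a deploy run both checks would be applied to the staged `fullchain.pem` and `privkey.pem` before anything is copied into `data/assets/ssl/`, for example `cert_covers_host fullchain.pem mail.wittenberger.us`.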
### 4. mailcow configuration
In `/opt/mailcow-dockerized/mailcow.conf`:
| Setting | Value | Meaning |
| ---------------------- | ----- | ----------------------------------------- |
| `SKIP_LETS_ENCRYPT` | `y` | mailcow's internal ACME client disabled. |
| `ENABLE_SSL_SNI` | `y` | Per-domain SNI certs (see gotcha below). |

mailcow's "bring your own certificate" mode reads
`data/assets/ssl/cert.pem` and `data/assets/ssl/key.pem`. **Do not symlink**:
the files must be real copies.
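
One way to guarantee real copies with the documented modes is `install`, which reads through a source symlink and writes a regular file. The helper below is a hedged sketch; the function name and its two directory arguments are assumptions, not part of the existing deploy script.

```bash
# Hypothetical helper: install staged files as regular files with the
# modes used in this setup (cert 0644, key 0600).
install_staged_cert() {
    local staging="$1" ssl_dir="$2"
    install -m 0644 "$staging/fullchain.pem" "$ssl_dir/cert.pem" &&
        install -m 0600 "$staging/privkey.pem" "$ssl_dir/key.pem"
}
```

On the mailcow host this would be called as `install_staged_cert /home/certsync/incoming /opt/mailcow-dockerized/data/assets/ssl`, after validation and before the container reloads.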
### 5. Scheduling (systemd timers)
| Host | Units | Schedule |
| ------------- | -------------------------------------- | --------------- |
| NGX-Homepage | `mailcow-cert-push.{service,timer}` | 03:00 / 15:00 |
| mailcow | `mailcow-cert-deploy.{service,timer}` | 03:15 / 15:15 |

Both timers use `Persistent=true` so a host that was powered off catches up on
next boot.

**Push timer** (`/etc/systemd/system/mailcow-cert-push.timer`):
```ini
[Timer]
OnCalendar=*-*-* 03,15:00:00
Persistent=true
RandomizedDelaySec=300
```
**Deploy timer** (`/etc/systemd/system/mailcow-cert-deploy.timer`):
```ini
[Timer]
OnCalendar=*-*-* 03,15:15:00
Persistent=true
RandomizedDelaySec=180
```
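
Because both timers add a random delay, the gap between a push and the deploy that follows it is not exactly 15 minutes. The arithmetic can be sketched as below; the function name is hypothetical, and the calculation ignores `AccuracySec` (1 minute by default), which can shift each trigger slightly further.

```bash
# Hypothetical helper: print the smallest and largest gap (in seconds)
# between a push run and the following deploy run.
#   $1 = offset between the two OnCalendar times
#   $2 = RandomizedDelaySec of the push timer
#   $3 = RandomizedDelaySec of the deploy timer
push_deploy_gap_range() {
    local offset_s="$1" push_jitter_s="$2" deploy_jitter_s="$3"
    # Smallest gap: push fires as late as possible, deploy as early as possible.
    # Largest gap: push fires on time, deploy as late as possible.
    echo "$(( offset_s - push_jitter_s )) $(( offset_s + deploy_jitter_s ))"
}

push_deploy_gap_range 900 300 180   # prints "600 1080"
```

With the values above the deploy therefore starts between 10 and 18 minutes after the push, which leaves ample time for two small files to be staged.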
## Verification
```bash
# Timers: confirm active, last/next run
systemctl list-timers '*cert*' --no-pager
# Served cert ON THE WIRE — the real source of truth (not the file on disk)
openssl s_client -connect mail.wittenberger.us:993 -servername mail.wittenberger.us \
</dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256 -enddate -subject
# Deployed file on the mailcow host
openssl x509 -noout -fingerprint -sha256 -enddate \
-in /opt/mailcow-dockerized/data/assets/ssl/cert.pem
# Source cert on NGX-Homepage
openssl x509 -noout -fingerprint -sha256 -enddate \
-in /etc/nginx/letsencrypt/live/npm-5/cert.pem
```
When healthy, all three SHA-256 fingerprints match.

Manual test run (exercises the exact path the timers use):
```bash
# NGX-Homepage
sudo systemctl start mailcow-cert-push.service
journalctl -u mailcow-cert-push.service --no-pager -n 20
# mailcow
sudo systemctl start mailcow-cert-deploy.service
journalctl -u mailcow-cert-deploy.service --no-pager -n 20
```
With an unchanged cert these report "nothing to do" — which confirms
change-detection is working.
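
The three-way fingerprint comparison from the start of this section can be scripted. The sketch below is an assumption about how one might automate it, with hypothetical function names; `served_fingerprint` only wraps the `openssl s_client` pipeline already shown above.

```bash
# Hypothetical helper: print the SHA-256 fingerprint of the certificate
# actually served on host:port.
served_fingerprint() {
    local host="$1" port="$2"
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Hypothetical helper: succeed only when at least two fingerprints are
# given and all of them are identical.
fingerprints_match() {
    [ "$#" -ge 2 ] || return 1
    local first="$1" fp
    shift
    for fp in "$@"; do
        [ "$fp" = "$first" ] || return 1
    done
    return 0
}
```

The source and deployed fingerprints live on different hosts, so in practice each would be collected where the file is and the three values passed to `fingerprints_match` in one place.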
## Gotchas
1. **Symlinks** — use `rsync -azL`; without `-L` the cert lands as a dangling
symlink and the deploy reports "no staged cert."
2. **ECDSA vs RSA** — validate with public-key comparison, not `-modulus`
(modulus is RSA-only; the cert here is EC-384).
3. **SNI subdir** — with `ENABLE_SSL_SNI=y`, mailcow may serve a per-domain cert
from `data/assets/ssl/mail.wittenberger.us/` ahead of the top-level
`cert.pem`. Always verify the served cert with `openssl s_client`, not just
the file on disk.
4. **Reload vs restart** — container `reload` picks up the new cert on the
current version. If a future version doesn't, restart instead:
```bash
cd /opt/mailcow-dockerized && docker compose restart postfix-mailcow dovecot-mailcow nginx-mailcow
```
5. **Egress firewall** — the mailcow host runs default-deny outbound; outbound
SSH (port 22) to the relevant host must be explicitly allowed.
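
Gotcha 1 can be caught before validation with an explicit check on the staged files. This is a small sketch with a hypothetical function name; whether the real deploy script performs such a check is not stated here.

```bash
# Hypothetical pre-flight check: succeed when the path is a symlink
# whose target does not exist.
is_dangling_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}
```

For example, `is_dangling_symlink /home/certsync/incoming/fullchain.pem` succeeding would point at a push that ran without `-L`, which is a more specific diagnosis than "no staged cert."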
## Monitoring
The chain fails **silently** — a script erroring on a timer only logs to
journald. Recommended safeguards:
- A Wazuh rule watching `mailcow-cert-deploy.service` for non-zero exit / `ERROR`.
- A periodic check that the served cert on `:993` is not within *N* days of
expiry, alerting if a renewal hasn't propagated.
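
The expiry check suggested above can be built on `openssl x509 -checkend`. The following is a minimal sketch with a hypothetical function name; how the result is turned into an alert (Wazuh, mail, or otherwise) is left open.

```bash
# Hypothetical helper: succeed when the certificate in the PEM file
# expires within the given number of days or has already expired.
# An unreadable file also counts as "expiring", so the check fails safe.
cert_expires_within() {
    local cert="$1" days="$2"
    ! openssl x509 -noout -checkend "$(( days * 86400 ))" -in "$cert" >/dev/null 2>&1
}
```

To monitor what clients actually see, the certificate served on `:993` would first be saved to a file with the `openssl s_client` command from the Verification section and then passed to `cert_expires_within` with the chosen *N*.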
## Migration / rebuild notes
The cert-sync **host-level** components are part of host provisioning, **not**
mailcow data:
- `certsync` user + restricted SSH key
- `deploy-staged-cert.sh`
- both `systemd` units

These are **not** carried by mailcow backup/restore or cold-standby sync. If the
mailcow host is rebuilt or replaced (including an IP-reuse cutover to a new VM),
recreate these on the new host and confirm both timers are active before relying
on automated renewal.