# mailcow + Nginx Proxy Manager Certificate Sync
Documentation for the self-hosted mailcow mail server and the automated
certificate pipeline that feeds it Let's Encrypt certs issued by
Nginx Proxy Manager (NPM).
---
## Table of Contents
- [Overview](#overview)
- [Why this setup](#why-this-setup)
- [Hosts](#hosts)
- [Architecture](#architecture)
- [Components](#components)
  - [Push script (NPM host)](#1-push-script-npm-host)
  - [SSH transport](#2-ssh-transport)
  - [Deploy script (mailcow host)](#3-deploy-script-mailcow-host)
  - [mailcow configuration](#4-mailcow-configuration)
  - [Scheduling](#5-scheduling-systemd-timers)
- [Verification](#verification)
- [Gotchas](#gotchas)
- [Outbound mail relay via SMTP2GO](#outbound-mail-relay-via-smtp2go)
- [DNS records](#dns-records)
- [Monitoring](#monitoring)
- [Migration / rebuild notes](#migration--rebuild-notes)
---
## Overview
NPM is the single source of truth for the Let's Encrypt certificate covering
`mail.wittenberger.us`. mailcow consumes that certificate for **all** mail
protocols (SMTP, IMAP, POP3, ManageSieve) plus its web UI, instead of running
its own ACME client.
Because NPM and mailcow run on **separate hosts**, the certificate is
distributed via a two-host **push → deploy** chain over SSH, each side driven by
its own `systemd` timer.
## Why this setup
- Centralizes all Let's Encrypt issuance/renewal in NPM (single place to manage
and audit certs).
- mailcow's internal ACME is disabled (`SKIP_LETS_ENCRYPT=y`), so there is no
second ACME client competing for the same hostname.
- The mail protocols and the web UI all present the same valid cert.
## Hosts
| Role | Host | Address |
| ------------------- | ------------- | -------------- |
| Cert source (NPM) | NGX-Homepage | - |
| mailcow (consumer) | mailcow | 10.10.14.229 |
## Architecture
```
[NGX-Homepage] [mailcow host]
NPM npm-5 cert /home/certsync/incoming/ (staging)
| |
| push-mailcow-cert.sh | deploy-staged-cert.sh
| (rsync -azL over SSH) ───────────────────►| validate → copy → reload
| |
└─ systemd: mailcow-cert-push.timer └─ systemd: mailcow-cert-deploy.timer
03:00 / 15:00 03:15 / 15:15
```
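The deploy timer is scheduled 15 minutes after the push so the file is staged
before deployment. With the `RandomizedDelaySec` jitter on both timers (up to
300 s on push, up to 180 s on deploy), the actual gap is between 10 and 18
minutes.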
## Components
### 1. Push script (NPM host)
**Path:** `/root/push-mailcow-cert.sh` on **NGX-Homepage** (runs as root)
- Source cert: `/etc/nginx/letsencrypt/live/npm-5/`
- NPM names its cert directories by internal ID (`npm-N`), not by hostname.
Identify the correct one by matching subject/SAN:
```bash
for d in /etc/nginx/letsencrypt/live/npm-*; do
echo "=== $d ==="
openssl x509 -noout -subject -ext subjectAltName -in "$d/cert.pem" 2>/dev/null
done
```
- Compares the source cert's SHA-256 fingerprint against a local state file
(`/var/lib/mailcow-cert-push/last_fp`) and **pushes only when it changes**.
- Transfers `fullchain.pem` and `privkey.pem` with `rsync -azL` to the mailcow
staging dir.
> **`-L` is required.** Let's Encrypt's `live/` directory contains symlinks into
> `archive/`. Without `-L` (`--copy-links`), rsync copies the symlinks, which
> dangle on the destination. `-L` follows them and copies the real files.
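
The change-detection step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of the real `/root/push-mailcow-cert.sh`: the function names, the `MAILCOW_HOST` and `SSH_KEY` variables, and the default key location are assumptions. The source directory, state file, staging dir, and rsync flags are the ones given in this document.

```bash
#!/usr/bin/env bash
# Sketch of fingerprint-based change detection on the push side.
# Function names and the MAILCOW_HOST / SSH_KEY variables are hypothetical.

# Print the SHA-256 fingerprint line of a PEM certificate.
cert_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 -in "$1"
}

# Succeed when no state file exists yet or the recorded fingerprint differs.
needs_push() {
    local fp="$1" state_file="$2"
    [ ! -f "$state_file" ] || [ "$(cat "$state_file")" != "$fp" ]
}

# Remember the fingerprint that was pushed successfully.
record_push() {
    local fp="$1" state_file="$2"
    mkdir -p "$(dirname "$state_file")"
    printf '%s\n' "$fp" > "$state_file"
}

# Tie the pieces together: push only when the source cert has changed.
push_if_changed() {
    local src="/etc/nginx/letsencrypt/live/npm-5"
    local state="/var/lib/mailcow-cert-push/last_fp"
    local dest="certsync@${MAILCOW_HOST:?set MAILCOW_HOST}:/home/certsync/incoming/"
    local key="${SSH_KEY:-/root/.ssh/mailcow_certsync}"   # assumed key location
    local fp
    fp="$(cert_fingerprint "$src/cert.pem")" || return 1
    if needs_push "$fp" "$state"; then
        # -L copies the real files behind the live/ symlinks.
        rsync -azL -e "ssh -i $key" "$src/fullchain.pem" "$src/privkey.pem" "$dest" \
            && record_push "$fp" "$state"
    else
        echo "nothing to do"
    fi
}
```

Recording the fingerprint only after `rsync` succeeds means a failed transfer is retried on the next timer run instead of being skipped as "already pushed".
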
### 2. SSH transport
- Dedicated user **`certsync`** on the mailcow host; staging dir
`/home/certsync/incoming`.
- Dedicated ed25519 key **`mailcow_certsync`** (private key on NPM host, public
key in `certsync`'s `authorized_keys`).
- The `authorized_keys` entry is **restricted** with a forced command and
`restrict` so the key can only perform the rsync receive - no shell:
```
command="rsync --server -logDtpre.iLsfxCIvu . /home/certsync/incoming/",restrict ssh-ed25519 AAAA... mailcow-cert-push
```
- `rsync` must be installed on **both** hosts.
### 3. Deploy script (mailcow host)
**Path:** `/opt/mailcow-dockerized/deploy-staged-cert.sh` on **mailcow** (runs as root)
- **Cert/key match check (algorithm-agnostic).** Compares the public key derived
from the cert against the one derived from the key:
```bash
openssl x509 -in fullchain.pem -noout -pubkey | openssl md5
openssl pkey -in privkey.pem -pubout | openssl md5
```
This works for RSA, ECDSA, and Ed25519. (A `-modulus` based check is RSA-only
and fails on the ECDSA P-384 cert used here.)
- Confirms the cert covers `mail.wittenberger.us`.
- SHA-256 change-detection - deploys and reloads **only on change**.
- Installs into `data/assets/ssl/cert.pem` (`0644`) and `key.pem` (`0600`).
- Reloads `postfix-mailcow`, `dovecot-mailcow`, `nginx-mailcow`.
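
The "deploy only on change" and file-mode steps above could look something like the sketch below. It is not the real `deploy-staged-cert.sh`: the function names are hypothetical, the key-match and hostname checks are left out for brevity, and the reload is shown as a placeholder because this document does not specify the exact reload command.

```bash
#!/usr/bin/env bash
# Sketch of change-detected installation on the mailcow side.
# Function names are hypothetical; validation steps are omitted here.

# Copy src to dst with the given mode only when the SHA-256 of the content differs.
# Returns 0 if a new file was installed, 1 if dst was already identical,
# 2 if src does not exist.
install_if_changed() {
    local src="$1" dst="$2" mode="$3" new old=""
    [ -f "$src" ] || return 2
    new="$(sha256sum < "$src" | cut -d' ' -f1)"
    if [ -f "$dst" ]; then
        old="$(sha256sum < "$dst" | cut -d' ' -f1)"
    fi
    if [ "$new" = "$old" ]; then
        return 1
    fi
    install -m "$mode" "$src" "$dst"
}

# Install cert and key from staging, then reload only if something changed.
deploy_staged() {
    local staging="/home/certsync/incoming"
    local ssl="/opt/mailcow-dockerized/data/assets/ssl"
    local changed=0
    install_if_changed "$staging/fullchain.pem" "$ssl/cert.pem" 0644 && changed=1
    install_if_changed "$staging/privkey.pem" "$ssl/key.pem" 0600 && changed=1
    if [ "$changed" -eq 1 ]; then
        echo "changed: reload postfix-mailcow, dovecot-mailcow, nginx-mailcow here"
    else
        echo "nothing to do"
    fi
}
```

`install -m` writes a real copy with the requested mode, which fits the "do not symlink" requirement of mailcow's bring-your-own-certificate mode.
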
### 4. mailcow configuration
In `/opt/mailcow-dockerized/mailcow.conf`:
| Setting | Value | Meaning |
| ---------------------- | ----- | ----------------------------------------- |
| `SKIP_LETS_ENCRYPT` | `y` | mailcow's internal ACME client disabled. |
| `ENABLE_SSL_SNI` | `y` | Per-domain SNI certs (see gotcha below). |
mailcow's "bring your own certificate" mode reads
`data/assets/ssl/cert.pem` and `data/assets/ssl/key.pem`. **Do not symlink** -
the files must be real copies.
### 5. Scheduling (systemd timers)
| Host | Units | Schedule |
| ------------- | -------------------------------------- | --------------- |
| NGX-Homepage | `mailcow-cert-push.{service,timer}` | 03:00 / 15:00 |
| mailcow | `mailcow-cert-deploy.{service,timer}` | 03:15 / 15:15 |
Both timers use `Persistent=true` so a host that was powered off catches up on
next boot.
**Push timer** (`/etc/systemd/system/mailcow-cert-push.timer`):
```ini
[Timer]
OnCalendar=*-*-* 03,15:00:00
Persistent=true
RandomizedDelaySec=300
```
**Deploy timer** (`/etc/systemd/system/mailcow-cert-deploy.timer`):
```ini
[Timer]
OnCalendar=*-*-* 03,15:15:00
Persistent=true
RandomizedDelaySec=180
```
## Verification
```bash
# Timers: confirm active, last/next run
systemctl list-timers '*cert*' --no-pager
# Served cert ON THE WIRE - the real source of truth (not the file on disk)
openssl s_client -connect mail.wittenberger.us:993 -servername mail.wittenberger.us \
</dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256 -enddate -subject
# Deployed file on the mailcow host
openssl x509 -noout -fingerprint -sha256 -enddate \
-in /opt/mailcow-dockerized/data/assets/ssl/cert.pem
# Source cert on NGX-Homepage
openssl x509 -noout -fingerprint -sha256 -enddate \
-in /etc/nginx/letsencrypt/live/npm-5/cert.pem
```
When healthy, all three SHA-256 fingerprints match.
Manual run (exercises the exact path the timers use):
```bash
# NGX-Homepage
sudo systemctl start mailcow-cert-push.service
journalctl -u mailcow-cert-push.service --no-pager -n 20
# mailcow
sudo systemctl start mailcow-cert-deploy.service
journalctl -u mailcow-cert-deploy.service --no-pager -n 20
```
With an unchanged cert these report "nothing to do" - which confirms
change-detection is working.
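
Comparing the three fingerprints by eye is error-prone, partly because OpenSSL versions differ in how they capitalize the `Fingerprint=` prefix. A small helper along these lines can normalize and compare them. The function names are hypothetical, and each argument is assumed to be a single fingerprint line (for example from `openssl x509 -noout -fingerprint -sha256`, without `-enddate`).

```bash
#!/usr/bin/env bash
# Sketch: normalize OpenSSL fingerprint lines and compare them.
# Function names are hypothetical.

# Reduce "sha256 Fingerprint=AA:BB:..." to bare lowercase hex.
normalize_fp() {
    printf '%s\n' "${1#*=}" | tr -d ':' | tr 'A-F' 'a-f'
}

# Succeed only when every fingerprint given is identical after normalizing.
fingerprints_match() {
    local first fp
    first="$(normalize_fp "$1")"; shift
    for fp in "$@"; do
        [ "$(normalize_fp "$fp")" = "$first" ] || return 1
    done
}
```

Called as `fingerprints_match "$served" "$deployed" "$source"`, it returns non-zero as soon as one of the three differs.
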
## Gotchas
1. **Symlinks** - use `rsync -azL`; without `-L` the cert lands as a dangling
symlink and the deploy reports "no staged cert."
2. **ECDSA vs RSA** - validate with public-key comparison, not `-modulus`
(modulus is RSA-only; the cert here is ECDSA P-384).
3. **SNI subdir** - with `ENABLE_SSL_SNI=y`, mailcow may serve a per-domain cert
from `data/assets/ssl/mail.wittenberger.us/` ahead of the top-level
`cert.pem`. Always verify the served cert with `openssl s_client`, not just
the file on disk.
4. **Reload vs restart** - container `reload` picks up the new cert on the
current version. If a future version doesn't, restart instead:
```bash
docker compose restart postfix-mailcow dovecot-mailcow nginx-mailcow
```
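5. **Firewall** - the mailcow host runs default-deny outbound, but the push is
initiated by NGX-Homepage, so what the pipeline needs is *inbound* SSH (port 22)
from NGX-Homepage allowed on the mailcow host. Outbound SSH must be explicitly
allowed only on a host that initiates the connection and filters egress.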
## Outbound mail relay via SMTP2GO
This mailcow instance runs on residential internet, where outbound port 25 is
typically blocked by the ISP and the IP has no PTR/rDNS control - both of which
make direct MTA-to-MTA delivery unreliable. Outbound mail is therefore relayed
through **SMTP2GO**, which provides a reputable sending IP with proper SPF, DKIM,
and reverse DNS.
### Relay parameters
| Setting | Value |
| ------------- | ------------------------------------------- |
| Smarthost | `mail.smtp2go.com` |
| Port (in use) | **2525** (STARTTLS) |
| Alt ports | 25, 80, 587, 8025 (STARTTLS) / 465, 8465, 443 (implicit TLS) |
| Auth | Username + password (created in SMTP2GO UI) |
| Encryption | TLS (STARTTLS on 2525) |
Credentials are managed in the SMTP2GO dashboard under **SMTP Users**.
### Configuration in mailcow
Configure the relay through the mailcow admin UI (do **not** hand-edit Postfix
config inside the container - mailcow regenerates it):
1. Log in to the mailcow web UI as admin.
2. Navigate to **Configuration → Routing → Sender-Dependent Transports**
(or **Configuration → Configuration & Details → Routing → Transports**,
depending on the mailcow version).
3. Add a transport:
- **Host**: `mail.smtp2go.com`
- **Port**: `2525`
- **Username**: the SMTP2GO SMTP user (from the SMTP2GO dashboard)
- **Password**: the SMTP2GO SMTP user's password
- **TLS**: enabled (STARTTLS)
4. Either set the transport as the **default** for outbound mail, or assign it
to specific sender domains via **Sender-Dependent Transports**.
mailcow stores these settings in its database and regenerates Postfix's
`main.cf` / `transport` / `sasl_passwd` automatically - no manual Postfix edits
required.
### Firewall
The mailcow host runs default-deny outbound. Allow outbound 2525/tcp to
SMTP2GO:
```bash
sudo ufw allow out 2525/tcp comment 'mailcow outbound SMTP delivery to SMTP2GO'
sudo ufw reload
```
Allowing outbound 25/465/587 in addition is reasonable so the relay parameters
can be changed without revisiting the firewall.
### Verification
Send a test message from a mailbox on this server to an external address (e.g.
a Gmail account). Then:
1. Check mailcow's logs to confirm the message was handed off to SMTP2GO:
```bash
docker compose logs --tail=200 postfix-mailcow | grep -i smtp2go
```
Look for a `relay=mail.smtp2go.com[...]:2525` line and a `status=sent`.
2. Check the recipient inbox. Examine the message headers - the `Received:`
chain should show SMTP2GO's infrastructure as the immediate upstream.
3. Confirm SPF/DKIM/DMARC pass in the receiving side's headers
(`Authentication-Results:`). SMTP2GO supplies the sending IP's SPF; you must
publish CNAME/TXT records as instructed in the SMTP2GO dashboard so that
DKIM signs as your domain.
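
For step 1, the log check can be narrowed to just the queue ID and delivery status of messages that went through the relay. The filter below is a sketch: the function name is hypothetical, and it assumes the usual Postfix delivery-line layout (`<queue-id>: to=<...>, relay=host[ip]:port, ..., status=...`).

```bash
#!/usr/bin/env bash
# Sketch: summarize Postfix deliveries that went via the SMTP2GO smarthost.
# The function name is hypothetical.

# Read Postfix log lines on stdin; for each delivery through
# mail.smtp2go.com on port 2525, print "<queue-id> <status>".
relay_statuses() {
    sed -n -E 's/.*: ([0-9A-Za-z]+): to=.*relay=mail\.smtp2go\.com\[[^]]*\]:2525,.*status=([a-z]+).*/\1 \2/p'
}
```

Used as `docker compose logs --tail=200 postfix-mailcow | relay_statuses`, a healthy relay shows `sent` for each queue ID. Statuses such as `deferred` or `bounced` point at a credential, firewall, or TLS problem.
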
### DNS records required for proper authentication
SMTP2GO will generate per-domain CNAME records (DKIM, return-path, tracking) to
add to your authoritative DNS. After publishing them and verifying in the
SMTP2GO dashboard, outbound mail relayed through SMTP2GO will pass SPF, DKIM,
and DMARC at the recipient.
> **Note:** the relay is independent of the cert-sync pipeline. mailcow's
> certificate (managed via the NPM sync described above) secures inbound
> connections to this server, from mail clients and from other mail servers
> delivering on port 25. The SMTP2GO relay handles outbound delivery and uses
> SMTP2GO's own TLS certificate.
## DNS records
Mail-relevant authoritative DNS records published for `wittenberger.us` (managed
in Cloudflare). Non-mail records (web services, other subdomains) are
intentionally out of scope here.
### Mail delivery & host identity
| Type | Name | Value | Notes |
| ----- | ----------------------------- | ------------------------------------ | ------------------------------------------------ |
| MX | `wittenberger.us` | `mail.wittenberger.us` (pri 10) | Single MX → this server. |
| PTR | `76.18.50.104.in-addr.arpa` | `mail.wittenberger.us` | Reverse DNS for the mail host IP; lives in the IP owner's reverse zone, not in Cloudflare. |
| CNAME | `autoconfig.wittenberger.us` | `mail.wittenberger.us` | Thunderbird-style client autoconfig. |
| CNAME | `autodiscover.wittenberger.us`| `mail.wittenberger.us` | Outlook/Exchange-style autodiscover. |
| SRV | `_autodiscover._tcp.wittenberger.us` | `0 443 mail.wittenberger.us` | SRV-based autodiscover hint. |
### Authentication - SPF / DKIM / DMARC
**SPF** (`wittenberger.us` TXT):
```
v=spf1 a mx a:mail.wittenberger.us -all
```
- `a` / `mx` / `a:mail.wittenberger.us` - authorizes this mail server's own IP
for direct outbound from the root domain.
- `-all` - hard fail for anything else claiming to be `wittenberger.us`.
> **No `include:` for SMTP2GO is required here.** Outbound mail relayed through
> SMTP2GO uses the branded return-path `em1378202.wittenberger.us` (CNAME →
> `return.smtp2go.net`) as the envelope-from. SPF for relayed mail is therefore
> checked against `em1378202.wittenberger.us`, which resolves via CNAME to
> SMTP2GO's SPF, which authorizes their sending IPs. Because the envelope is on
> a subdomain of `wittenberger.us`, DMARC's relaxed alignment (`aspf=r`)
> accepts it. The root SPF stays tightly scoped, and the relay traffic passes
> via the branded subdomain - that is by design.
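
The alignment argument above can be expressed as a small check. This is a deliberate simplification: real DMARC relaxed alignment compares *organizational* domains (which needs the Public Suffix List), whereas this sketch assumes the `From:` domain is already the organizational domain and only tests "same domain or a subdomain of it". The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: approximate DMARC relaxed alignment with a suffix test.
# Assumes $2 (the From: domain) is the organizational domain; no PSL lookup.

# Succeed when $1 equals $2 or is a subdomain of $2 (case-insensitive).
relaxed_aligned() {
    local domain from_domain
    domain="$(printf '%s' "$1" | tr 'A-Z' 'a-z')"
    from_domain="$(printf '%s' "$2" | tr 'A-Z' 'a-z')"
    [ "$domain" = "$from_domain" ] || [ "${domain%".$from_domain"}" != "$domain" ]
}
```

`relaxed_aligned em1378202.wittenberger.us wittenberger.us` succeeds, while an unbranded return-path such as `return.smtp2go.net` does not, which is why the branded CNAME matters.
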
**DKIM - two signing identities are in play:**
| Selector | Record | Used by |
| -------------------- | -------------------------------------------- | --------- |
| `dkim` | `dkim._domainkey` (TXT, RSA pubkey inline) | mailcow (signs at handoff, before relay) |
| `s1378202` | `s1378202._domainkey` → CNAME `dkim.smtp2go.net` | SMTP2GO (re-signs at delivery) |
`dkim` is the locally-managed mailcow DKIM key. The full public key value lives
in DNS - treat that as the source of truth, not this document.
`s1378202` is SMTP2GO's per-account selector; the CNAME delegates DKIM lookups
to SMTP2GO's infrastructure so the relay can sign outbound mail with a key
aligned to `wittenberger.us`. **Do not delete this CNAME** - removing it
invalidates SMTP2GO's DKIM signature on relayed mail, and with `p=reject` any
message that ends up failing DMARC is rejected outright rather than delivered
to spam.
A companion CNAME at `em1378202.wittenberger.us` → `return.smtp2go.net` provides
the **branded return-path / bounce domain** so envelope-from (`MAIL FROM`) for
relayed mail is on a subdomain of `wittenberger.us`, enabling SPF alignment for
relayed mail.
**DMARC** (`_dmarc.wittenberger.us` TXT):
```
v=DMARC1; p=reject; rua=mailto:6a8f859ff0524737b1db07b99ff7f30c@dmarc-reports.cloudflare.net,mailto:noreply-dmarc@wittenberger.us; ruf=mailto:noreply-dmarc@wittenberger.us; rf=afrf; sp=reject; fo=0; pct=100; ri=86400; adkim=r; aspf=r
```
- `p=reject` / `sp=reject` - recipients should reject failing mail at the
parent and all subdomains.
- `adkim=r` / `aspf=r` - relaxed alignment for both DKIM and SPF.
- `rua` / `ruf` - aggregate and forensic reports.
> With `p=reject`, a message that fails DMARC (neither SPF nor DKIM passes in
> alignment with the `From:` domain) is rejected outright by enforcing receivers
> (Gmail, Microsoft). Audit DMARC aggregate reports periodically to catch silent
> breakage.
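
When auditing, it can help to pull individual tags out of the published record rather than reading the whole string. A minimal sketch, with a hypothetical function name, that takes the record text as an argument:

```bash
#!/usr/bin/env bash
# Sketch: extract one tag's value from a DMARC record string.
# The function name is hypothetical.

# Print the value of tag $2 from DMARC record $1 (prints nothing if absent).
dmarc_tag() {
    local record="$1" tag="$2"
    printf '%s\n' "$record" | tr ';' '\n' \
        | sed -n -E "s/^[[:space:]]*${tag}[[:space:]]*=[[:space:]]*(.*[^[:space:]])[[:space:]]*\$/\1/p"
}
```

For the record above, `dmarc_tag "$record" p` prints `reject` and `dmarc_tag "$record" aspf` prints `r`. The tag name is anchored at the start of each field, so asking for `p` does not match `sp` or `pct`.
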
### Inbound SMTP / DANE
| Type | Name | Value |
| ---- | --------------------------------- | ------------------------------------------------------------------ |
| TLSA | `_25._tcp.mail.wittenberger.us` | `3 1 1 6699fbd6da62e72ea001aeb33f526785e1bae0104c0c74f416ba7d3673284fe5` |
DANE TLSA record for inbound SMTP on port 25, pinning the certificate's public
key (`3` = DANE-EE, `1` = SPKI, `1` = SHA-256 - i.e. SHA-256 of the
end-entity cert's public-key info).
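**Why this stays stable across renewals:** the hash pins the *public key*, not
the certificate itself. As long as NPM's ACME client reuses the same keypair
across renewals (only the cert's signature and validity dates change), the SPKI
hash - and therefore the TLSA record - remains valid. Key reuse is a setting,
not a given: certbot, which NPM uses, generates a new key at each renewal unless
key reuse (`--reuse-key`) is enabled, so confirm it for this certificate.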
To verify the published TLSA still matches the deployed cert:
```bash
openssl x509 -in /opt/mailcow-dockerized/data/assets/ssl/cert.pem \
-noout -pubkey | openssl pkey -pubin -outform DER | sha256sum
```
The hash output must match the certificate association data of the published
TLSA record - the hex string that follows `3 1 1`, i.e. the fourth field. Worth
spot-checking after major changes (cert pipeline modifications, NPM upgrades,
manual cert reissue with a forced new key).
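
The comparison can also be scripted. In the sketch below the function names are hypothetical. `tlsa_data` accepts the record's rdata as text (for example as printed by a DNS lookup tool) and joins the hex if the tool printed it in several space-separated chunks.

```bash
#!/usr/bin/env bash
# Sketch: compare a certificate's SPKI hash with TLSA "3 1 1" rdata.
# Function names are hypothetical.

# SHA-256 of the certificate's SubjectPublicKeyInfo, as lowercase hex.
spki_sha256() {
    openssl x509 -in "$1" -noout -pubkey \
        | openssl pkey -pubin -outform DER \
        | sha256sum | cut -d' ' -f1
}

# Extract the association data from rdata such as "3 1 1 <hex>",
# concatenating everything from the fourth field on and lowercasing it.
tlsa_data() {
    printf '%s\n' "$1" \
        | awk '{ out = ""; for (i = 4; i <= NF; i++) out = out $i; print tolower(out) }'
}

# Succeed when cert file $1 matches TLSA rdata $2.
tlsa_matches_cert() {
    [ "$(spki_sha256 "$1")" = "$(tlsa_data "$2")" ]
}
```

A non-zero exit from `tlsa_matches_cert` means the key has changed and the rotation procedure applies before DANE-validating senders start failing.
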
### When TLSA *would* need updating
If the keypair ever changes - which would happen if NPM is reconfigured to
generate a new key on renewal, the cert is manually reissued with a new CSR,
or you migrate the cert pipeline - then TLSA must be rotated. The safe
overlap pattern:
1. Publish a new TLSA record (with the new SPKI hash) **alongside** the
existing one.
2. Wait at least the old record's TTL for DNS caches to see both.
3. Deploy the new cert.
4. Remove the old TLSA record only after delivery is observed against the new.
Never have zero matching TLSA records during a rotation - that's a hard
delivery failure for DANE-validating senders.
## Monitoring
The chain fails **silently** - a script erroring on a timer only logs to
journald. Recommended safeguards:
- A Wazuh rule watching `mailcow-cert-deploy.service` for non-zero exit / `ERROR`.
- A periodic check that the served cert on `:993` is not within *N* days of
expiry, alerting if a renewal hasn't propagated.
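
The expiry check in the second bullet might be sketched as below, using `openssl x509 -checkend`, which takes a number of seconds and exits non-zero if the certificate expires within that window. The function names are hypothetical, and the threshold *N* is left to the caller.

```bash
#!/usr/bin/env bash
# Sketch: alertable check that the served certificate is not close to expiry.
# Function names are hypothetical.

# Convert a number of days to seconds for -checkend.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Succeed when the cert served on host:port is valid for at least <days> more days.
# Also fails if the connection or certificate parsing fails.
served_cert_valid_for() {
    local host="$1" port="$2" days="$3"
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -checkend "$(days_to_seconds "$days")" >/dev/null
}
```

For example, `served_cert_valid_for mail.wittenberger.us 993 14 || echo "cert expires soon or is unreachable"` could run from a timer or monitoring agent. Checking the served cert rather than the file on disk also catches the SNI-subdirectory case from the gotchas.
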
## Migration / rebuild notes
The cert-sync **host-level** components are part of host provisioning, **not**
mailcow data:
- `certsync` user + restricted SSH key
- `deploy-staged-cert.sh`
- the `mailcow-cert-deploy` `systemd` units (service and timer)
These are **not** carried by mailcow backup/restore or cold-standby sync. If the
mailcow host is rebuilt or replaced (including an IP-reuse cutover to a new VM),
recreate these on the new host and confirm both timers are active before relying
on automated renewal.