vault-ops/README.md
2026-04-14 11:51:14 +07:00

Vault Host Setup

Scripts and config to run a Vault-backed PKI and secrets lab on a single VPS. It models an enterprise-style layout with an offline root CA, per-environment intermediates, mTLS everywhere, and strict per-app identities.

Highlights

  • Offline root CA kept off the server; Vault only hosts intermediates.
  • Separate environments (test, prod) with their own PKI mounts, TLS dirs, Unix users, and proxy/app users.
  • Vault API over HTTPS and mTLS; client cert auth for admins, agents, and proxies.
  • One Unix user per app/proxy; Vault Agent user services issue and rotate leaf certs and tokens automatically.
  • YAML (infra/config/apps.example.yaml) as tracked source template for envs, PKI mounts, users, CNs, and paths.
  • Rootless containers and systemd user services to keep blast radius small.

Repository layout

  • infra/config/apps.example.yaml - tracked example matrix for environments and apps.
  • infra/*.sh - unversioned wrappers and stable entry points.
  • infra/versions/ - versioned implementations.
  • infra/scripts/ - direct hook scripts used by the agent templates.
  • infra/archiv/ - archived or superseded snapshots.

Script Layout

  • Use the unversioned wrappers in infra/ for normal operation.
  • Implementations live in infra/versions/ and keep their existing version numbers.
  • New scripts without a prior version should start at v1.0.
  • Archived or superseded variants stay in infra/archiv/.
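
The wrapper convention above can be sketched as follows. This is only an illustration, not the repository's actual wrapper code: the helper name impl_path is hypothetical, and only the infra/versions/<name>-v<version>.sh naming is taken from the layout described here.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: build the path of a versioned implementation.
# Arguments: infra directory, script name, version string (e.g. 1.0).
impl_path() {
  printf '%s/versions/%s-v%s.sh\n' "$1" "$2" "$3"
}

# A wrapper such as infra/mtls-rotate.sh could then end with:
#   exec "$(impl_path "$(dirname "$0")" mtls-rotate 1.0)" "$@"
# so callers keep one stable entry point while the version changes underneath.
```

Pinning the version inside the wrapper, rather than picking the newest file automatically, keeps a version bump an explicit, reviewable change.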

Repo History

  • 112d10d - normalized the infra/ wrapper layout and moved versioned implementations under infra/versions/.
  • 78eff48 - added wrappers for the remaining active versioned scripts.
  • 01016eb - introduced the first stable wrapper entry points.
  • 0685c5f - hardened repo config and removed the leaked token.

Script Catalog

| Wrapper | Implementation | What it does | Latest note |
| --- | --- | --- | --- |
| infra/00_smoketest.sh | infra/versions/00_smoketest-v1.0.sh | Minimal environment sanity check. | Prints the active user plus VAULT_ENV and VAULT_ADDR. |
| infra/01_make_offline_root_ca.sh | infra/versions/01_make_offline_root_ca-v1.0.sh | Creates an offline root CA per environment. | Keeps the root key outside Vault and can encrypt it with ROOT_CA_PASSPHRASE. |
| infra/02_intermediate_in_vault_sign_with_root.sh | infra/versions/02_intermediate_in_vault_sign_with_root-v1.0.sh | Generates the Vault intermediate CSR, signs it with the offline root, and uploads it. | Uses env-specific Vault config and copies the public root CA for distribution. |
| infra/03_issue_vault_server_cert.sh | infra/versions/03_issue_vault_server_cert-v1.0.sh | Issues the Vault server certificate and chain. | Supports config-driven paths plus ad-hoc overrides for CN, TLS dir, and owner. |
| infra/04_enable_https_in_compose.sh | infra/versions/04_enable_https_in_compose-v1.0.sh | Switches the Vault service to HTTPS. | Writes config.hcl and a TLS compose file from the generated certs. |
| infra/05_issue_admin_client_cert.sh | infra/versions/05_issue_admin_client_cert-v2.sh | Issues the admin mTLS client cert from the intermediate PKI. | Wrapper keeps the entry point stable while the implementation stays versioned. |
| infra/05_issue_admin_client_cert_admin.sh | infra/versions/05_issue_admin_client_cert_admin-v1.0.sh | Issues admin certs with an already existing admin token. | Hardens the TLS env on first run and writes admin material under /root/vault/tls-admin/. |
| infra/bootstrap-secret-agent.sh | infra/versions/bootstrap-secret-agent-v4.2.sh | Bootstraps AppRole, KV policy, and cert mapping for secret agents. | Idempotent upserts for KV v2, AppRole, and auth/cert mappings. |
| infra/check_vault_certs.sh | infra/versions/check_vault_certs-v1.0.sh | Scans a tree of cert files and reports expiry status. | Useful for quick certificate inventory checks from the shell. |
| infra/cleanup-free-pki-roles.sh | infra/versions/cleanup-free-pki-roles-v2.sh | Removes PKI roles that are safe to delete. | Supports dry-run and aggressive cleanup modes. |
| infra/cleanup-vault-leftovers.sh | infra/versions/cleanup-vault-leftovers-v1.0.sh | Removes old Vault leftovers such as stale roles and mappings. | Defaults to dry-run and keeps proxy-related entries intact. |
| infra/distribute_ca_to_agents.sh | infra/versions/distribute_ca_to_agents-v3.sh | Copies root/intermediate/chain trust material to users. | Can pull the intermediate directly from Vault or copy a local root/chain. |
| infra/mtls-rotate.sh | infra/versions/mtls-rotate-v1.0.sh | Rotates client mTLS certs for agents. | Uses the wrapper-based setup script and can restart services automatically. |
| infra/setup-vault-agent-app-config.sh | infra/versions/setup-vault-agent-app-config-v5.1.sh | Sets up per-app Vault Agent leaf issuance and rotation. | Latest change: multiple KV subpaths per app while keeping earlier policy logic. |
| infra/setup-vault-agent-mtls-client-config.sh | infra/versions/setup-vault-agent-mtls-client-config-v4.7.sh | Creates agent mTLS client certs and auth/cert mappings. | Latest change: stable CN layout with optional suffix overrides. |
| infra/setup-vault-agent-proxy-config.sh | infra/versions/setup-vault-agent-proxy-config-v4.9.sh | Sets up proxy agents that mirror CA chains and reload proxies. | Latest change: proxy CA-read policies are attached automatically. |
| infra/vault-audit-mini-scan.sh | infra/versions/vault-audit-mini-scan-v2.sh | Scans Vault audit logs for cert login and PKI usage. | Matches usage by name, marker policy, policy superset, CN, and display name. |
| infra/vault-cli | infra/versions/vault-cli-v1.0.sh | Thin wrapper around the Vault binary. | Picks up VAULT_* from the wrapper environment. |
| infra/vault-inventory-report.sh | infra/versions/vault-inventory-report-v1.0.sh | Generates an inventory and relation report for Vault. | Produces JSON plus optional Graphviz output. |
| infra/vault-tls-check.sh | infra/versions/vault-tls-check-v1.0.sh | Verifies Vault TLS and mTLS handshakes. | Includes CA inspection mode and sane HOME-based defaults. |
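
To give a feel for the kind of expiry check that check_vault_certs.sh is described as performing, here is a small sketch. It is not taken from that script: the function name expiry_status, the three status words, and the 30-day window in the usage note are assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: classify a certificate by its notAfter time.
# Arguments: notAfter (epoch seconds), now (epoch seconds), warning window in days.
expiry_status() (
  not_after=$1
  now=$2
  warn_days=$3
  remaining=$(( not_after - now ))
  if [ "$remaining" -le 0 ]; then
    echo expired
  elif [ "$remaining" -le $(( warn_days * 86400 )) ]; then
    echo expiring
  else
    echo ok
  fi
)

# Obtaining notAfter for a real file (GNU date assumed):
#   end=$(openssl x509 -in cert.pem -noout -enddate | cut -d= -f2)
#   expiry_status "$(date -d "$end" +%s)" "$(date +%s)" 30
```

Keeping the classification separate from the openssl call makes the threshold logic easy to test without real certificates.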

Architecture overview

Environments

  • test and prod are fully separated: distinct Vault addresses, PKI mounts (pki-test, pki-prod), TLS directories, Unix owners (vault, vaultprod), and proxy/app users.

PKI chain

  • Offline root CA lives under /root/vault/offline-root/<env>; private key never touches Vault or containers.
  • Vault intermediates are generated inside Vault and signed by the offline root via 02_intermediate_in_vault_sign_with_root.sh.
  • Issuing and CRL URLs are set per env; public root is copied to the Vault TLS dir for trust distribution.
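
The chain described above roughly corresponds to the command sequence below. This sketch only prints the commands for review; it is not the content of 02_intermediate_in_vault_sign_with_root.sh. The common name, the 1825-day validity, the v3_intermediate_ca extension section, and the intermediate.csr/intermediate.pem file names are assumptions, while the mount names and offline-root paths follow this README.

```shell
#!/bin/sh
set -eu

# Print (do not run) the steps for one environment.
# Arguments: env name (test or prod), base URL used in the issuing/CRL URLs.
intermediate_plan() (
  env=$1
  base_url=$2
  mount="pki-$env"
  root="/root/vault/offline-root/$env"
  cat <<EOF
vault write -field=csr $mount/intermediate/generate/internal common_name="Example $env Intermediate CA" > intermediate.csr
openssl x509 -req -in intermediate.csr -CA $root/root-ca.pem -CAkey $root/root-ca.key -CAserial $root/root-ca.srl -extfile $root/openssl.cnf -extensions v3_intermediate_ca -days 1825 -out intermediate.pem
vault write $mount/intermediate/set-signed certificate=@intermediate.pem
vault write $mount/config/urls issuing_certificates=$base_url/v1/$mount/ca crl_distribution_points=$base_url/v1/$mount/crl
EOF
)

# Example: intermediate_plan test https://vault.test.privsec.ch
```

The private key of the intermediate never leaves Vault in this flow (generate/internal), and the root key is only read by the openssl step on the host.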

Vault API security

  • Vault server certs are issued from the intermediate with 03_issue_vault_server_cert.sh and stored in /home/<vaultuser>/tls-<env>/.
  • API is consumed via HTTPS with SNI (vault.<env>.privsec.ch) and mTLS for admins and automation; VAULT_TLS_SERVER_NAME and VAULT_CACERT are used by clients.

Identity and access

  • Auth method: auth/cert with one mapping per app/proxy (agent-<app>.<env>.privsec.ch).
  • Policies are scoped per app: pki-issue-<app> plus optional kv-<app> and marker/debug policies. No shared "god" agents.
  • Each app/proxy has its own Unix user and home paths for mTLS material and leaf certs under /home/<user>/vault and /home/<user>/tls.
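
As an illustration of the one CN -> one mapping -> one policy idea, the sketch below renders a minimal issuing policy. It assumes the PKI role is named after the app, which this README does not state; the function name render_pki_policy is hypothetical.

```shell
#!/bin/sh
set -eu

# Render a policy that allows issuing from a single PKI role.
# Arguments: env name, app name (assumed to equal the PKI role name).
render_pki_policy() (
  env=$1
  app=$2
  cat <<EOF
# pki-issue-$app: issue leaf certs from one role only
path "pki-$env/issue/$app" {
  capabilities = ["update"]
}
EOF
)

# Possible use with the Vault CLI (paths under /home/n8ndev are examples):
#   render_pki_policy test n8ndev | vault policy write pki-issue-n8ndev -
#   vault write auth/cert/certs/agent-n8ndev \
#     certificate=@/home/n8ndev/vault/ca/ca.pem \
#     allowed_common_names=agent-n8ndev.test.privsec.ch \
#     token_policies=pki-issue-n8ndev
```

Restricting allowed_common_names to a single CN is what ties a client certificate to exactly one mapping and therefore to one policy set.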

Agents and automation

  • Vault Agent runs as a systemd user service per app/proxy; it logs in via mTLS, renews, and renders certs.
  • App agents write leaf certs to /home/<app>/tls/<app>.{key,fullchain.pem} and can trigger Podman reloads via labels.
  • Proxy agents keep CA chains in sync (for upstream validation) at /home/proxy*/nginx/ca/current-ca-chain.pem.
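
A Vault Agent configuration for the cert login described above might look roughly like the output of this sketch. The client.crt and client.key file names and the token sink path are placeholders, not the names the setup scripts actually write, and the template stanzas that render leaf certs are left out.

```shell
#!/bin/sh
set -eu

# Render a minimal agent config with cert auto-auth and a file sink.
# Arguments: env, Unix user, app, Vault address, token sink path.
render_agent_config() (
  env=$1
  user=$2
  app=$3
  addr=$4
  sink=$5
  cat <<EOF
vault {
  address         = "$addr"
  ca_cert         = "/home/$user/vault/ca/ca.pem"
  tls_server_name = "vault.$env.privsec.ch"
}

auto_auth {
  method "cert" {
    config = {
      name        = "agent-$app"
      client_cert = "/home/$user/vault/mtls/client.crt"
      client_key  = "/home/$user/vault/mtls/client.key"
    }
  }
  sink "file" {
    config = {
      path = "$sink"
    }
  }
}
EOF
)

# Example (the sink path is a placeholder):
#   render_agent_config test n8ndev n8ndev https://127.0.0.1:22300 /home/n8ndev/agent-token
```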

Containers and proxies

  • Rootless Podman containers per user; reverse proxies run as dedicated users (proxytest, proxyprod) and mount only their own certs/CA bundles.
  • Compose can be switched to HTTPS once the Vault server cert is issued.
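
Label-driven reloads for rootless containers could be wired up along these lines. The hook scripts in infra/scripts/ may do this differently (for example with a restart); sending SIGHUP and the helper name reload_commands are assumptions, and the tls=true label is the one mentioned in this README.

```shell
#!/bin/sh
set -eu

# Read container names on stdin and emit one reload command per name.
# Empty lines are skipped.
reload_commands() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    printf 'podman kill --signal HUP %s\n' "$name"
  done
}

# Possible use as the user that owns the containers:
#   podman ps --filter label=tls=true --format '{{.Names}}' | reload_commands | sh
```

Generating the commands first and piping them to sh as a separate step makes it easy to inspect what would be reloaded before anything is signalled.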

Security concept

  • Trust separation: offline root; per-env intermediates; no root material in Vault or git.
  • Environment isolation: different mounts, TLS dirs, users, and listener ports for test vs prod.
  • Least privilege: one CN -> one auth/cert mapping -> app-specific policies; no cross-app tokens or shared agents.
  • Defense in depth: mTLS on API and agents, per-app Unix users, rootless containers, minimal file permissions, and systemd user scopes.
  • Automation with guardrails: scripts are idempotent, derive paths from YAML, and avoid embedding secrets. Sensitive files stay outside git.
  • Rotation readiness: intermediates can be re-signed from the offline root; agents auto-renew leafs; CA-chain refresh for proxies is automated.

Security review snapshot

  • Architecture (PKI + Vault + mTLS + per-app users): 9/10. Offline root -> Vault intermediate -> server/client certs; separate PKI mounts per env; YAML as config matrix.
  • Implementation (scripts, idempotency, logging): 8/10. Reproducible issuance paths, readable logs, clean agent setup.
  • Operational hardening: 7/10. Needs formal backups/restore, rotation runbooks, monitoring/alerting, and optional HSM/KMS for unseal.

Layered security (inside-out)

  • PKI layer: offline root at /root/vault/offline-root/{test,prod}/root-ca.{key,pem,srl}; intermediates generated in Vault and signed offline; URLs set per env; CA chain = intermediate + root.
  • Vault API + mTLS: Vault runs as dedicated Unix users (vault, vaultprod); HTTPS termination with env-specific server cert/chain; admin access via VAULT_ADDR=https://127.0.0.1:22{300,400}, VAULT_TLS_SERVER_NAME=vault.{env}.privsec.ch, VAULT_CACERT=/root/vault/tls-{env}/ca_chain.pem; admin mTLS via VAULT_CLIENT_CERT/VAULT_CLIENT_KEY.
  • Agent/App layer: one Unix user per app/proxy; mTLS material under /home/<user>/vault/mtls plus trust at /home/<user>/vault/ca/ca.pem; Vault Agent as systemd user service with cert auth mapping auth/cert/certs/agent-<app>; policies pki-issue-<app> (+ optional KV, marker/debug); leaf certs rendered to /home/<user>/tls/<app>.{key,fullchain.pem}.
  • Proxy layer: dedicated users (proxytest, proxyprod) with CA trust at /home/proxy*/vault/ca/ca.pem; proxy agents keep /home/proxy*/nginx/ca/current-ca-chain.pem synced and reload containers via labels (tls=true). Pattern maps cleanly to k8s sidecar + secret volume + annotations.
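
The admin environment from the Vault API + mTLS layer can be captured in a small helper, sketched below. Port 22300 for test matches the quick start; mapping 22400 to prod is inferred from the brace notation above and should be checked against your config. The function name admin_env is hypothetical.

```shell
#!/bin/sh
set -eu

# Print export lines for the admin CLI environment of one env.
admin_env() (
  env=$1
  case "$env" in
    test) port=22300 ;;
    prod) port=22400 ;;  # assumption: inferred from 22{300,400}
    *) echo "unknown env: $env" >&2; exit 1 ;;
  esac
  cat <<EOF
export VAULT_ADDR=https://127.0.0.1:$port
export VAULT_TLS_SERVER_NAME=vault.$env.privsec.ch
export VAULT_CACERT=/root/vault/tls-$env/ca_chain.pem
EOF
)

# Usage: eval "$(admin_env test)"
# VAULT_CLIENT_CERT and VAULT_CLIENT_KEY still need to point at the admin
# cert and key under /root/vault/tls-admin/<env>/ (file names not shown here).
```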

Hardening backlog

  • Admin material: keep tokens/init files only under /root; issue short-lived admin tokens and revoke after use.
  • Recovery: define backup/restore for Vault storage and offline root/intermediate; add a documented disaster runbook.
  • Safety rails: avoid VAULT_SKIP_VERIFY=1 outside one-off scripts; remove or trim debug-policy in prod.
  • Rotation: plan root rollover (date-bound), intermediate renewal cadence, and SLA for leaf lifetime/renewal windows.
  • Monitoring: enable Vault audit logs to file/syslog and ship to SIEM (Graylog/Loki/ELK); alert on many 403s, unusual cert-auth logins, and policy changes.
  • Optional: HSM/KMS for unseal/autounseal (see Azure auto-unseal below) if targeting enterprise-grade resilience.
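
For the monitoring item, a first alerting signal can be as simple as counting denied requests in a file audit log. The sketch assumes the log is newline-delimited JSON whose error field contains the text "permission denied"; the log path in the usage note and the function name count_denied are placeholders.

```shell
#!/bin/sh
set -eu

# Count audit log lines that mention a denied request.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_denied() {
  grep -c 'permission denied' "$1" || true
}

# Enabling a file audit device (path is an example):
#   vault audit enable file file_path=/var/log/vault/audit.log
# Simple threshold check:
#   [ "$(count_denied /var/log/vault/audit.log)" -lt 20 ] || echo "many denied requests" >&2
```

A SIEM rule would normally replace this, but a shell check like it is enough to notice a misconfigured agent looping on a forbidden path.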

Azure auto-unseal (optional)

  • Purpose: remove manual unseal steps by having a key in Azure Key Vault wrap and unwrap Vault's own key material (the wrapped key stays in Vault storage; only the wrapping key lives in Azure). Reduces operational toil but adds a cloud dependency.
  • Prereqs: Azure subscription, Key Vault with an RSA key (HSM-backed preferred), service principal with get, wrapKey, unwrapKey permissions on that key.
  • Vault config snippet (vault.hcl or compose template):
seal "azurekeyvault" {
  tenant_id     = "<AZURE_TENANT_ID>"
  client_id     = "<AZURE_CLIENT_ID>"      # service principal
  client_secret = "<AZURE_CLIENT_SECRET>"  # store via env/secret store
  vault_name    = "<key-vault-name>"
  key_name      = "vault-autounseal"
  # key_version = "<optional-fixed-version>"
}
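  • Systemd/compose: set env vars (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_KEY_VAULT_NAME, AZURE_KEY_NAME) in your unit/env file and template them into vault.hcl. Vault reads AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET from the environment on its own; AZURE_KEY_VAULT_NAME and AZURE_KEY_NAME are names for the template only (the variables Vault reads natively are VAULT_AZUREKEYVAULT_VAULT_NAME and VAULT_AZUREKEYVAULT_KEY_NAME).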
  • Notes: keep recovery keys and initial root tokens safe even with auto-unseal; ensure outbound HTTPS to Azure; monitor Key Vault access logs and alert on wrap/unwrap anomalies.
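
Templating the seal stanza from the environment could be done as in the sketch below. It is one possible approach, not a script from this repository; render_azure_seal is a hypothetical name. The client secret is deliberately not written to the file, on the assumption that Vault picks up AZURE_CLIENT_SECRET from its own environment.

```shell
#!/bin/sh
set -eu

# Render the seal stanza from environment variables.
# ${VAR:?msg} aborts with an error if a required variable is unset or empty.
render_azure_seal() {
  cat <<EOF
seal "azurekeyvault" {
  tenant_id  = "${AZURE_TENANT_ID:?set AZURE_TENANT_ID}"
  client_id  = "${AZURE_CLIENT_ID:?set AZURE_CLIENT_ID}"
  vault_name = "${AZURE_KEY_VAULT_NAME:?set AZURE_KEY_VAULT_NAME}"
  key_name   = "${AZURE_KEY_NAME:-vault-autounseal}"
}
EOF
}

# Example: render_azure_seal >> vault.hcl
```

Leaving the secret out of vault.hcl means the rendered file can be kept with ordinary config permissions, while the secret stays in the unit's environment file.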

Repo and runtime layout

  • Git repo (under /home/<admin>/vault/): infra/ scripts and policies, infra/config/apps.example.yaml as the tracked template, infra/config/apps.yaml as your private copy, policy/, plus a bootstrap copy of offline-root/ (not used at runtime) and secrets/ for init/unseal/admin tokens (never commit).
  • Runtime offline root: /root/vault/offline-root/<env>/ holding openssl.cnf, root-ca.key, root-ca.pem, root-ca.srl (authoritative source).
  • Vault server TLS per env: /home/<vaultuser>/tls-<env>/ (Unix owners vault and vaultprod) with server.key, server.crt, fullchain.crt, ca_chain.pem; the public root copy sits in the same directory as root_ca.pem.
  • Admin mTLS (public paths): /root/vault/tls-admin/{test,prod}/ for admin client certs; admin tokens under /root/vault/tokens/.
  • Per-app/proxy homes: /home/<user>/vault/{mtls,ca,...} for agent identity and trust; /home/<user>/tls/<app>.{key,fullchain.pem} for leafs; agent tokens under /home/<user>/.vault-agent-*/token with ~/.vault-token symlink for CLI convenience.

Local Config

  • Copy infra/config/apps.example.yaml to infra/config/apps.yaml on your machine and edit the local copy only.
  • Keep infra/config/apps.yaml untracked. It is intentionally ignored so your private hostnames, users, and seed values stay local.
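
A small helper can make the copy step safe to repeat. This is a sketch; init_local_config is a hypothetical name and the chmod 600 is an added precaution, not something the repository scripts are known to do.

```shell
#!/bin/sh
set -eu

# Create apps.yaml from the tracked example unless a local copy already exists.
# Argument: the config directory (infra/config when run from the repo root).
init_local_config() (
  dir=$1
  example="$dir/apps.example.yaml"
  local_copy="$dir/apps.yaml"
  if [ -e "$local_copy" ]; then
    echo "kept existing $local_copy"
  else
    cp "$example" "$local_copy"
    chmod 600 "$local_copy"
    echo "created $local_copy"
  fi
)

# Usage from the repo root: init_local_config infra/config
# To confirm git ignores the copy: git check-ignore -q infra/config/apps.yaml
```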

Quick start (test environment example)

  1. Prerequisites: Vault CLI, jq, python3 with PyYAML, openssl, sudo, podman (for services), and systemd user sessions.

  2. Create your local config from the example if you have not done that yet:

cp infra/config/apps.example.yaml infra/config/apps.yaml
  3. Create the offline root CA (keeps the key off Vault):
cd infra
./01_make_offline_root_ca.sh --env test
# repeat with --env prod when ready
  4. Create and sign the Vault intermediate with the offline root (requires admin token in VAULT_TOKEN):
VAULT_ADDR=http://127.0.0.1:22300 \
VAULT_TOKEN=hvs.<admin> \
./02_intermediate_in_vault_sign_with_root.sh --env test --api http://127.0.0.1:22300
  5. Issue the Vault server cert and chain (writes to the env TLS dir):
VAULT_TOKEN=hvs.<admin> \
./03_issue_vault_server_cert.sh --env test --config ./config/apps.yaml \
  --cn vault.test.privsec.ch --dns "vault.test.privsec.ch,localhost" --ips "127.0.0.1,::1"
  6. Switch compose/services to HTTPS if needed:
./04_enable_https_in_compose.sh --env test
  7. Bootstrap an app identity and agent (example n8ndev in test):
./setup-vault-agent-mtls-client-config.sh --env test --app n8ndev
./setup-vault-agent-app-config.sh --env test --app n8ndev
  8. For proxies (example prod):
./setup-vault-agent-mtls-client-config.sh --env prod --app proxyprod
./setup-vault-agent-proxy-config.sh --env prod --app proxyprod
  9. Distribute CA chains to new users if needed:
./distribute_ca_to_agents.sh --env test --which chain --users "n8ndev proxytest"
  10. Validate: run infra/00_smoketest.sh, infra/vault-tls-check.sh, or infra/vault-audit-mini-scan.sh to confirm mTLS and policy wiring.
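
Before starting the steps above, the prerequisites from the first step can be checked with a short preflight. The helper name missing_tools is hypothetical; the tool list in the usage note mirrors the prerequisites named in this quick start.

```shell
#!/bin/sh
set -eu

# Print the name of every given command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Possible preflight:
#   missing=$(missing_tools vault jq python3 openssl sudo podman)
#   [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
#   python3 -c 'import yaml' || echo "PyYAML is not installed" >&2
```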

Operational notes

  • Secrets and keys are not committed; only scripts and structure live in git. Offline root material stays under /root/vault/offline-root/<env>.
  • Keep backups of Vault storage, root/intermediate certs, and admin tokens. Define a recovery runbook for full node loss.
  • Plan rotations: root CA rollover schedule, intermediate renewal cadence, and SLAs for leaf cert lifetimes.
  • Enable Vault audit logging and forward to your SIEM; watch for unusual auth/cert logins and policy changes.

Author

Built by Blade34242. Use issues/PRs if you want to discuss improvements or report problems.