launchpad/apps/overlays/upc-dev/forte-drop-postgresql/RESTORE.md
Jørgen Stensrud b713ec853c feat(apps): forte-drop web + mcp argocd apps (prod) (#18)
## Summary

ArgoCD Applications + Keycloak clients + sealed secret for forte-drop **web + mcp** (PROD).

## What changed

- **forte-drop** + **forte-drop-mcp** ArgoCD Applications (two-source: forte-helm chart + helm-prod-values).
- **namespace.yaml** — explicit `forte-drop` Namespace at sync-wave -1, `Prune=false` (avoids first-sync race for namespaced resources; doesn't cascade-delete on base removal).
- **keycloak-client-forte-drop** + **keycloak-client-forte-drop-mcp** — labeled config Secrets; the registrar creates the OIDC clients in the `forte` realm within ~2 min.
- **forte-drop-secrets** SealedSecret — UpCloud S3 creds (existing drops bucket) + PG creds + PASSWORD_GATE_SECRET. Consumed by both deployments + the pg-backup CronJob.
- **forte-drop-web PDB** — minAvailable 1 (selector verified against the live forteapp chart's pod labels).
- Wired into `apps/overlays/upc-dev` (NOT base → stays out of upc-prod).

## Post-merge manual step (one-time)

`auth-oidc` SealedSecret for the web sidecar is still commented out — it needs the `client-secret` the Keycloak registrar writes to `forte-drop-oidc-credentials` after first sync:

```bash
CLIENT_SECRET=$(kubectl -n forte-drop get secret forte-drop-oidc-credentials -o jsonpath='{.data.client-secret}' | base64 -d)
kubectl create secret generic auth-oidc -n forte-drop \
  --from-literal=client-secret="$CLIENT_SECRET" \
  --from-literal=cookie-secret="$(openssl rand -hex 32)" \
  --dry-run=client -o yaml > private/auth-oidc.yaml
kubeseal --format=yaml --controller-name=sealed-secrets-controller --controller-namespace=kube-system \
  < private/auth-oidc.yaml > apps/base/forte-drop/auth-oidc-sealed.yaml
# uncomment in kustomization, commit, push
```

## Depends on

- launchpad PR #17 (postgres + namespace via CreateNamespace).
- helm-prod-values forte-drop PR (values).

## Review

- [x] codex: namespace first-sync race → fixed (explicit namespace, sync-wave -1).
- [x] Keycloak registrar unblocked (stale chibisafe/minio config secrets removed; registrar green).

🤖 Generated with Claude Code

Co-authored-by: Sten <sten@Sten-sin-MacBook-Pro.local>
Co-authored-by: Sten <sten@Mac.domain_not_set.invalid>
Co-authored-by: Danijel Simeunovic <danijel.simeunovic@fortedigital.com>
Reviewed-on: #18
Reviewed-by: Danijel Simeunovic <danijel.simeunovic@fortedigital.com>
2026-06-04 18:47:08 +00:00


forte-drop Postgres — backup & restore runbook

What gets backed up

A CronJob (forte-drop-pg-backup, namespace forte-drop) runs nightly at 02:00 UTC:

  1. pg_dump of the drops database → gzip.
  2. Upload to UpCloud Managed Object Storage: s3://drops/_pgbackups/forte-drop-<TS>.sql.gz (the _pgbackups/ prefix is collision-proof: app slugs match /^[a-z0-9][a-z0-9-]{0,62}$/ and can never start with _).
  3. Retention: dumps older than 30 days are pruned.
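
The collision-proof claim in step 2 can be checked directly against the slug pattern. The sketch below is illustrative only: `is_valid_slug` is a hypothetical helper name, not something taken from the app's code, and it simply applies the regex quoted above.

```bash
#!/usr/bin/env bash
# Slug pattern as quoted in this runbook: first character must be [a-z0-9].
SLUG_RE='^[a-z0-9][a-z0-9-]{0,62}$'

# Hypothetical helper: returns 0 if the argument is a valid app slug, 1 otherwise.
is_valid_slug() {
  [[ "$1" =~ $SLUG_RE ]]
}
```

For example, `is_valid_slug my-app` succeeds while `is_valid_slug _pgbackups` fails, so no app slug can ever occupy the `_pgbackups/` prefix.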

S3 creds come from the forte-drop-secrets Secret (S3_ENDPOINT / S3_KEY / S3_SECRET). Postgres creds come from the forte-drop-pg-creds Secret (pgusername / pgpassword).

Object storage is the durable tier. App data + DB backups both live in UpCloud Managed Object Storage (replicated by UpCloud). The in-cluster Postgres PVC is the live working copy; the nightly dump is the recovery point. The PVC carries Prune=false,Delete=false so ArgoCD never deletes it.
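
The 30-day retention rule from the list above can be reasoned about from the file names alone. This is a minimal sketch, not the CronJob's actual prune logic (which this runbook does not show). It assumes the `<TS>` part has the fixed-width UTC form seen in the example name `forte-drop-20260530T020000Z.sql.gz`, and `expired_dumps` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads dump names (one per line) on stdin and prints the
# ones whose embedded timestamp is strictly older than the cutoff in $1.
# Assumption: timestamps look like YYYYMMDDTHHMMSSZ, so plain string
# comparison orders them chronologically.
expired_dumps() {
  local cutoff="$1" name ts
  while IFS= read -r name; do
    # Ignore anything that is not forte-drop-<TS>.sql.gz
    [[ "$name" =~ ^forte-drop-([0-9]{8}T[0-9]{6}Z)\.sql\.gz$ ]] || continue
    ts="${BASH_REMATCH[1]}"
    if [[ "$ts" < "$cutoff" ]]; then
      printf '%s\n' "$name"
    fi
  done
}
```

With GNU date, a cutoff could be computed as `date -u -d '30 days ago' +%Y%m%dT%H%M%SZ` and passed as the first argument. The function only prints candidates and deletes nothing.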

Prerequisites

export KUBECONFIG=~/Downloads/dev-fd-no-svg1_kubeconfig.yaml
# Confirm the namespace + DB pod are up:
kubectl -n forte-drop get pods -l app.kubernetes.io/name=postgresql

List available backups

# Run an ephemeral mc pod with the app's S3 creds:
kubectl -n forte-drop run mc-list --rm -it --restart=Never \
  --image=quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z \
  --overrides='{"spec":{"containers":[{"name":"mc","image":"quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z","command":["sh","-c","mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc ls obj/drops/_pgbackups/"],"envFrom":[{"secretRef":{"name":"forte-drop-secrets"}}]}]}}'

Manually trigger a backup (before risky changes)

kubectl -n forte-drop create job --from=cronjob/forte-drop-pg-backup pg-backup-manual-$(date +%s)
# Watch:
kubectl -n forte-drop get jobs -l app.kubernetes.io/component=backup
kubectl -n forte-drop logs -l app.kubernetes.io/component=backup --tail=40

Restore a dump

Destructive. This overwrites the live drops database. Take a fresh manual backup first (above) and confirm with whoever owns the data before proceeding.

1. Pick the dump to restore

List backups (above), choose forte-drop-<TS>.sql.gz.
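
If the most recent dump is the one you want, it can be picked out of the listing mechanically. The sketch below is an assumption-laden convenience: `latest_dump` is a hypothetical helper, it assumes `mc ls` prints the object name as the last whitespace-separated field of each line, and it assumes the timestamp format shown in the example file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `mc ls` output on stdin and prints the newest
# forte-drop-<TS>.sql.gz name (empty output if none match).
latest_dump() {
  awk '{ print $NF }' \
    | grep -E '^forte-drop-[0-9]{8}T[0-9]{6}Z\.sql\.gz$' \
    | sort \
    | tail -n 1
}
```

Save the listing to a file and run `DUMP="$(latest_dump < listing.txt)"`. If the listing was captured from a TTY session, strip carriage returns first (for example with `tr -d '\r'`), otherwise the pattern will not match.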

2. Run a restore pod that pulls the dump and pipes it into Postgres

DUMP="forte-drop-20260530T020000Z.sql.gz"   # <-- set to the chosen file

kubectl -n forte-drop run pg-restore --rm -it --restart=Never \
  --image=postgres:16-alpine \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "restore",
        "image": "postgres:16-alpine",
        "command": ["sh","-c","set -euo pipefail; apk add --no-cache curl >/dev/null; echo placeholder"],
        "envFrom": [
          {"secretRef":{"name":"forte-drop-pg-creds"}},
          {"secretRef":{"name":"forte-drop-secrets"}}
        ]
      }]
    }
  }'

Simpler 2-pod approach (recommended: the single-pod command above is only a placeholder that restores nothing, and splitting the work avoids cramming mc + psql into one image):

DUMP="forte-drop-20260530T020000Z.sql.gz"

# (a) Download the dump from object storage to a local file:
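# No -t here: a TTY rewrites line endings and corrupts the binary gzip stream.
# --quiet keeps kubectl's own status messages out of the redirected file.
kubectl -n forte-drop run mc-get --rm -i --quiet --restart=Never \
  --image=quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z \
  --overrides='{"spec":{"containers":[{"name":"mc","image":"quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z","command":["sh","-c","mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc cat obj/drops/_pgbackups/'"$DUMP"'"],"envFrom":[{"secretRef":{"name":"forte-drop-secrets"}}]}]}}' \
  > "/tmp/$DUMP"
# Fail early if the download is truncated or contains stray output:
gzip -t "/tmp/$DUMP"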

# (b) Pipe it into the live Postgres via the service:
gunzip -c /tmp/$DUMP | kubectl -n forte-drop run pg-restore --rm -i --restart=Never \
  --image=postgres:16-alpine \
  --overrides='{"spec":{"containers":[{"name":"psql","image":"postgres:16-alpine","stdin":true,"command":["sh","-c","PGPASSWORD=\"$pgpassword\" psql -h forte-drop-postgresql.forte-drop.svc -U \"$pgusername\" -d drops"],"env":[{"name":"pgusername","valueFrom":{"secretKeyRef":{"name":"forte-drop-pg-creds","key":"pgusername"}}},{"name":"pgpassword","valueFrom":{"secretKeyRef":{"name":"forte-drop-pg-creds","key":"pgpassword"}}}]}]}}'
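
Before piping the file into psql in step (b), it can help to confirm that the download from step (a) is a complete gzip stream and looks like a pg_dump file. The following is a minimal sketch; `check_dump` is a hypothetical helper name, and the header check relies on plain-format pg_dump output starting with a "PostgreSQL database dump" comment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the file is a valid gzip archive whose
# first lines contain the plain-format pg_dump header, 1 otherwise.
check_dump() {
  local file="$1"
  # gzip -t reads the whole compressed stream and verifies its checksum.
  if ! gzip -t "$file" 2>/dev/null; then
    echo "not a valid gzip file: $file" >&2
    return 1
  fi
  # Only the first few lines are inspected; the header sits at the very top.
  if ! gzip -dc "$file" | head -n 5 | grep -q 'PostgreSQL database dump'; then
    echo "no pg_dump header found in: $file" >&2
    return 1
  fi
}
```

Run `check_dump "/tmp/$DUMP" && echo ok` and only continue with the restore if it prints `ok`.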

The app's schema is created idempotently on boot (CREATE TABLE IF NOT EXISTS + ALTER TABLE ... ADD COLUMN IF NOT EXISTS in src/repo/pg.ts), and the pg_dump output includes the data, so a restore into a fresh DB needs no extra steps. psql keeps going after an error by default, so scan its output for ERROR lines before trusting the result. To restore over an existing DB with conflicting rows, drop and recreate the drops database first, and coordinate downtime: scale the web Deployment to 0 during the restore so the app isn't writing.

3. Verify

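# PGUSER and PGPASSWORD are both read from the Secret, matching the restore step.
kubectl -n forte-drop run pg-check --rm -it --restart=Never \
  --image=postgres:16-alpine \
  --env="PGUSER=$(kubectl -n forte-drop get secret forte-drop-pg-creds -o jsonpath='{.data.pgusername}' | base64 -d)" \
  --env="PGPASSWORD=$(kubectl -n forte-drop get secret forte-drop-pg-creds -o jsonpath='{.data.pgpassword}' | base64 -d)" \
  --command -- psql -h forte-drop-postgresql.forte-drop.svc -d drops \
  -c "SELECT count(*) AS drops FROM drops;" -c "SELECT count(*) AS view_hits FROM view_hits;"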

4. Bring the app back

# If you scaled web to 0 for the restore:
kubectl -n forte-drop scale deploy/forte-drop --replicas=2

Object data (uploaded drop files)

Drop files live in s3://drops/<slug>/... in the same managed bucket. They are not part of the pg backup (the dump only holds metadata). Object storage is UpCloud-managed/replicated, so no separate file backup is configured. If a file-level backup is later required, mirror the bucket to a second bucket/region:

mc mirror --overwrite obj/drops/ backup-target/drops-mirror/

(Exclude _pgbackups/ from the app-data mirror if you split them.)

Disaster scenarios

| Scenario | Recovery |
| --- | --- |
| Postgres pod crash / reschedule | StatefulSet reattaches the PVC; ~12 min downtime; no data loss. |
| PVC lost / corrupted | Recreate StatefulSet, restore latest nightly dump (above). Data since last dump is lost. |
| Accidental drops table data loss | Restore latest dump; or recover a single table from a dump. The dumps are plain SQL, which pg_restore cannot read (it only accepts custom, directory, or tar archives), so load the dump into a scratch database with psql or extract the table's COPY block and replay it. |
| Namespace deleted | PVC has Prune=false,Delete=false; recreate Applications, PVC re-binds, app recovers. Backups in object storage are independent. |
| Object storage bucket lost | UpCloud-managed (replicated). If the IAM key is rotated, update forte-drop-secrets (re-seal). |
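
For the single-table case, note that the nightly dumps are plain SQL (`.sql.gz`), a format pg_restore does not accept. One option is to cut the table's data block out of the dump text. The sketch below is a hedged illustration: `extract_table_data` is a hypothetical helper, and it assumes the table lives in the `public` schema and that the dump uses pg_dump's default COPY format rather than `--inserts`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads a plain-format pg_dump on stdin and prints only
# the COPY block (header line, rows, terminating "\.") for the given table.
# Assumption: the table is in schema "public".
extract_table_data() {
  local table="$1"
  awk -v start="COPY public.${table} (" '
    index($0, start) == 1 { printing = 1 }   # block begins at the COPY line
    printing { print }
    printing && $0 == "\\." { printing = 0 } # block ends at the "\." line
  '
}
```

A possible use is `gunzip -c "/tmp/$DUMP" | extract_table_data drops > drops-data.sql`, then reviewing the file and replaying it with psql against a database where the table exists and has no conflicting rows.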