launchpad/apps/overlays/upc-dev/forte-drop-postgresql/RESTORE.md
Jørgen Stensrud b713ec853c feat(apps): forte-drop web + mcp argocd apps (prod) (#18)
## Summary

ArgoCD Applications + Keycloak clients + sealed secret for forte-drop **web + mcp** (PROD).

## What changed

- **forte-drop** + **forte-drop-mcp** ArgoCD Applications (two-source: forte-helm chart + helm-prod-values).
- **namespace.yaml** — explicit `forte-drop` Namespace at sync-wave -1, `Prune=false` (avoids first-sync race for namespaced resources; doesn't cascade-delete on base removal).
- **keycloak-client-forte-drop** + **keycloak-client-forte-drop-mcp** — labeled config Secrets; the registrar creates the OIDC clients in the `forte` realm within ~2 min.
- **forte-drop-secrets** SealedSecret — UpCloud S3 creds (existing drops bucket) + PG creds + PASSWORD_GATE_SECRET. Consumed by both deployments + the pg-backup CronJob.
- **forte-drop-web PDB** — minAvailable 1 (selector verified against the live forteapp chart's pod labels).
- Wired into `apps/overlays/upc-dev` (NOT base → stays out of upc-prod).

## Post-merge manual step (one-time)

`auth-oidc` SealedSecret for the web sidecar is still commented out — it needs the `client-secret` the Keycloak registrar writes to `forte-drop-oidc-credentials` after first sync:

```bash
CLIENT_SECRET=$(kubectl -n forte-drop get secret forte-drop-oidc-credentials -o jsonpath='{.data.client-secret}' | base64 -d)
kubectl create secret generic auth-oidc -n forte-drop \
  --from-literal=client-secret="$CLIENT_SECRET" \
  --from-literal=cookie-secret="$(openssl rand -hex 32)" \
  --dry-run=client -o yaml > private/auth-oidc.yaml
kubeseal --format=yaml --controller-name=sealed-secrets-controller --controller-namespace=kube-system \
  < private/auth-oidc.yaml > apps/base/forte-drop/auth-oidc-sealed.yaml
# uncomment in kustomization, commit, push
```

## Depends on

- launchpad PR #17 (postgres + namespace via CreateNamespace).
- helm-prod-values forte-drop PR (values).

## Review

- [x] codex: namespace first-sync race → fixed (explicit namespace, sync-wave -1).
- [x] Keycloak registrar unblocked (stale chibisafe/minio config secrets removed; registrar green).

🤖 Generated with Claude Code

Co-authored-by: Sten <sten@Sten-sin-MacBook-Pro.local>
Co-authored-by: Sten <sten@Mac.domain_not_set.invalid>
Co-authored-by: Danijel Simeunovic <danijel.simeunovic@fortedigital.com>
Reviewed-on: #18
Reviewed-by: Danijel Simeunovic <danijel.simeunovic@fortedigital.com>
2026-06-04 18:47:08 +00:00

# forte-drop Postgres — backup & restore runbook
## What gets backed up
A CronJob (`forte-drop-pg-backup`, namespace `forte-drop`) runs nightly at **02:00 UTC**:

1. `pg_dump` of the `drops` database → gzip.
2. Upload to **UpCloud Managed Object Storage**: `s3://drops/_pgbackups/forte-drop-<TS>.sql.gz`
   (the `_pgbackups/` prefix is collision-proof: app slugs match `/^[a-z0-9][a-z0-9-]{0,62}$/`
   and can never start with `_`).
3. Retention: dumps older than **30 days** are pruned.

S3 creds come from the `forte-drop-secrets` Secret (`S3_ENDPOINT` / `S3_KEY` / `S3_SECRET`).
Postgres creds from `forte-drop-pg-creds` (`pgusername` / `pgpassword`).
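
The collision-proof claim for the `_pgbackups/` prefix can be checked mechanically. This is a minimal sketch, assuming the slug pattern quoted above is the one the app enforces; `is_valid_slug` is a hypothetical helper and not part of the app or the CronJob.

```bash
# Hypothetical helper: succeeds only if $1 matches the documented slug pattern.
is_valid_slug() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]{0,62}$'
}

# "_pgbackups" starts with "_", so it can never be a slug.
for name in my-drop _pgbackups; do
  if is_valid_slug "$name"; then
    echo "$name: valid slug"
  else
    echo "$name: not a slug, cannot collide with app data"
  fi
done
```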
> **Object storage is the durable tier.** App data + DB backups both live in UpCloud
> Managed Object Storage (replicated by UpCloud). The in-cluster Postgres PVC is the
> live working copy; the nightly dump is the recovery point. The PVC carries
> `Prune=false,Delete=false` so ArgoCD never deletes it.
## Prerequisites
```bash
export KUBECONFIG=~/Downloads/dev-fd-no-svg1_kubeconfig.yaml
# Confirm the namespace + DB pod are up:
kubectl -n forte-drop get pods -l app.kubernetes.io/name=postgresql
```
## List available backups
```bash
# Run an ephemeral mc pod with the app's S3 creds:
kubectl -n forte-drop run mc-list --rm -it --restart=Never \
--image=quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z \
--overrides='{"spec":{"containers":[{"name":"mc","image":"quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z","command":["sh","-c","mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc ls obj/drops/_pgbackups/"],"envFrom":[{"secretRef":{"name":"forte-drop-secrets"}}]}]}}'
```
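Because the `<TS>` part of each file name sorts lexicographically in time order, the newest dump can be picked from a saved listing without parsing dates. A sketch, assuming the timestamp format of the example file name used in the restore steps (`20260530T020000Z`) and that the object name is the last whitespace-separated field of each `mc ls` line; `latest_dump` and `listing.txt` are hypothetical names.

```bash
# Hypothetical helper: reads an `mc ls` listing on stdin, prints the newest dump name.
latest_dump() {
  tr -d '\r' \
    | awk '{ print $NF }' \
    | LC_ALL=C grep -E '^forte-drop-[0-9]{8}T[0-9]{6}Z\.sql\.gz$' \
    | LC_ALL=C sort \
    | tail -n 1
}

# Usage, with the listing saved to a file (carriage returns from a TTY are stripped):
#   DUMP="$(latest_dump < listing.txt)"
```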
## Manually trigger a backup (before risky changes)
```bash
kubectl -n forte-drop create job --from=cronjob/forte-drop-pg-backup pg-backup-manual-$(date +%s)
# Watch:
kubectl -n forte-drop get jobs -l app.kubernetes.io/component=backup
kubectl -n forte-drop logs -l app.kubernetes.io/component=backup --tail=40
```
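To block until the manual backup has finished before starting the risky change, `kubectl wait` can be used. A sketch; `backup_now` is a hypothetical wrapper and the 10-minute timeout is an assumption, not a measured value.

```bash
# Hypothetical wrapper: creates a one-off Job from the CronJob and waits for it.
backup_now() {
  job="pg-backup-manual-$(date +%s)"
  kubectl -n forte-drop create job --from=cronjob/forte-drop-pg-backup "$job" || return 1
  # Returns non-zero if the Job has not reached "complete" within the timeout,
  # which also covers a Job that failed.
  kubectl -n forte-drop wait --for=condition=complete "job/$job" --timeout=10m
}

# Usage:
#   backup_now || echo "backup did not complete; check the job logs" >&2
```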
## Restore a dump
> **Destructive.** This overwrites the live `drops` database. Take a fresh manual
> backup first (above) and confirm with whoever owns the data before proceeding.
### 1. Pick the dump to restore
List backups (above), choose `forte-drop-<TS>.sql.gz`.
### 2. Run a restore pod that pulls the dump and pipes it into Postgres
```bash
DUMP="forte-drop-20260530T020000Z.sql.gz" # <-- set to the chosen file
kubectl -n forte-drop run pg-restore --rm -it --restart=Never \
--image=postgres:16-alpine \
--overrides='{
"spec": {
"containers": [{
"name": "restore",
"image": "postgres:16-alpine",
"command": ["sh","-c","set -eu; echo \"placeholder: this pod restores nothing; use the 2-pod approach below\""],
"envFrom": [
{"secretRef":{"name":"forte-drop-pg-creds"}},
{"secretRef":{"name":"forte-drop-secrets"}}
]
}]
}
}'
```
**Simpler 2-pod approach (recommended — avoids cramming mc + psql in one image):**
```bash
DUMP="forte-drop-20260530T020000Z.sql.gz"
# (a) Download the dump from object storage to a local file.
#     -i without -t: a TTY, if one is allocated, rewrites line endings and corrupts the gzip stream.
#     --quiet keeps kubectl's prompt messages out of the redirected output.
kubectl -n forte-drop run mc-get --rm -i --quiet --restart=Never \
--image=quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z \
--overrides='{"spec":{"containers":[{"name":"mc","image":"quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z","command":["sh","-c","mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc cat obj/drops/_pgbackups/'"$DUMP"'"],"envFrom":[{"secretRef":{"name":"forte-drop-secrets"}}]}]}}' \
> "/tmp/$DUMP"
# (b) Pipe it into the live Postgres via the service:
gunzip -c /tmp/$DUMP | kubectl -n forte-drop run pg-restore --rm -i --restart=Never \
--image=postgres:16-alpine \
--overrides='{"spec":{"containers":[{"name":"psql","image":"postgres:16-alpine","stdin":true,"command":["sh","-c","PGPASSWORD=\"$pgpassword\" psql -h forte-drop-postgresql.forte-drop.svc -U \"$pgusername\" -d drops"],"env":[{"name":"pgusername","valueFrom":{"secretKeyRef":{"name":"forte-drop-pg-creds","key":"pgusername"}}},{"name":"pgpassword","valueFrom":{"secretKeyRef":{"name":"forte-drop-pg-creds","key":"pgpassword"}}}]}]}}'
```
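Between steps (a) and (b) it is worth confirming that the downloaded file is an intact gzip stream, since a truncated download or stray text in the file would otherwise surface partway through the restore. A sketch; `check_dump` is a hypothetical helper.

```bash
# Hypothetical helper: succeeds only if the file is non-empty and a clean gzip stream.
check_dump() {
  # -s: file exists and is non-empty; gzip -t: the compressed data decodes without errors or warnings.
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage, before step (b):
#   check_dump "/tmp/$DUMP" || { echo "dump is missing or corrupt" >&2; exit 1; }
```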
> The app's schema is created idempotently on boot (`CREATE TABLE IF NOT EXISTS` +
> `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` in `src/repo/pg.ts`), and `pg_dump`
> output includes the data. For a clean restore into a fresh DB this just works.
> To restore over an existing DB with conflicting rows, drop/recreate the `drops`
> database first (coordinate downtime — scale the web Deployment to 0 during the
> restore so the app isn't writing).
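
The drop/recreate path described in the note can be sketched as follows. This assumes the role in `forte-drop-pg-creds` is allowed to drop and create the `drops` database, which this runbook does not confirm; `recreate_db_sql` is a hypothetical helper that only prints the SQL.

```bash
# Hypothetical helper: prints SQL that replaces database $1 with an empty one owned by role $2.
recreate_db_sql() {
  printf 'DROP DATABASE IF EXISTS %s;\nCREATE DATABASE %s OWNER %s;\n' "$1" "$1" "$2"
}

# Usage (assumption: run from a psql pod like the one in step (b), connected to the
# maintenance database `postgres`, after the web Deployment has been scaled to 0):
#   recreate_db_sql drops "$pgusername" | psql -h forte-drop-postgresql.forte-drop.svc -U "$pgusername" -d postgres
```

`DROP DATABASE` fails while other sessions are connected to the database, which is another reason to stop the app first.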
### 3. Verify
```bash
kubectl -n forte-drop run pg-check --rm -it --restart=Never \
--image=postgres:16-alpine \
--env="PGPASSWORD=$(kubectl -n forte-drop get secret forte-drop-pg-creds -o jsonpath='{.data.pgpassword}' | base64 -d)" \
--command -- psql -h forte-drop-postgresql.forte-drop.svc -U drops -d drops \
-c "SELECT count(*) AS drops FROM drops;" -c "SELECT count(*) AS view_hits FROM view_hits;"
```
### 4. Bring the app back
```bash
# If you scaled web to 0 for the restore:
kubectl -n forte-drop scale deploy/forte-drop --replicas=2
```
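Scaling returns immediately, so it can help to wait until the pods are actually ready before declaring the restore finished. A sketch, assuming the Deployment name and replica count from the command above; `bring_app_back` is a hypothetical wrapper and the 5-minute timeout is an assumption.

```bash
# Hypothetical wrapper: scales the web Deployment back up and waits for readiness.
bring_app_back() {
  kubectl -n forte-drop scale deploy/forte-drop --replicas=2 || return 1
  # Blocks until the rollout is complete or the timeout expires.
  kubectl -n forte-drop rollout status deploy/forte-drop --timeout=5m
}
```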
## Object data (uploaded drop files)
Drop files live in `s3://drops/<slug>/...` in the same managed bucket. They are
**not** part of the pg backup (the dump only holds metadata). Object storage is
UpCloud-managed/replicated, so no separate file backup is configured. If a
file-level backup is later required, mirror the bucket to a second bucket/region:
```bash
mc mirror --overwrite obj/drops/ backup-target/drops-mirror/
```
(Exclude `_pgbackups/` from the app-data mirror if you split them.)
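
One way to do that split is `mc mirror`'s `--exclude` option, which takes an object-name pattern. A sketch, assuming the `obj` and `backup-target` aliases are already configured with `mc alias set` and that the pattern `_pgbackups/*` matches every key under the prefix; `mirror_app_data` is a hypothetical wrapper.

```bash
# Hypothetical wrapper: mirrors drop files only, leaving the database dumps out.
mirror_app_data() {
  # The pattern is quoted so the local shell does not expand it.
  mc mirror --overwrite --exclude "_pgbackups/*" obj/drops/ backup-target/drops-mirror/
}
```

A first run against a scratch target would confirm that the pattern excludes what is expected.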
## Disaster scenarios
| Scenario | Recovery |
|---|---|
| Postgres pod crash / reschedule | StatefulSet reattaches the PVC; ~12 min downtime; no data loss. |
| PVC lost / corrupted | Recreate StatefulSet, restore latest nightly dump (above). Data since last dump is lost. |
| Accidental `drops` table data loss | Restore latest dump; or replay only that table's `COPY` block from the dump with `psql` (the dumps are plain SQL, which `pg_restore` cannot read). |
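| Namespace deleted | Deleting the Namespace object also deletes the PVC inside it; `Prune=false,Delete=false` only stops ArgoCD from removing the PVC. Recreate the Applications, then either re-bind the volume (if the PV's reclaim policy retained it) or restore the latest nightly dump. Backups in object storage are independent. |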
| Object storage bucket lost | UpCloud-managed (replicated). If the IAM key is rotated, update `forte-drop-secrets` (re-seal). |
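
For the single-table case, the nightly dumps are plain SQL (they are restored with `psql` above), so one table's rows can be pulled out as text. A sketch, assuming default `pg_dump` output in which each table's data is a `COPY ... FROM stdin;` block terminated by a line containing only `\.`; `extract_table_copy` is a hypothetical helper.

```bash
# Hypothetical helper: reads a plain-SQL dump on stdin, prints only the COPY block for table $1.
extract_table_copy() {
  awk -v t="$1" '
    $1 == "COPY" && ($2 == t || $2 == "public." t) { on = 1 }   # start of the table data
    on { print }
    on && $0 == "\\." { on = 0 }                                # end-of-data marker
  '
}

# Usage (the target table must exist and the replayed rows must not conflict with existing ones;
# connect with psql as in step (b) of the restore):
#   gunzip -c "/tmp/$DUMP" | extract_table_copy view_hits > /tmp/view_hits.sql
```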