# forte-drop Postgres: backup & restore runbook

## What gets backed up

A CronJob (`forte-drop-pg-backup`, namespace `forte-drop`) runs nightly at **02:00 UTC**:

1. `pg_dump` of the `drops` database → gzip.
2. Upload to **UpCloud Managed Object Storage**: `s3://drops/_pgbackups/forte-drop-<TS>.sql.gz`
   (the `_pgbackups/` prefix is collision-proof: app slugs match `/^[a-z0-9][a-z0-9-]{0,62}$/`
   and can never start with `_`).
3. Retention: dumps older than **30 days** are pruned.

S3 creds come from the `forte-drop-secrets` Secret (`S3_ENDPOINT` / `S3_KEY` / `S3_SECRET`).
Postgres creds from `forte-drop-pg-creds` (`pgusername` / `pgpassword`).

> **Object storage is the durable tier.** App data + DB backups both live in UpCloud
> Managed Object Storage (replicated by UpCloud). The in-cluster Postgres PVC is the
> live working copy; the nightly dump is the recovery point. The PVC carries
> `Prune=false,Delete=false` so ArgoCD never deletes it.
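The naming scheme above can be checked locally. This is a hypothetical sketch, not part of the CronJob or the app: `dump_key` assumes `<TS>` is a UTC stamp in the `YYYYMMDDTHHMMSSZ` form used by this runbook's example filename, and `is_valid_slug` re-implements the slug pattern quoted above instead of calling the app's own validator.

```bash
# Hypothetical helpers; not part of the CronJob or the app.

# Object key for a dump taken now. Assumes <TS> = UTC YYYYMMDDTHHMMSSZ.
dump_key() {
  printf '_pgbackups/forte-drop-%s.sql.gz\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Exit 0 if $1 matches the app slug pattern /^[a-z0-9][a-z0-9-]{0,62}$/.
is_valid_slug() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9][a-z0-9-]{0,62}$'
}

dump_key
for s in my-drop demo2 _pgbackups; do
  if is_valid_slug "$s"; then echo "$s: valid slug"; else echo "$s: not a slug"; fi
done
```

Because `_pgbackups` fails the pattern (leading `_`), no app slug can share a top-level prefix with the backups.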
## Prerequisites

```bash
export KUBECONFIG=~/Downloads/dev-fd-no-svg1_kubeconfig.yaml
# Confirm the namespace + DB pod are up:
kubectl -n forte-drop get pods -l app.kubernetes.io/name=postgresql
```
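A quick local preflight can catch a missing client tool before you start. A minimal sketch; `require_tools` is a hypothetical helper, and the tool list is simply the binaries this runbook runs on your workstation.

```bash
# Hypothetical helper: report every missing command, return non-zero if any.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Binaries invoked locally by the commands in this runbook.
if require_tools kubectl gunzip base64; then
  echo "preflight ok"
else
  echo "install the missing tools before continuing" >&2
fi
```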
## List available backups

```bash
# Run an ephemeral mc pod with the app's S3 creds:
kubectl -n forte-drop run mc-list --rm -it --restart=Never \
  --image=quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z \
  --overrides='{"spec":{"containers":[{"name":"mc","image":"quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z","command":["sh","-c","mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc ls obj/drops/_pgbackups/"],"envFrom":[{"secretRef":{"name":"forte-drop-secrets"}}]}]}}'
```
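To pick the newest dump from a saved listing, a small filter is enough. A sketch, assuming `<TS>` is fixed-width UTC (`YYYYMMDDTHHMMSSZ`, as in this runbook's example filename), so lexical order equals chronological order; `latest_dump` is a hypothetical name and the sample lines are illustrative, not real bucket output.

```bash
# Hypothetical helper: read `mc ls` output on stdin, print the newest dump name.
# Assumes fixed-width UTC timestamps, so a plain sort is chronological.
latest_dump() {
  grep -Eo 'forte-drop-[0-9]{8}T[0-9]{6}Z\.sql\.gz' | sort | tail -n 1
}

# Illustrative listing lines (sizes and dates made up):
printf '%s\n' \
  '[2026-05-29 02:00:04 UTC] 1.1MiB STANDARD forte-drop-20260529T020000Z.sql.gz' \
  '[2026-05-30 02:00:05 UTC] 1.2MiB STANDARD forte-drop-20260530T020000Z.sql.gz' \
  | latest_dump
```

Since `grep -o` extracts only the file name, stray carriage returns or column changes in the listing do not affect the result.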
## Manually trigger a backup (before risky changes)

```bash
JOB="pg-backup-manual-$(date +%s)"
kubectl -n forte-drop create job --from=cronjob/forte-drop-pg-backup "$JOB"
# Watch:
kubectl -n forte-drop get jobs -l app.kubernetes.io/component=backup
kubectl -n forte-drop logs -l app.kubernetes.io/component=backup --tail=40
# Or block until this run finishes before making the risky change:
kubectl -n forte-drop wait --for=condition=complete "job/$JOB" --timeout=15m
```
## Restore a dump

> **Destructive.** This overwrites the live `drops` database. Take a fresh manual
> backup first (above) and confirm with whoever owns the data before proceeding.

### 1. Pick the dump to restore

List backups (above), choose `forte-drop-<TS>.sql.gz`.

### 2. Run a restore pod that pulls the dump and pipes it into Postgres
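```bash
DUMP="forte-drop-20260530T020000Z.sql.gz"   # <-- set to the chosen file

# An mc init container copies the dump into a shared emptyDir; the postgres
# container then gunzips it into psql against the live service.
kubectl -n forte-drop run pg-restore --rm --attach --restart=Never \
  --pod-running-timeout=5m \
  --image=postgres:16-alpine \
  --overrides='{
  "spec": {
    "volumes": [{"name": "work", "emptyDir": {}}],
    "initContainers": [{
      "name": "fetch",
      "image": "quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z",
      "command": ["sh", "-c", "mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc cp obj/drops/_pgbackups/'"$DUMP"' /work/dump.sql.gz"],
      "envFrom": [{"secretRef": {"name": "forte-drop-secrets"}}],
      "volumeMounts": [{"name": "work", "mountPath": "/work"}]
    }],
    "containers": [{
      "name": "restore",
      "image": "postgres:16-alpine",
      "command": ["sh", "-c", "set -euo pipefail; gunzip -c /work/dump.sql.gz | PGPASSWORD=\"$pgpassword\" psql -h forte-drop-postgresql.forte-drop.svc -U \"$pgusername\" -d drops"],
      "envFrom": [{"secretRef": {"name": "forte-drop-pg-creds"}}],
      "volumeMounts": [{"name": "work", "mountPath": "/work"}]
    }]
  }
}'
```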
**Simpler 2-pod approach (recommended; avoids cramming mc + psql in one image):**
```bash
DUMP="forte-drop-20260530T020000Z.sql.gz"

# (a) Download the dump from object storage to a local file.
#     Attach without -i/-t: a TTY would mangle the binary gzip stream.
#     --quiet suppresses kubectl's prompt messages so they stay out of the file.
kubectl -n forte-drop run mc-get --rm --attach --quiet --restart=Never \
  --image=quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z \
  --overrides='{"spec":{"containers":[{"name":"mc","image":"quay.io/minio/mc:RELEASE.2024-11-21T17-21-54Z","command":["sh","-c","mc alias set obj \"$S3_ENDPOINT\" \"$S3_KEY\" \"$S3_SECRET\" >/dev/null && mc cat obj/drops/_pgbackups/'"$DUMP"'"],"envFrom":[{"secretRef":{"name":"forte-drop-secrets"}}]}]}}' \
  > "/tmp/$DUMP"

# Catch a truncated or corrupted download before touching the database:
gunzip -t "/tmp/$DUMP"

# (b) Pipe it into the live Postgres via the service. "stdinOnce" closes the
#     container's stdin when the upload ends, so psql sees EOF and exits.
gunzip -c "/tmp/$DUMP" | kubectl -n forte-drop run pg-restore --rm -i --restart=Never \
  --image=postgres:16-alpine \
  --overrides='{"spec":{"containers":[{"name":"psql","image":"postgres:16-alpine","stdin":true,"stdinOnce":true,"command":["sh","-c","PGPASSWORD=\"$pgpassword\" psql -h forte-drop-postgresql.forte-drop.svc -U \"$pgusername\" -d drops"],"env":[{"name":"pgusername","valueFrom":{"secretKeyRef":{"name":"forte-drop-pg-creds","key":"pgusername"}}},{"name":"pgpassword","valueFrom":{"secretKeyRef":{"name":"forte-drop-pg-creds","key":"pgpassword"}}}]}]}}'
```
> The app's schema is created idempotently on boot (`CREATE TABLE IF NOT EXISTS` +
> `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` in `src/repo/pg.ts`), and `pg_dump`
> output includes the data. For a clean restore into a fresh DB this just works.
> To restore over an existing DB with conflicting rows, drop and recreate the `drops`
> database first (coordinate downtime: scale the web Deployment to 0 during the
> restore so the app isn't writing).
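The drop-and-recreate path can be scripted. This is a sketch under assumptions not stated elsewhere in this runbook: the web Deployment is `deploy/forte-drop` (as in step 4), `pgusername` is allowed to drop and create the `drops` database, and the server has the default `postgres` maintenance database. `recreate_sql` and `prepare_clean_restore` are hypothetical names.

```bash
# Hypothetical helpers; run prepare_clean_restore by hand, right before step 2.
NS=forte-drop
PG_SVC=forte-drop-postgresql.forte-drop.svc

# SQL that leaves an empty drops database. psql sends stdin statements one at
# a time in autocommit mode, which DROP/CREATE DATABASE require.
recreate_sql() {
  printf '%s\n' 'DROP DATABASE IF EXISTS drops;' 'CREATE DATABASE drops;'
}

prepare_clean_restore() {
  # Stop the app's writers first, then wait for the rollout to settle.
  kubectl -n "$NS" scale deploy/forte-drop --replicas=0
  kubectl -n "$NS" rollout status deploy/forte-drop --timeout=5m
  pg_user=$(kubectl -n "$NS" get secret forte-drop-pg-creds -o jsonpath='{.data.pgusername}' | base64 -d)
  pg_pass=$(kubectl -n "$NS" get secret forte-drop-pg-creds -o jsonpath='{.data.pgpassword}' | base64 -d)
  # Connect to the maintenance DB (a session cannot drop its own database).
  # ON_ERROR_STOP aborts if DROP fails, e.g. while other sessions are connected.
  recreate_sql | kubectl -n "$NS" run pg-recreate --rm -i --restart=Never \
    --image=postgres:16-alpine --env="PGPASSWORD=$pg_pass" \
    --command -- psql -v ON_ERROR_STOP=1 -h "$PG_SVC" -U "$pg_user" -d postgres
}
```

After it succeeds, run the restore from step 2, then bring the app back with step 4.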
### 3. Verify

```bash
kubectl -n forte-drop run pg-check --rm -it --restart=Never \
  --image=postgres:16-alpine \
  --env="PGPASSWORD=$(kubectl -n forte-drop get secret forte-drop-pg-creds -o jsonpath='{.data.pgpassword}' | base64 -d)" \
  --command -- psql -h forte-drop-postgresql.forte-drop.svc \
  -U "$(kubectl -n forte-drop get secret forte-drop-pg-creds -o jsonpath='{.data.pgusername}' | base64 -d)" -d drops \
  -c "SELECT count(*) AS drops FROM drops;" -c "SELECT count(*) AS view_hits FROM view_hits;"
```
### 4. Bring the app back

```bash
# If you scaled web to 0 for the restore:
kubectl -n forte-drop scale deploy/forte-drop --replicas=2
```
## Object data (uploaded drop files)

Drop files live in `s3://drops/<slug>/...` in the same managed bucket. They are
**not** part of the pg backup (the dump only holds metadata). Object storage is
UpCloud-managed/replicated, so no separate file backup is configured. If a
file-level backup is later required, mirror the bucket to a second bucket/region:

```bash
mc mirror --overwrite obj/drops/ backup-target/drops-mirror/
```

(Exclude `_pgbackups/` from the app-data mirror if you split them.)
## Disaster scenarios

| Scenario | Recovery |
|---|---|
| Postgres pod crash / reschedule | StatefulSet reattaches the PVC; ~1–2 min downtime; no data loss. |
| PVC lost / corrupted | Recreate StatefulSet, restore latest nightly dump (above). Data since last dump is lost. |
| Accidental `drops` table data loss | Restore latest dump; or replay just that table's `COPY` block from the dump (the dumps are plain SQL, which `pg_restore` cannot read). |
| Namespace deleted | Kubernetes deletes the PVC along with the namespace; `Prune=false,Delete=false` only stops ArgoCD from removing it. If the PV's reclaim policy is `Retain`, recreate the Applications and re-bind the released PV (clear its `spec.claimRef` first); otherwise restore the latest dump. Backups in object storage are independent. |
| Object storage bucket lost | UpCloud-managed (replicated). If the IAM key is rotated, update `forte-drop-secrets` (re-seal). |
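Because the nightly dumps are plain SQL (`.sql.gz`, replayed with `psql`), `pg_restore` cannot read them; it only accepts pg_dump's custom, directory, and tar archive formats. One way to recover a single table's rows is to cut its `COPY ... FROM stdin;` block out of the dump. A sketch, assuming the tables live in the `public` schema (pg_dump schema-qualifies `COPY` targets); `extract_table_data` is a hypothetical helper.

```bash
# Hypothetical helper: print the data block for one table from a plain-SQL
# pg_dump stream on stdin. $1 is the schema-qualified table, e.g. public.drops.
extract_table_data() {
  awk -v t="$1" '
    index($0, "COPY " t " ") == 1 { on = 1 }   # start of this table'"'"'s COPY
    on { print }
    on && $0 == "\\." { on = 0 }               # a lone "\." ends the COPY data
  '
}
```

Usage: `gunzip -c "/tmp/$DUMP" | extract_table_data public.drops > drops-rows.sql`, then pipe that file into `psql` the same way as the full restore. `COPY` aborts on the first duplicate key, so this suits a table that was emptied, not one with surviving rows.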