launchpad/docs/GITOPS-ARCHITECTURE.md
2026-04-19 13:47:29 +02:00


# GitOps Architecture & Repository Guide
## Table of Contents
- [Overview](#overview)
- [Architecture Diagram](#architecture-diagram)
- [Repository Structure](#repository-structure)
- [GitOps Workflow](#gitops-workflow)
- [CI/CD Pipeline](#cicd-pipeline)
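- [Security Model](#security-model)
- [Monitoring & Observability](#monitoring--observability)
- [Disaster Recovery](#disaster-recovery)
- [Best Practices](#best-practices)
- [Next Steps](#next-steps)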
---
## Overview
This Kubernetes setup uses a **GitOps approach** powered by **ArgoCD**, where Git repositories serve as the single source of truth for both infrastructure and application deployments. The clusters run on **UpCloud Managed Kubernetes**, but the setup is designed to be cloud-agnostic.
### Key Characteristics
- **Environment**: Production (internal use only)
- **Cluster Type**: Multi-cluster (upc-dev, upc-prod) via Kustomize overlays
- **GitOps Tool**: ArgoCD
- **Deployment Pattern**: App-of-Apps
- **Secret Management**: Sealed Secrets (kubeseal)
- **Ingress**: Traefik with Let's Encrypt TLS
- **Monitoring**: Prometheus + Grafana + Loki + Tempo + Fluent-Bit
- **Policy Engine**: Kyverno
- **Notifications**: Slack integration for sync status
---
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────────┐
│                           Developer Workflow                            │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────┐      ┌──────────────────┐      ┌──────────────────┐
│  Application Code   │      │   Helm Charts    │      │   Helm Values    │
│    Repositories     │──────│    Repository    │──────│    Repository    │
│    (Source Code)    │      │   (Templates)    │      │   (Config/Env)   │
└─────────────────────┘      └──────────────────┘      └──────────────────┘
           │                          │                         │
    GitHub Actions                    │                         │
  Build & Push Image                  │                         │
           │                          │                         │
           └────► Update image tag ───┴─────────────────────────┘
                  in helm-prod-values │
                                      ▼
                      ┌────────────────────────────────┐
                      │       Config Repository        │
                      │     (ArgoCD Applications)      │
                      │    git.forteapps.net/Forte/    │
                      │           launchpad            │
                      └────────────────────────────────┘
                                      │
                                      │  ArgoCD monitors & syncs
                                      ▼
                      ┌────────────────────────────────┐
                      │      Kubernetes Clusters       │
                      │  (UpCloud: upc-dev, upc-prod)  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │ ArgoCD                   │  │
                      │  │ (GitOps Controller)      │  │
                      │  └──────────────────────────┘  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │ Infrastructure Layer     │  │
                      │  │ - Traefik (Ingress)      │  │
                      │  │ - Cert-Manager (TLS)     │  │
                      │  │ - Kyverno (Policies)     │  │
                      │  │ - Sealed Secrets         │  │
                      │  └──────────────────────────┘  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │ Monitoring Stack         │  │
                      │  │ - Prometheus             │  │
                      │  │ - Grafana                │  │
                      │  │ - Loki                   │  │
                      │  │ - Tempo                  │  │
                      │  │ - Fluent-Bit             │  │
                      │  └──────────────────────────┘  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │ Application Layer        │  │
                      │  │ - mcp10x                 │  │
                      │  │ - musicman               │  │
                      │  │ - dot-ai-stack           │  │
                      │  │ - argo-mcp               │  │
                      │  └──────────────────────────┘  │
                      └────────────────────────────────┘
                                      │
                                      ▼
                             ┌──────────────────┐
                             │  Slack Channel   │
                             │ (Notifications)  │
                             └──────────────────┘
```
---
## Repository Structure
### 1. **Config Repository** (Current Repo)
**Repository**: `https://git.forteapps.net/Forte/launchpad`
**Purpose**: GitOps configuration - ArgoCD Applications and cluster resources
**Location**: `C:\dev\k8s\launchpad`
```
launchpad/
├── bootstrap.sh                  # Cluster initialization script
├── _app-of-apps-upc-dev.yaml     # Root ArgoCD Application (upc-dev cluster)
├── _app-of-apps-upc-prod.yaml    # Root ArgoCD Application (upc-prod cluster)
├── infra/                        # Infrastructure ArgoCD Applications (Kustomize)
│   ├── base/                     # Base Application manifests (upc-dev defaults)
│   │   ├── kustomization.yaml
│   │   ├── traefik-application.yaml
│   │   ├── keycloak.yaml
│   │   ├── grafana.yaml
│   │   ├── gitea.yaml
│   │   ├── gitea-actions.yaml
│   │   ├── tempo.yaml
│   │   ├── renovate.yaml
│   │   ├── ...                   # All other Application manifests
│   │   └── secrets.yaml
│   ├── overlays/                 # Per-cluster overrides
│   │   ├── upc-dev/              # UpCloud Dev (uses base as-is)
│   │   └── upc-prod/             # UpCloud Prod (patches value paths)
│   ├── dashboards/               # Grafana dashboard ConfigMaps
│   └── values/                   # Helm value overrides for infra
│       ├── base/                 # Shared values (all clusters)
│       │   ├── traefik-values.yaml
│       │   ├── keycloak-values.yaml
│       │   ├── grafana-values.yaml
│       │   ├── prometheus-values.yaml
│       │   ├── gitea-values.yaml
│       │   └── ...
│       ├── upc-dev/              # upc-dev cluster-specific values
│       │   ├── traefik-values.yaml
│       │   ├── keycloak-values.yaml
│       │   └── grafana-values.yaml
│       └── upc-prod/             # upc-prod cluster-specific values
│           ├── traefik-values.yaml
│           ├── keycloak-values.yaml
│           └── grafana-values.yaml
├── apps/                         # Business Application ArgoCD manifests (Kustomize)
│   ├── base/                     # Base app manifests
│   │   ├── kustomization.yaml
│   │   ├── dot-ai-stack.yaml
│   │   └── ...
│   └── overlays/
│       ├── upc-dev/              # Uses base as-is
│       └── upc-prod/             # Patches value paths
├── cluster-resources/            # Cluster-wide Kubernetes resources
│   ├── ...
│   └── policies/                 # Kyverno policies
├── secrets/                      # Application secrets (sealed, per-cluster)
│   └── upc-dev/                  # Secrets for upc-dev cluster
├── private/                      # Local-only files (NOT in Git)
└── docs/                         # Documentation
```
**Key Points**:
- `_app-of-apps-upc-dev.yaml` and `_app-of-apps-upc-prod.yaml` are the per-cluster root Applications
- Kustomize overlays in `infra/overlays/` render base Applications with per-cluster patches
- Helm values are split: `values/base/` (shared) + `values/upc-dev/` or `values/upc-prod/` (cluster-specific)
- `apps/` follows the same base/overlays pattern for business applications
- Changes pushed to this repo trigger automatic syncs in ArgoCD
- `private/` folder contains local-only files (Git-ignored)
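For illustration, an overlay that "uses base as-is" can be as small as the sketch below. This assumes `infra/overlays/upc-dev/kustomization.yaml` does nothing but reference the base directory; the actual file may contain more.

```yaml
# infra/overlays/upc-dev/kustomization.yaml (minimal sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base   # render every Application in infra/base/ unchanged
```

Running `kustomize build infra/overlays/upc-dev` locally prints the rendered Applications, which is a quick way to check an overlay before pushing.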
---
### 2. **Helm Charts Repository**
**Repository**: `https://git.forteapps.net/Forte/forte-helm`
**Purpose**: Reusable Helm chart templates for Forte applications
**Location**: `C:\dev\k8s\forte-helm`
```
forte-helm/
└── forteapp/                         # Generic Forte application chart
    ├── Chart.yaml                    # Chart metadata (v0.1.0)
    ├── values.yaml                   # Default values (base template)
    ├── templates/
    │   ├── _helpers.tpl              # Template helpers
    │   ├── namespace.yaml
    │   ├── deployment.yaml           # Main app deployment
    │   ├── service.yaml
    │   ├── ingressroute.yaml         # Traefik IngressRoute
    │   ├── certificate.yaml          # Cert-Manager Certificate
    │   ├── configmap.yaml
    │   ├── secret-auth-tokens.yaml
    │   ├── hpa.yaml                  # Horizontal Pod Autoscaler
    │   ├── database-statefulset.yaml # Optional PostgreSQL DB
    │   └── database-service.yaml
    └── README.md
```
**Key Points**:
- Single generic chart (`forteapp`) used by all Forte applications
- Supports optional PostgreSQL database (StatefulSet)
- Configurable authentication (token-based or OIDC)
- Traefik IngressRoute with automatic TLS via Cert-Manager
- Designed for microservices with similar patterns
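Optional components such as the PostgreSQL database are usually gated on a values flag. The sketch below shows that pattern for `database-service.yaml`; the `database.enabled` and `database.port` keys and the `-db` naming are hypothetical and not necessarily what `forteapp` actually uses.

```yaml
# templates/database-service.yaml (sketch; value keys are hypothetical)
{{- if .Values.database.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-db
  namespace: {{ .Release.Namespace }}
spec:
  selector:
    app.kubernetes.io/name: {{ .Release.Name }}-db   # must match the StatefulSet's pod labels
  ports:
    - name: postgres
      port: {{ .Values.database.port | default 5432 }}
      targetPort: 5432
{{- end }}
```

With a default such as `database.enabled: false` in the chart's `values.yaml` (needed so the lookup does not fail when the key is absent), an application opts in by setting it to `true` in its `helm-prod-values` file.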
---
### 3. **Helm Values Repository**
**Repository**: `git@github.com:fortedigital/helm-prod-values.git`
**Purpose**: Environment-specific configuration for each application
**Location**: `C:\dev\k8s\helm-prod-values`
```
helm-prod-values/
├── mcp10x/
│   └── values.yaml    # MCP 10X configuration
├── musicman/
│   └── values.yaml    # Music Man configuration
└── argocd-mcp/
    └── values.yaml    # ArgoCD MCP configuration
```
**Key Points**:
- Each app has its own folder with `values.yaml`
- Contains environment-specific settings (image tags, env vars, resources, etc.)
- Referenced by ArgoCD Applications using the multi-source pattern
- Image tags are updated here by CI/CD pipelines
- Secrets are referenced by name (actual secrets stored as SealedSecrets)
**Example** (`mcp10x/values.yaml`):
```yaml
app:
  image:
    repository: ghcr.io/fortedigital/10x
    tag: 2.0.4                        # Updated by CI/CD
  extraEnv:
    - name: PORT
      value: "3000"
  envSecretName: "app-credentials"    # References SealedSecret
ingress:
  enabled: true
  host: mcp10x.forteapps.net          # Public domain
```
---
### 4. **Application Source Code Repositories**
**Purpose**: Application source code with CI/CD pipelines
**Examples**: Various private repositories
**Typical Structure**:
```
app-repository/
├── src/                             # Application source code
├── Dockerfile                       # Container build definition
├── .github/
│   └── workflows/
│       └── build-and-deploy.yml     # GitHub Actions workflow
└── package.json / requirements.txt  # Dependencies
```
**CI/CD Workflow** (GitHub Actions):
1. Trigger on push to `main` branch
2. Build Docker image
3. Tag with version (e.g., `2.0.4`)
4. Push to container registry (GHCR, Docker Hub, etc.)
5. Update the image tag in the `helm-prod-values` repository
6. ArgoCD detects the change and syncs automatically
---
## GitOps Workflow
### The App-of-Apps Pattern
```
_app-of-apps-{upc-dev,upc-prod}.yaml  (Root, per cluster)
├── infrastructure-apps  (manages infra/)
│   ├── cluster-resources-application
│   ├── traefik-application
│   ├── cert-manager-application
│   ├── kyverno
│   ├── prometheus
│   ├── grafana
│   ├── tempo
│   └── ...  (other infra apps)
└── enterprise-apps  (manages apps/)
    ├── mcp10x
    ├── musicman
    ├── dot-ai-stack
    └── argo-mcp
```
**How It Works**:
1. Bootstrap script installs ArgoCD and applies `_app-of-apps-upc-dev.yaml` (or `upc-prod`)
2. ArgoCD creates the root Application which monitors the appropriate `infra/overlays/` folder
3. Kustomize renders base Applications with cluster-specific patches
4. `enterprise-apps` Application monitors the cluster's `apps/overlays/` folder
5. ArgoCD continuously syncs (every 60s) and auto-heals drift
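A root Application of this kind might look like the sketch below. The repository URL and overlay path come from this guide; the metadata name, project, and destination are assumptions, so the real `_app-of-apps-upc-dev.yaml` may differ.

```yaml
# _app-of-apps-upc-dev.yaml (sketch; name, project and destination are assumptions)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps-upc-dev
  namespace: argocd                  # Applications live in ArgoCD's own namespace
spec:
  project: default
  source:
    repoURL: https://git.forteapps.net/Forte/launchpad
    targetRevision: HEAD
    path: infra/overlays/upc-dev     # Kustomize overlay for this cluster
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Under these assumptions, the `upc-prod` root would differ only in `metadata.name` and `path: infra/overlays/upc-prod`.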
### Sync Waves & Ordering
Applications deploy in order using `argocd.argoproj.io/sync-wave` annotations:
```
Wave -1: Namespaces (created first)
Wave 0: Kyverno (policies ready before resources)
Wave 1: Cluster resources, infrastructure apps
Wave 2+: Business applications
```
Example:
```yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```
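The same annotation works on plain resources inside an Application. For example, a Namespace can be placed in wave -1 so that it exists before workloads in wave 0 and later; the namespace name below is only an example.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mcp10x                            # example namespace
  annotations:
    argocd.argoproj.io/sync-wave: "-1"    # quoted: annotation values must be strings
```

Argo CD applies lower waves first and waits for each wave to become healthy before moving on. When waves order child Applications in an app-of-apps, recent Argo CD versions may only wait on a child's health if a health check for `argoproj.io/Application` is configured in `argocd-cm`.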
### Multi-Source Pattern
Applications like `mcp10x` and `musicman` use multiple sources:
```yaml
spec:
  sources:
    - repoURL: https://git.forteapps.net/Forte/forte-helm
      path: forteapp                      # Helm chart templates
      helm:
        valueFiles:
          - $values/mcp10x/values.yaml    # Reference to second source
    - repoURL: git@github.com:fortedigital/helm-prod-values.git
      targetRevision: HEAD
      ref: values                         # Named reference
```
**Benefits**:
- Chart templates separated from configuration
- Single chart reused across all apps
- Easy to update all apps by changing the chart
- Environment-specific values isolated in separate repo
### Multi-Cluster Pattern
Kustomize overlays enable deploying the same Applications across clusters with different configurations:
```yaml
# infra/base/traefik-application.yaml (excerpt): base Applications default to upc-dev.
# Helm values are layered: base + cluster-specific.
spec:
  sources:
    - # ...chart source (repoURL, chart, targetRevision)...
      helm:
        valueFiles:
          - $values/infra/values/base/traefik-values.yaml     # Shared config
          - $values/infra/values/upc-dev/traefik-values.yaml  # Cluster-specific
---
# infra/overlays/upc-prod/kustomization.yaml patches the second valueFile
patches:
  - target:
      kind: Application
      name: traefik
    patch: |
      - op: replace
        path: /spec/sources/0/helm/valueFiles/1
        value: $values/infra/values/upc-prod/traefik-values.yaml
```
**Benefits**:
- Single source of truth for Application definitions
- Cluster-specific values isolated per overlay
- Easy to add new clusters by creating a new overlay
- Base values shared across all clusters reduce duplication
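Adding a cluster then amounts to a new overlay directory plus its values folder. The sketch below assumes a hypothetical `upc-staging` cluster and repeats the Traefik patch pattern from the example above; real overlays may patch further Applications or fields.

```yaml
# infra/overlays/upc-staging/kustomization.yaml (hypothetical new cluster)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Point Traefik's cluster-specific values file at the new cluster's folder
  - target:
      kind: Application
      name: traefik
    patch: |
      - op: replace
        path: /spec/sources/0/helm/valueFiles/1
        value: $values/infra/values/upc-staging/traefik-values.yaml
```

The new cluster would also need `infra/values/upc-staging/traefik-values.yaml` and its own root Application pointing at this overlay.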
---
## CI/CD Pipeline
### Continuous Integration
**Application Repositories** contain GitHub Actions workflows:
```yaml
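name: Build and Deploy
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write        # Allow GITHUB_TOKEN to push images to GHCR
    env:
      # Example versioning scheme; derive VERSION from your own release process
      VERSION: 1.0.${{ github.run_number }}
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build Docker image
        run: docker build -t ghcr.io/fortedigital/app:$VERSION .
      - name: Push to registry
        run: docker push ghcr.io/fortedigital/app:$VERSION
      - name: Update Helm values
        env:
          # Hypothetical secret: a token with write access to helm-prod-values
          VALUES_REPO_TOKEN: ${{ secrets.HELM_VALUES_TOKEN }}
        run: |
          git clone https://x-access-token:${VALUES_REPO_TOKEN}@github.com/fortedigital/helm-prod-values.git
          cd helm-prod-values/app
          sed -i "s/tag: .*/tag: $VERSION/" values.yaml
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit -am "Update app to $VERSION"
          git push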
```
### Continuous Deployment
**ArgoCD** automatically syncs when changes are detected:
1. **Config Repo Change**:
   - Developer updates an Application manifest (e.g. `apps/base/myapp.yaml`)
   - Pushes to the `launchpad` repo
   - ArgoCD detects the change (60s reconciliation)
   - Syncs the application to the cluster
2. **Helm Values Change**:
   - CI/CD updates `helm-prod-values/myapp/values.yaml`
   - ArgoCD detects the change
   - Re-renders the Helm chart with the updated values
   - Applies the result to the cluster
3. **Sync Policy**:
   ```yaml
   syncPolicy:
     automated:
       prune: true       # Remove deleted resources
       selfHeal: true    # Revert manual changes
     retry:
       limit: 5          # Retry up to 5 times
       backoff:
         duration: 5s
         maxDuration: 3m
   ```
### Deployment Validation
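Before applying, ArgoCD:
- ✅ Renders the manifests (Helm/Kustomize) and fails the sync on invalid YAML
- ✅ Validates resources against the Kubernetes schema (unless the `Validate=false` sync option is set)
- ✅ Performs a dry-run of the apply

When resources are applied, the Kubernetes API server:
- ✅ Enforces resource quotas
- ✅ Evaluates Kyverno policies through Kyverno's admission webhooks
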
After applying:
- ✅ Waits for resources to become healthy
- ✅ Sends Slack notification (success/failure)
- ✅ Tracks sync status in UI
---
## Security Model
### Secret Management
**Sealed Secrets** encrypt secrets for safe Git storage:
```bash
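# Fetch the target cluster's public sealing certificate (once per cluster;
# add --controller-name/--controller-namespace if the controller is not at the defaults)
kubeseal --fetch-cert > pub-cert.pem

# Developer creates plain secret locally
kubectl create secret generic app-creds \
  --from-literal=API_KEY=secret123 \
  --dry-run=client -o yaml > private/app-creds.yaml

# Seal the secret using kubeseal
kubeseal --format=yaml \
  --cert=pub-cert.pem \
  < private/app-creds.yaml \
  > secrets/upc-dev/app-creds-sealed.yaml

# Commit sealed secret to Git
git add secrets/upc-dev/app-creds-sealed.yaml
git commit -m "Add app credentials"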
```
**Storage**:
- ✅ Sealed secrets committed to Git
- ❌ Plain secrets kept in `private/` (Git-ignored) or discarded
- ⚠️ Secret rotation process not yet established
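The committed file is a `SealedSecret` custom resource rather than a `Secret`. Its shape is roughly as follows; the namespace is an example and the ciphertext is elided.

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: app-creds
  namespace: mcp10x        # example; the default "strict" scope binds the ciphertext to this name and namespace
spec:
  encryptedData:
    API_KEY: "<ciphertext produced by kubeseal>"
  template:
    metadata:
      name: app-creds
      namespace: mcp10x    # the controller creates the plain Secret from this template
```

Because each cluster's controller holds its own key pair by default, a secret sealed for `upc-dev` cannot be decrypted on `upc-prod`; seal it separately with each cluster's certificate.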
### Kyverno Policies
**Policy Engine** enforces security rules:
1. **Secret Cloning**: Automatically clones secrets to new namespaces
   ```yaml
   # cluster-resources/policies/secret-cloner.yaml
   # Secrets labeled "allowedToBeCloned: true" are synced
   ```
2. **Default Namespace Blocker**: Prevents use of the `default` namespace
3. **Bare Pod Cleaner**: Removes pods without controllers (Deployments/StatefulSets)
4. **Deployment Verifier**: Ensures pods have proper controllers
5. **Auth Sidecar Injector**: Injects an authentication proxy based on annotations
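A secret-cloning policy along these lines can be expressed as a Kyverno generate rule with `cloneList`. This is a sketch, not the contents of `secret-cloner.yaml`: the source namespace `platform-secrets` is hypothetical, and `cloneList` requires a Kyverno release that supports it.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: secret-cloner
spec:
  rules:
    - name: clone-labeled-secrets
      match:
        any:
          - resources:
              kinds:
                - Namespace                    # fire when a Namespace is created
      generate:
        namespace: "{{request.object.metadata.name}}"   # target: the new namespace
        synchronize: true                      # keep clones in sync with their sources
        cloneList:
          namespace: platform-secrets          # hypothetical source namespace
          kinds:
            - v1/Secret
          selector:
            matchLabels:
              allowedToBeCloned: "true"
```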
### Repository Access
**Private Repository Credentials** stored as SealedSecrets:
```yaml
# cluster-resources/forte10x-repo-credentials-sealed.yaml
```
ArgoCD uses these to access private Helm values repositories.
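Before sealing, an ArgoCD repository credential is an ordinary Secret in the `argocd` namespace carrying the `argocd.argoproj.io/secret-type: repository` label. A sketch for the SSH-based values repository follows; the Secret name is hypothetical and the key is elided. Seal it with `kubeseal` as described under Secret Management and commit only the sealed output.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: helm-prod-values-repo                    # hypothetical name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository   # marks this Secret as a repo credential
type: Opaque
stringData:
  type: git
  url: git@github.com:fortedigital/helm-prod-values.git
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
    -----END OPENSSH PRIVATE KEY-----
```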
### Network Security
**Traefik Ingress** with TLS:
- All HTTP traffic redirects to HTTPS
- Let's Encrypt automatic certificate renewal
- Cert-Manager manages certificate lifecycle
- Per-application IngressRoutes with dedicated certificates
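Per application, this typically means a cert-manager `Certificate` plus a Traefik `IngressRoute` that references the resulting TLS secret, similar to what the `forteapp` chart's `certificate.yaml` and `ingressroute.yaml` templates render. The sketch uses the `mcp10x` host from the values example; the `ClusterIssuer` name, Service name, and port are assumptions. Traefik versions older than v2.10 use the `traefik.containo.us/v1alpha1` API group instead.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mcp10x-tls
  namespace: mcp10x                  # example namespace
spec:
  secretName: mcp10x-tls             # Secret that cert-manager fills with the key pair
  dnsNames:
    - mcp10x.forteapps.net
  issuerRef:
    name: letsencrypt-prod           # hypothetical ClusterIssuer name
    kind: ClusterIssuer
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: mcp10x
  namespace: mcp10x
spec:
  entryPoints:
    - websecure                      # HTTPS entry point in the Traefik Helm chart defaults
  routes:
    - match: Host(`mcp10x.forteapps.net`)
      kind: Rule
      services:
        - name: mcp10x               # hypothetical Service name
          port: 3000                 # matches PORT in the values example
  tls:
    secretName: mcp10x-tls           # certificate issued by cert-manager above
```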
### Authentication
**Application-Level Auth** (optional):
- Token-based authentication (static tokens)
- OIDC integration (Keycloak, Okta, etc.)
- Auth sidecar injected via Kyverno policy
- Tokens stored in SealedSecrets
Example:
```yaml
# In deployment.yaml template
annotations:
  policies.forteapps.io/auth: "true"
  policies.forteapps.io/auth-token-secret-name: "app-tokens"
```
---
## Monitoring & Observability
### Stack Components
1. **Prometheus**: Metrics collection and storage
2. **Grafana**: Metrics visualization and dashboards
3. **Loki**: Log aggregation
4. **Tempo**: Distributed tracing (OTLP)
5. **Fluent-Bit**: Log shipping from pods to Loki
6. **Trivy**: Container vulnerability scanning
### Slack Notifications
All ArgoCD applications send notifications to a shared Slack channel:
```yaml
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: ""
    notifications.argoproj.io/subscribe.on-sync-failed.slack: ""
    notifications.argoproj.io/subscribe.on-degraded.slack: ""
```
Notifications include:
- ✅ Sync succeeded
- ❌ Sync failed
- ⚠️ Application degraded
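These subscriptions rely on a Slack service, triggers, and templates defined in the `argocd-notifications-cm` ConfigMap. A minimal sketch follows. Triggers such as `on-sync-succeeded` and `on-sync-failed` usually come from the upstream notifications catalog, so only the Slack service and an `on-degraded` trigger are shown; the template name and message text are assumptions. `$slack-token` resolves to the `slack-token` key of the `argocd-notifications-secret` Secret.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Slack bot token, read from the slack-token key in argocd-notifications-secret
  service.slack: |
    token: $slack-token
  # Fire when the application's health status becomes Degraded
  trigger.on-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-degraded]
  # Hypothetical message template used by the trigger above
  template.app-degraded: |
    message: "Application {{.app.metadata.name}} is degraded."
```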
---
## Disaster Recovery
### Cluster Rebuild
**Current State**: No backup routines exist yet. Cluster can be rebuilt from Git.
**Rebuild Process**:
1. Provision new Kubernetes cluster
2. Clone `launchpad` repository
3. Run `./bootstrap.sh`
4. ArgoCD installs and syncs all applications
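5. Re-seal all secrets with the new cluster's certificate and commit them: the new Sealed Secrets controller generates a new key pair, so existing SealedSecrets cannot be decrypted unless the old sealing key was backed up and restored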
**Data Loss**:
- Currently: Data loss is acceptable (internal use)
- Future: One stateful application may require backup strategy
### GitOps Advantages for DR
- ✅ **Infrastructure as Code**: Entire cluster defined in Git
- ✅ **Reproducible**: Cluster can be rebuilt identically
- ✅ **Auditable**: All changes tracked in Git history
- ✅ **Rollback**: Easy to revert to a previous Git commit
- ✅ **Multi-Cluster**: Same config can deploy to multiple clusters
---
## Best Practices
### Repository Organization
✅ **DO**:
- Separate infrastructure (`infra/`) from applications (`apps/`)
- Use sync waves to control deployment order
- Keep plain (unsealed) secrets in the `private/` folder (Git-ignored)
- Commit only sealed secrets to Git
- Use multi-source pattern for chart/values separation
❌ **DON'T**:
- Commit plain secrets to Git
- Mix infrastructure and application configs
- Hard-code environment-specific values in charts
- Manually modify resources in cluster (use Git)
### GitOps Workflow
✅ **DO**:
- All changes through Git (single source of truth)
- Use PR reviews for production changes
- Test changes in isolated namespaces first
- Monitor ArgoCD sync status
- Respond to Slack notifications
❌ **DON'T**:
- Use `kubectl apply` directly (breaks GitOps)
- Ignore sync failures
- Bypass ArgoCD for "quick fixes"
- Edit resources in place (`kubectl edit`)
### Application Development
✅ **DO**:
- Follow the `forteapp` chart pattern
- Use semantic versioning for image tags
- Update helm-prod-values via CI/CD
- Test locally with Docker Compose
- Document environment variables
❌ **DON'T**:
- Use `latest` image tag
- Hard-code configuration in code
- Skip local testing
- Deploy untested images to production
---
## Next Steps
📖 Continue to:
- **[Developer Guide](DEVELOPER-GUIDE.md)** - Learn how to deploy and manage applications
- **[Operations Runbook](OPERATIONS-RUNBOOK.md)** - Common operational tasks
- **[Technical Reference](REFERENCE.md)** - Detailed component documentation
---
**Last Updated**: 2026-03-16
**Maintained By**: Platform Team
**Questions?**: Contact #platform-support on Slack