# GitOps Architecture & Repository Guide

## Table of Contents

- [Overview](#overview)
- [Architecture Diagram](#architecture-diagram)
- [Repository Structure](#repository-structure)
- [GitOps Workflow](#gitops-workflow)
- [CI/CD Pipeline](#cicd-pipeline)
- [Security Model](#security-model)
- [Monitoring & Observability](#monitoring--observability)
- [Disaster Recovery](#disaster-recovery)
- [Best Practices](#best-practices)
- [Next Steps](#next-steps)

---

## Overview

This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where Git repositories serve as the single source of truth for both infrastructure and application deployments. The cluster setup is **cloud-agnostic**, with ready-to-use configurations for **UpCloud**, **AWS EKS**, **Azure AKS**, and **GCP GKE**.

### Key Characteristics

- **Environment**: Production (internal use only)
- **Cluster Type**: Multi-cloud, multi-cluster via Kustomize overlays (UpCloud, AWS, Azure, GCP)
- **GitOps Tool**: ArgoCD
- **Deployment Pattern**: App-of-Apps
- **Secret Management**: Sealed Secrets (kubeseal)
- **Ingress**: Traefik with Let's Encrypt TLS
- **Monitoring**: Prometheus + Grafana + Loki + Tempo + Fluent-Bit
- **Policy Engine**: Kyverno
- **Notifications**: Slack integration for sync status

---

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                           Developer Workflow                            │
└─────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Application Code   │      │   Helm Charts    │      │   Helm Values   │
│    Repositories     │──────│    Repository    │──────│   Repository    │
│    (Source Code)    │      │   (Templates)    │      │  (Config/Env)   │
└─────────────────────┘      └──────────────────┘      └─────────────────┘
           │                          │                         │
           │                          │                         │
           │ GitHub Actions           │                         │
           │ Build & Push Image       │                         │
           │                          │                         │
           │                          │                         │
           └──────► Update image tag ─┴─────────────────────────┘
                    in helm-prod-values
                                      │
                                      │
                                      ▼
                     ┌────────────────────────────────┐
                     │       Config Repository        │
                     │     (ArgoCD Applications)      │
                     │    git.forteapps.net/Forte/    │
                     │           launchpad            │
                     └────────────────────────────────┘
                                      │
                                      │ ArgoCD monitors & syncs
                                      │
                                      ▼
                     ┌────────────────────────────────┐
                     │      Kubernetes Clusters       │
                     │   (UpCloud, AWS, Azure, GCP)   │
                     │                                │
                     │  ┌──────────────────────────┐  │
                     │  │          ArgoCD          │  │
                     │  │   (GitOps Controller)    │  │
                     │  └──────────────────────────┘  │
                     │                                │
                     │  ┌──────────────────────────┐  │
                     │  │  Infrastructure Layer    │  │
                     │  │  - Traefik (Ingress)     │  │
                     │  │  - Cert-Manager (TLS)    │  │
                     │  │  - Kyverno (Policies)    │  │
                     │  │  - Sealed Secrets        │  │
                     │  └──────────────────────────┘  │
                     │                                │
                     │  ┌──────────────────────────┐  │
                     │  │  Monitoring Stack        │  │
                     │  │  - Prometheus            │  │
                     │  │  - Grafana               │  │
                     │  │  - Loki                  │  │
                     │  │  - Tempo                 │  │
                     │  │  - Fluent-Bit            │  │
                     │  └──────────────────────────┘  │
                     │                                │
                     │  ┌──────────────────────────┐  │
                     │  │  Application Layer       │  │
                     │  │  - mcp10x                │  │
                     │  │  - musicman              │  │
                     │  │  - dot-ai-stack          │  │
                     │  │  - argo-mcp              │  │
                     │  └──────────────────────────┘  │
                     └────────────────────────────────┘
                                      │
                                      │
                                      ▼
                             ┌──────────────────┐
                             │  Slack Channel   │
                             │ (Notifications)  │
                             └──────────────────┘
```

---

## Repository Structure

### 1. **Config Repository** (Current Repo)

**Repository**: `https://git.forteapps.net/Forte/launchpad`  
**Purpose**: GitOps configuration - ArgoCD Applications and cluster resources  
**Location**: `C:\dev\k8s\launchpad`

```
launchpad/
├── bootstrap.sh                    # Cluster initialization script
├── _app-of-apps-upc-dev.yaml       # Root ArgoCD Application (upc-dev cluster)
├── _app-of-apps-upc-prod.yaml      # Root ArgoCD Application (upc-prod cluster)
│
├── infra/                          # Infrastructure ArgoCD Applications (Kustomize)
│   ├── base/                       # Base Application manifests (upc-dev defaults)
│   │   ├── kustomization.yaml
│   │   ├── traefik-application.yaml
│   │   ├── keycloak.yaml
│   │   ├── grafana.yaml
│   │   ├── gitea.yaml
│   │   ├── gitea-actions.yaml
│   │   ├── tempo.yaml
│   │   ├── renovate.yaml
│   │   ├── ...                     # All other Application manifests
│   │   └── secrets.yaml
│   ├── overlays/                   # Per-cluster Kustomize overrides
│   │   ├── upc-dev/                # UpCloud Dev (uses base as-is)
│   │   ├── upc-prod/               # UpCloud Prod (patches value paths)
│   │   ├── aws-dev/                # AWS EKS Dev
│   │   ├── aws-prod/               # AWS EKS Prod
│   │   ├── aks-dev/                # Azure AKS Dev
│   │   ├── aks-prod/               # Azure AKS Prod
│   │   ├── gcp-dev/                # GCP GKE Dev
│   │   └── gcp-prod/               # GCP GKE Prod
│   ├── dashboards/                 # Grafana dashboard ConfigMaps
│   └── values/                     # Helm value overrides for infra
│       ├── base/                   # Cloud-agnostic shared values
│       ├── upc-{dev,prod}/         # UpCloud: storage class, LB, pricing
│       ├── aws-{dev,prod}/         # AWS: gp3, NLB, CUR pricing
│       ├── aks-{dev,prod}/         # Azure: managed-csi-premium, Standard LB
│       └── gcp-{dev,prod}/         # GCP: premium-rwo, L4 LB
│
├── apps/                           # Business Application ArgoCD manifests (Kustomize)
│   ├── base/                       # Base app manifests
│   │   ├── kustomization.yaml
│   │   ├── dot-ai-stack.yaml
│   │   └── ...
│   └── overlays/
│       ├── upc-dev/                # Uses base as-is
│       └── upc-prod/               # Patches value paths
│
├── cluster-resources/              # Cluster-wide Kubernetes resources
│   ├── ...
│   └── policies/                   # Kyverno policies
│
├── secrets/                        # Application secrets (sealed, per-cluster)
│   └── upc-dev/                    # Secrets for upc-dev cluster
│
├── private/                        # Local-only files (NOT in Git)
│
└── docs/                           # Documentation
```

**Key Points**:
- `_app-of-apps-upc-dev.yaml` and `_app-of-apps-upc-prod.yaml` are the per-cluster root Applications
- Kustomize overlays in `infra/overlays/` render base Applications with per-cluster patches
- Helm values are split: `infra/values/base/` (shared) + `infra/values/upc-dev/` or `infra/values/upc-prod/` (cluster-specific)
- `apps/` follows the same base/overlays pattern for business applications
- Changes pushed to this repo trigger automatic syncs in ArgoCD
- `private/` folder contains local-only files (Git-ignored)

---

### 2. **Helm Charts Repository**

**Repository**: `https://git.forteapps.net/Forte/forte-helm`  
**Purpose**: Reusable Helm chart templates for Forte applications  
**Location**: `C:\dev\k8s\forte-helm`

```
forte-helm/
└── forteapp/                           # Generic Forte application chart
    ├── Chart.yaml                      # Chart metadata (v0.1.0)
    ├── values.yaml                     # Default values (base template)
    ├── templates/
    │   ├── _helpers.tpl                # Template helpers
    │   ├── namespace.yaml
    │   ├── deployment.yaml             # Main app deployment
    │   ├── service.yaml
    │   ├── ingressroute.yaml           # Traefik IngressRoute
    │   ├── certificate.yaml            # Cert-Manager Certificate
    │   ├── configmap.yaml
    │   ├── secret-auth-tokens.yaml
    │   ├── hpa.yaml                    # Horizontal Pod Autoscaler
    │   ├── database-statefulset.yaml   # Optional PostgreSQL DB
    │   └── database-service.yaml
    └── README.md
```

**Key Points**:
- Single generic chart (`forteapp`) used by all Forte applications
- Supports optional PostgreSQL database (StatefulSet)
- Configurable authentication (token-based or OIDC)
- Traefik IngressRoute with automatic TLS via Cert-Manager
- Designed for microservices with similar patterns

---

### 3. **Helm Values Repository**

**Repository**: `git@github.com:fortedigital/helm-prod-values.git`  
**Purpose**: Environment-specific configuration for each application  
**Location**: `C:\dev\k8s\helm-prod-values`

```
helm-prod-values/
├── mcp10x/
│   └── values.yaml         # MCP 10X configuration
├── musicman/
│   └── values.yaml         # Music Man configuration
└── argocd-mcp/
    └── values.yaml         # ArgoCD MCP configuration
```

**Key Points**:
- Each app has its own folder with `values.yaml`
- Contains environment-specific settings (image tags, env vars, resources, etc.)
- Referenced by ArgoCD Applications using the multi-source pattern
- Image tags are updated here by CI/CD pipelines
- Secrets are referenced by name (actual secrets stored as SealedSecrets)

**Example** (`mcp10x/values.yaml`):
```yaml
app:
  image:
    repository: ghcr.io/fortedigital/10x
    tag: 2.0.4                        # Updated by CI/CD
  extraEnv:
    - name: PORT
      value: "3000"
  envSecretName: "app-credentials"    # References SealedSecret
ingress:
  enabled: true
  host: mcp10x.forteapps.net          # Public domain
```

---

### 4. **Application Source Code Repositories**

**Purpose**: Application source code with CI/CD pipelines  
**Examples**: Various private repositories

**Typical Structure**:
```
app-repository/
├── src/                                # Application source code
├── Dockerfile                          # Container build definition
├── .github/
│   └── workflows/
│       └── build-and-deploy.yml        # GitHub Actions workflow
└── package.json / requirements.txt     # Dependencies
```

**CI/CD Workflow** (GitHub Actions):
1. Trigger on push to `main` branch
2. Build Docker image
3. Tag with version (e.g., `v2.0.4`)
4. Push to container registry (GHCR, Docker Hub, etc.)
5. Update image tag in `helm-prod-values` repository
6. ArgoCD detects change and syncs automatically

---

## GitOps Workflow

### The App-of-Apps Pattern

```
_app-of-apps-{cluster}.yaml (Root, per cluster, e.g. upc-dev, aws-prod, gcp-dev)
│
├── infrastructure-apps (manages infra/)
│   ├── cluster-resources-application
│   ├── traefik-application
│   ├── cert-manager-application
│   ├── kyverno
│   ├── prometheus
│   ├── grafana
│   ├── tempo
│   └── ... (other infra apps)
│
└── enterprise-apps (manages apps/)
    ├── mcp10x
    ├── musicman
    ├── dot-ai-stack
    └── argo-mcp
```

**How It Works**:
1. Bootstrap script installs ArgoCD and applies `_app-of-apps-upc-dev.yaml` (or `upc-prod`)
2. ArgoCD creates the root Application which monitors the appropriate `infra/overlays/` folder
3. Kustomize renders base Applications with cluster-specific patches
4. `enterprise-apps` Application monitors the cluster's `apps/overlays/` folder
5. ArgoCD continuously syncs (every 60s) and auto-heals drift

### Sync Waves & Ordering

Applications deploy in order using `argocd.argoproj.io/sync-wave` annotations:

```
Wave -1: Namespaces (created first)
Wave 0:  Kyverno (policies ready before resources)
Wave 1:  Cluster resources, infrastructure apps
Wave 2+: Business applications
```

Example:
```yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```

### Multi-Source Pattern

Applications like `mcp10x` and `musicman` use multiple sources:

```yaml
spec:
  sources:
    - repoURL: https://git.forteapps.net/Forte/forte-helm
      path: forteapp                        # Helm chart templates
      helm:
        valueFiles:
          - $values/mcp10x/values.yaml      # Reference to second source
    - repoURL: git@github.com:fortedigital/helm-prod-values.git
      targetRevision: HEAD
      ref: values                           # Named reference
```

**Benefits**:
- Chart templates separated from configuration
- Single chart reused across all apps
- Easy to update all apps by changing the chart
- Environment-specific values isolated in separate repo

### Multi-Cluster Pattern

Kustomize overlays enable deploying the same Applications across clusters with different configurations:

```yaml
# infra/base/ contains default (upc-dev) Applications
# Helm values are layered: base + cluster-specific
valueFiles:
  - $values/infra/values/base/traefik-values.yaml       # Shared config
  - $values/infra/values/upc-dev/traefik-values.yaml    # Cluster-specific

# infra/overlays/upc-prod/kustomization.yaml patches the second valueFile
patches:
  - target:
      kind: Application
      name: traefik
    patch: |
      - op: replace
        path: /spec/sources/0/helm/valueFiles/1
        value: $values/infra/values/upc-prod/traefik-values.yaml
```

Cloud-specific values (storage classes, load balancer annotations, cost model) are isolated in per-cluster value files.
Base values are fully cloud-agnostic:

| Cloud | Storage Class | Load Balancer | OpenCost Provider |
|-------|--------------|---------------|-------------------|
| **UpCloud** | `upcloud-block-storage-maxiops` | UpCloud LB (ProxyProtocol v2) | Custom pricing |
| **AWS EKS** | `gp3` (EBS CSI) | NLB (ProxyProtocol v2) | AWS CUR |
| **Azure AKS** | `managed-csi-premium` | Standard LB (`externalTrafficPolicy: Local`) | Azure Billing API |
| **GCP GKE** | `premium-rwo` (PD CSI) | L4 passthrough NLB | GCP Cloud Billing |

**Benefits**:
- Single source of truth for Application definitions
- Cluster-specific values isolated per overlay
- Easy to add new clusters by creating a new overlay
- Base values shared across all clusters reduce duplication

---

## CI/CD Pipeline

### Continuous Integration

**Application Repositories** contain GitHub Actions workflows:

```yaml
# Simplified workflow: assumes VERSION is set (e.g. via workflow env) and that
# registry login and Git push credentials are configured in earlier steps.
name: Build and Deploy

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: docker build -t ghcr.io/fortedigital/app:$VERSION .

      - name: Push to registry
        run: docker push ghcr.io/fortedigital/app:$VERSION

      - name: Update Helm values
        run: |
          git clone git@github.com:fortedigital/helm-prod-values.git
          cd helm-prod-values/app
          sed -i "s/tag: .*/tag: $VERSION/" values.yaml
          git commit -am "Update app to $VERSION"
          git push
```

### Continuous Deployment

**ArgoCD** automatically syncs when changes are detected:

1. **Config Repo Change**:
   - Developer updates `apps/base/myapp.yaml`
   - Pushes to `launchpad` repo
   - ArgoCD detects change (60s reconciliation)
   - Syncs application to cluster

2. **Helm Values Change**:
   - CI/CD updates `helm-prod-values/myapp/values.yaml`
   - ArgoCD detects change
   - Re-renders the Helm chart with the updated values
   - Applies to cluster

3. **Sync Policy**:
   ```yaml
   syncPolicy:
     automated:
       prune: true       # Remove deleted resources
       selfHeal: true    # Revert manual changes
     retry:
       limit: 5          # Retry up to 5 times
       backoff:
         duration: 5s
         maxDuration: 3m
   ```

### Deployment Validation

Before applying, ArgoCD:
- ✅ Validates YAML syntax
- ✅ Checks Kubernetes schema
- ✅ Runs server-side dry-run
- ✅ Verifies resource quotas
- ✅ Applies Kyverno policies

After applying:
- ✅ Waits for resources to become healthy
- ✅ Sends Slack notification (success/failure)
- ✅ Tracks sync status in UI

---

## Security Model

### Secret Management

**Sealed Secrets** encrypt secrets for safe Git storage:

```bash
# Developer creates plain secret locally
kubectl create secret generic app-creds \
  --from-literal=API_KEY=secret123 \
  --dry-run=client -o yaml > private/app-creds.yaml

# Seal the secret using kubeseal
kubeseal --format=yaml \
  --cert=pub-cert.pem \
  < private/app-creds.yaml \
  > secrets/app-creds-sealed.yaml

# Commit sealed secret to Git
git add secrets/app-creds-sealed.yaml
git commit -m "Add app credentials"
```

**Storage**:
- ✅ Sealed secrets committed to Git
- ❌ Plain secrets never committed: kept in `private/` (Git-ignored) or discarded
- ⚠️ Secret rotation process not yet established

### Kyverno Policies

**Policy Engine** enforces security rules:

1. **Secret Cloning**: Automatically clones secrets to new namespaces
   ```yaml
   # cluster-resources/policies/secret-cloner.yaml
   # Secrets labeled "allowedToBeCloned: true" are synced
   ```

2. **Default Namespace Blocker**: Prevents use of `default` namespace

3. **Bare Pod Cleaner**: Removes pods without controllers (Deployments/StatefulSets)

4. **Deployment Verifier**: Ensures pods have proper controllers

5. **Auth Sidecar Injector**: Injects authentication proxy based on annotations

### Repository Access

**Private Repository Credentials** stored as SealedSecrets:

```yaml
# cluster-resources/forte10x-repo-credentials-sealed.yaml
```

ArgoCD uses these to access private Helm values repositories.
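
As a rough illustration of what sits behind such a file, the sketch below shows the kind of plain Secret that ArgoCD recognizes as repository credentials, before it is sealed. The Secret name, the `argocd` namespace, and the key placeholder are assumptions for illustration and are not taken from this repository; the `argocd.argoproj.io/secret-type: repository` label and the `type`, `url`, and `sshPrivateKey` fields follow ArgoCD's declarative repository format.

```yaml
# Hypothetical plain Secret: keep it under private/ and never commit it unsealed.
apiVersion: v1
kind: Secret
metadata:
  name: helm-prod-values-repo                    # assumed name
  namespace: argocd                              # assumed ArgoCD namespace
  labels:
    argocd.argoproj.io/secret-type: repository   # marks the Secret as repo credentials
stringData:
  type: git
  url: git@github.com:fortedigital/helm-prod-values.git
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <deploy key goes here (placeholder)>
    -----END OPENSSH PRIVATE KEY-----
```

Sealing a manifest like this with the `kubeseal` command shown under Secret Management would yield a SealedSecret that is safe to commit. The label needs to end up in the SealedSecret's `spec.template.metadata.labels`, which `kubeseal` normally carries over from the input Secret, so that ArgoCD still picks up the unsealed result.
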
### Network Security

**Traefik Ingress** with TLS:
- All HTTP traffic redirects to HTTPS
- Let's Encrypt automatic certificate renewal
- Cert-Manager manages certificate lifecycle
- Per-application IngressRoutes with dedicated certificates

### Authentication

**Application-Level Auth** (optional):
- Token-based authentication (static tokens)
- OIDC integration (Keycloak, Okta, etc.)
- Auth sidecar injected via Kyverno policy
- Tokens stored in SealedSecrets

Example:
```yaml
# In deployment.yaml template
annotations:
  policies.forteapps.io/auth: "true"
  policies.forteapps.io/auth-token-secret-name: "app-tokens"
```

---

## Monitoring & Observability

### Stack Components

1. **Prometheus**: Metrics collection and storage
2. **Grafana**: Metrics visualization and dashboards
3. **Loki**: Log aggregation
4. **Tempo**: Distributed tracing (OTLP)
5. **Fluent-Bit**: Log shipping from pods to Loki
6. **Trivy**: Container vulnerability scanning

### Slack Notifications

All ArgoCD applications send notifications to a shared Slack channel:

```yaml
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: ""
    notifications.argoproj.io/subscribe.on-sync-failed.slack: ""
    notifications.argoproj.io/subscribe.on-degraded.slack: ""
```

Notifications include:
- ✅ Sync succeeded
- ❌ Sync failed
- ⚠️ Application degraded

---

## Disaster Recovery

### Cluster Rebuild

**Current State**: No backup routines exist yet. The cluster can be rebuilt from Git.

**Rebuild Process**:
1. Provision new Kubernetes cluster
2. Clone `launchpad` repository
3. Run `./bootstrap.sh`
4. ArgoCD installs and syncs all applications
5. Manually recreate unsealed secrets and seal them

**Data Loss**:
- Currently: Data loss is acceptable (internal use)
- Future: One stateful application may require backup strategy

### GitOps Advantages for DR

- ✅ **Infrastructure as Code**: Entire cluster defined in Git
- ✅ **Reproducible**: Cluster can be rebuilt identically
- ✅ **Auditable**: All changes tracked in Git history
- ✅ **Rollback**: Easy to revert to previous Git commit
- ✅ **Multi-Cluster**: Same config can deploy to multiple clusters

---

## Best Practices

### Repository Organization

✅ **DO**:
- Separate infrastructure (`infra/`) from applications (`apps/`)
- Use sync waves to control deployment order
- Keep plain (unsealed) secrets in `private/` folder (Git-ignored)
- Commit only sealed secrets to Git
- Use multi-source pattern for chart/values separation

❌ **DON'T**:
- Commit plain secrets to Git
- Mix infrastructure and application configs
- Hard-code environment-specific values in charts
- Manually modify resources in cluster (use Git)

### GitOps Workflow

✅ **DO**:
- Make all changes through Git (single source of truth)
- Use PR reviews for production changes
- Test changes in isolated namespaces first
- Monitor ArgoCD sync status
- Respond to Slack notifications

❌ **DON'T**:
- Use `kubectl apply` directly (breaks GitOps)
- Ignore sync failures
- Bypass ArgoCD for "quick fixes"
- Edit resources in place (`kubectl edit`)

### Application Development

✅ **DO**:
- Follow the `forteapp` chart pattern
- Use semantic versioning for image tags
- Update helm-prod-values via CI/CD
- Test locally with Docker Compose
- Document environment variables

❌ **DON'T**:
- Use `latest` image tag
- Hard-code configuration in code
- Skip local testing
- Deploy untested images to production

---

## Next Steps

📖 Continue to:
- **[Developer Guide](DEVELOPER-GUIDE.md)** - Learn how to deploy and manage applications
- **[Operations Runbook](OPERATIONS-RUNBOOK.md)** - Common operational tasks
- **[Technical Reference](REFERENCE.md)** - Detailed component documentation

---

**Last Updated**: 2026-04-22  
**Maintained By**: Platform Team  
**Questions?**: Contact #platform-support on Slack