# GitOps Architecture & Repository Guide

## Table of Contents

- [Overview](#overview)
- [Architecture Diagram](#architecture-diagram)
- [Repository Structure](#repository-structure)
- [GitOps Workflow](#gitops-workflow)
- [CI/CD Pipeline](#cicd-pipeline)
- [Security Model](#security-model)

---

## Overview

This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where Git repositories serve as the single source of truth for both infrastructure and application deployments. The cluster setup is **cloud-agnostic**, with ready-to-use configurations for **UpCloud**, **AWS EKS**, **Azure AKS**, and **GCP GKE**.

### Key Characteristics

- **Environment**: Production (internal use only)
- **Cluster Type**: Multi-cloud, multi-cluster via Kustomize overlays (UpCloud, AWS, Azure, GCP)
- **GitOps Tool**: ArgoCD
- **Deployment Pattern**: App-of-Apps
- **Secret Management**: Sealed Secrets (kubeseal)
- **Ingress**: Traefik with Let's Encrypt TLS
- **Monitoring**: Prometheus + Grafana + Loki + Tempo + Fluent-Bit
- **Policy Engine**: Kyverno
- **Notifications**: Slack integration for sync status

---

## Architecture Diagram

```
┌────────────────────────────────────────────────────────────────────────┐
│                           Developer Workflow                           │
└────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Application Code   │      │   Helm Charts    │      │   Helm Values   │
│    Repositories     │──────│    Repository    │──────│   Repository    │
│    (Source Code)    │      │   (Templates)    │      │  (Config/Env)   │
└─────────────────────┘      └──────────────────┘      └─────────────────┘
           │                          │                         │
           │                          │                         │
    GitHub Actions                    │                         │
  Build & Push Image                  │                         │
           │                          │                         │
           │                          │                         │
           └──────► Update image tag ─┴─────────────────────────┘
                  in helm-prod-values │
                                      │
                                      ▼
                      ┌────────────────────────────────┐
                      │       Config Repository        │
                      │     (ArgoCD Applications)      │
                      │    git.forteapps.net/Forte/    │
                      │           launchpad            │
                      └────────────────────────────────┘
                                      │
                                      │
                           ArgoCD monitors & syncs
                                      │
                                      ▼
                      ┌────────────────────────────────┐
                      │      Kubernetes Clusters       │
                      │   (UpCloud, AWS, Azure, GCP)   │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │          ArgoCD          │  │
                      │  │   (GitOps Controller)    │  │
                      │  └──────────────────────────┘  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │   Infrastructure Layer   │  │
                      │  │  - Traefik (Ingress)     │  │
                      │  │  - Cert-Manager (TLS)    │  │
                      │  │  - Kyverno (Policies)    │  │
                      │  │  - Sealed Secrets        │  │
                      │  └──────────────────────────┘  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │     Monitoring Stack     │  │
                      │  │  - Prometheus            │  │
                      │  │  - Grafana               │  │
                      │  │  - Loki                  │  │
                      │  │  - Tempo                 │  │
                      │  │  - Fluent-Bit            │  │
                      │  └──────────────────────────┘  │
                      │                                │
                      │  ┌──────────────────────────┐  │
                      │  │    Application Layer     │  │
                      │  │  - mcp10x                │  │
                      │  │  - musicman              │  │
                      │  │  - dot-ai-stack          │  │
                      │  │  - argo-mcp              │  │
                      │  └──────────────────────────┘  │
                      └────────────────────────────────┘
                                      │
                                      │
                                      ▼
                             ┌──────────────────┐
                             │  Slack Channel   │
                             │ (Notifications)  │
                             └──────────────────┘
```

---

## Repository Structure

### 1. **Config Repository** (Current Repo)
**Repository**: `https://git.forteapps.net/Forte/launchpad`
**Purpose**: GitOps configuration - ArgoCD Applications and cluster resources
**Location**: `C:\dev\k8s\launchpad`

```
launchpad/
├── bootstrap.sh  # Cluster initialization (ArgoCD + GitOps)
├── _app-of-apps-{cluster}.yaml  # Root ArgoCD Application (per cluster)
│
├── .tofu/  # Infrastructure provisioning (OpenTofu)
│   ├── platforms/  # Per-platform IaC
│   │   ├── aks/  # Azure AKS
│   │   │   ├── modules/cluster/  # Reusable AKS module
│   │   │   ├── dev/  # tofu root for aks-dev
│   │   │   ├── prod/  # tofu root for aks-prod
│   │   │   └── workload/  # workload cluster (no data services)
│   │   ├── eks/  # AWS EKS (same structure)
│   │   ├── gke/  # GCP GKE
│   │   └── upc/  # UpCloud
│   ├── configs/  # Platform credentials (git-ignored)
│   │   └── {platform}.env.example  # Template per platform
│   └── scripts/
│       ├── setup-cluster.sh  # ./setup-cluster.sh <cluster> [--plan|--auto]
│       ├── teardown-cluster.sh  # ./teardown-cluster.sh <cluster>
│       └── get-kubeconfig.sh  # ./get-kubeconfig.sh <cluster>
│
├── clusters/  # Cluster metadata YAML (domain, IPs, etc.)
│   ├── aks-dev.yaml
│   ├── upc-dev.yaml
│   └── ...
│
├── infra/  # Infrastructure ArgoCD Applications (Kustomize)
│   ├── base/  # Base Application manifests (one dir per component)
│   │   ├── kustomization.yaml  # Aggregates all component subdirectories
│   │   ├── traefik-application/
│   │   │   ├── kustomization.yaml
│   │   │   └── traefik-application.yaml
│   │   ├── keycloak/
│   │   │   ├── kustomization.yaml
│   │   │   └── keycloak.yaml
│   │   ├── grafana/
│   │   ├── prometheus/
│   │   ├── ...  # Each component in its own subdirectory
│   │   └── secrets/
│   ├── overlays/  # Per-cluster Kustomize overrides
│   │   ├── upc-dev/  # UpCloud Dev: includes all (resources: ../../base)
│   │   ├── upc-prod/  # UpCloud Prod: all + patches
│   │   ├── aks-dev/  # Azure AKS Dev: selective components
│   │   ├── aks-prod/  # Azure AKS Prod
│   │   ├── eks-dev/  # AWS EKS Dev
│   │   ├── eks-prod/  # AWS EKS Prod
│   │   ├── gke-dev/  # GCP GKE Dev
│   │   └── gke-prod/  # GCP GKE Prod
│   ├── dashboards/  # Grafana dashboard ConfigMaps
│   └── values/  # Helm value overrides for infra
│       ├── base/  # Cloud-agnostic shared values
│       ├── upc-{dev,prod}/  # UpCloud: storage class, LB, pricing
│       ├── aws-{dev,prod}/  # AWS: gp3, NLB, CUR pricing
│       ├── aks-{dev,prod}/  # Azure: managed-csi-premium, Standard LB
│       └── gcp-{dev,prod}/  # GCP: premium-rwo, L4 LB
│
├── apps/  # Business Application ArgoCD manifests (Kustomize)
│   ├── base/  # One subdirectory per app
│   │   ├── kustomization.yaml
│   │   ├── musicman/
│   │   ├── mcp10x/
│   │   ├── dot-ai-stack/
│   │   ├── ts-mcp/
│   │   └── argo-mcp/
│   └── overlays/
│       ├── upc-dev/  # All apps (resources: ../../base)
│       ├── upc-prod/  # All apps + patches
│       └── aks-dev/  # Selective apps only
│
├── cluster-resources/  # Cluster-wide Kubernetes resources
│   ├── ...
│   └── policies/  # Kyverno policies
│
├── secrets/  # Application secrets (sealed, per-cluster)
│   └── upc-dev/  # Secrets for upc-dev cluster
│
├── private/  # Local-only files (NOT in Git)
│
└── docs/  # Documentation
```

**Key Points**:
- `_app-of-apps-upc-dev.yaml` and `_app-of-apps-upc-prod.yaml` are the per-cluster root Applications
- Each component in `base/` has its own subdirectory with a `kustomization.yaml`
- Overlays can include **all** components (`resources: [../../base]`) or **cherry-pick** specific ones (`resources: [../../base/grafana, ../../base/prometheus]`)
- Kustomize overlays in `infra/overlays/` render base Applications with per-cluster patches
- Helm values are split: `values/base/` (shared) + `values/upc-dev/` or `values/upc-prod/` (cluster-specific)
- `apps/` follows the same base/overlays pattern for business applications
- Changes pushed to this repo trigger automatic syncs in ArgoCD
- `private/` folder contains local-only files (Git-ignored)

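The base layout described above can be sketched as two small Kustomize files. This is an illustrative sketch rather than a copy of the repository's actual files: the component list is abbreviated to directories shown in the tree, and the exact contents of each `kustomization.yaml` are an assumption. The `---` separator only divides the two files within this listing.

```yaml
# infra/base/kustomization.yaml (sketch; component list abbreviated)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - traefik-application   # each entry is a component subdirectory
  - keycloak
  - grafana
  - prometheus
---
# infra/base/traefik-application/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - traefik-application.yaml   # the ArgoCD Application manifest for Traefik
```

With this shape, an overlay that lists `../../base` picks up every component, while an overlay that lists `../../base/grafana` picks up only that component's own `kustomization.yaml`.
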
---

### 2. **Helm Charts Repository**
**Repository**: `https://git.forteapps.net/Forte/forte-helm`
**Purpose**: Reusable Helm chart templates for Forte applications
**Location**: `C:\dev\k8s\forte-helm`

```
forte-helm/
└── forteapp/  # Generic Forte application chart
    ├── Chart.yaml  # Chart metadata (v0.1.0)
    ├── values.yaml  # Default values (base template)
    ├── templates/
    │   ├── _helpers.tpl  # Template helpers
    │   ├── namespace.yaml
    │   ├── deployment.yaml  # Main app deployment
    │   ├── service.yaml
    │   ├── ingressroute.yaml  # Traefik IngressRoute
    │   ├── certificate.yaml  # Cert-Manager Certificate
    │   ├── configmap.yaml
    │   ├── secret-auth-tokens.yaml
    │   ├── hpa.yaml  # Horizontal Pod Autoscaler
    │   ├── database-statefulset.yaml  # Optional PostgreSQL DB
    │   └── database-service.yaml
    └── README.md
```

**Key Points**:
- Single generic chart (`forteapp`) used by all Forte applications
- Supports optional PostgreSQL database (StatefulSet)
- Configurable authentication (token-based or OIDC)
- Traefik IngressRoute with automatic TLS via Cert-Manager
- Designed for microservices with similar patterns

---

### 3. **Helm Values Repository**
**Repository**: `git@github.com:fortedigital/helm-prod-values.git`
**Purpose**: Environment-specific configuration for each application
**Location**: `C:\dev\k8s\helm-prod-values`

```
helm-prod-values/
├── mcp10x/
│   └── values.yaml  # MCP 10X configuration
├── musicman/
│   └── values.yaml  # Music Man configuration
└── argocd-mcp/
    └── values.yaml  # ArgoCD MCP configuration
```

**Key Points**:
- Each app has its own folder with `values.yaml`
- Contains environment-specific settings (image tags, env vars, resources, etc.)
- Referenced by ArgoCD Applications using multi-source pattern
- Image tags are updated here by CI/CD pipelines
- Secrets are referenced by name (actual secrets stored as SealedSecrets)

**Example** (`mcp10x/values.yaml`):
```yaml
app:
  image:
    repository: ghcr.io/fortedigital/10x
    tag: 2.0.4  # Updated by CI/CD
  extraEnv:
    - name: PORT
      value: "3000"
  envSecretName: "app-credentials"  # References SealedSecret

ingress:
  enabled: true
  host: mcp10x.forteapps.net  # Public domain
```

---

### 4. **Application Source Code Repositories**
**Purpose**: Application source code with CI/CD pipelines
**Examples**: Various private repositories

**Typical Structure**:
```
app-repository/
├── src/  # Application source code
├── Dockerfile  # Container build definition
├── .github/
│   └── workflows/
│       └── build-and-deploy.yml  # GitHub Actions workflow
└── package.json / requirements.txt  # Dependencies
```

**CI/CD Workflow** (GitHub Actions):
1. Trigger on push to `main` branch
2. Build Docker image
3. Tag with version (e.g., `v2.0.4`)
4. Push to container registry (GHCR, Docker Hub, etc.)
5. Update image tag in `helm-prod-values` repository
6. ArgoCD detects change and syncs automatically

---

## GitOps Workflow

### The App-of-Apps Pattern

```
_app-of-apps-{cluster}.yaml (Root, per cluster, e.g. upc-dev, eks-prod, gke-dev)
│
├── infrastructure-apps (manages infra/)
│   ├── cluster-resources-application
│   ├── traefik-application
│   ├── cert-manager-application
│   ├── kyverno
│   ├── prometheus
│   ├── grafana
│   ├── tempo
│   └── ... (other infra apps)
│
└── enterprise-apps (manages apps/)
    ├── mcp10x
    ├── musicman
    ├── dot-ai-stack
    └── argo-mcp
```

**How It Works**:
1. Bootstrap script installs ArgoCD and applies `_app-of-apps-upc-dev.yaml` (or `upc-prod`)
2. ArgoCD creates the root Application which monitors the appropriate `infra/overlays/` folder
3. Kustomize renders base Applications with cluster-specific patches
4. `enterprise-apps` Application monitors the cluster's `apps/overlays/` folder
5. ArgoCD continuously syncs (every 60s) and auto-heals drift

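As a rough illustration of steps 2 and 3, a child Application that points ArgoCD at one cluster's overlay could look like the sketch below. The `argocd` namespace, the `default` project, and `targetRevision: HEAD` are assumptions; the repository's real manifests may differ.

```yaml
# Sketch of the "infrastructure-apps" Application for the upc-dev cluster.
# Assumed values: namespace "argocd", project "default", targetRevision HEAD.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.forteapps.net/Forte/launchpad
    targetRevision: HEAD
    path: infra/overlays/upc-dev   # ArgoCD finds kustomization.yaml here and renders with Kustomize
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: argocd
  syncPolicy:
    automated:
      prune: true      # remove Applications that were deleted from Git
      selfHeal: true   # revert manual drift
```

Because the rendered output of the overlay is itself a set of `Application` resources, syncing this one Application creates every infrastructure Application selected for that cluster.
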
### Sync Waves & Ordering

Applications deploy in order using `argocd.argoproj.io/sync-wave` annotations:

```
Wave -1: Namespaces (created first)
Wave 0: Kyverno (policies ready before resources)
Wave 1: Cluster resources, infrastructure apps
Wave 2+: Business applications
```

Example:
```yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```

### Multi-Source Pattern

Applications like `mcp10x` and `musicman` use multiple sources:

```yaml
spec:
  sources:
    - repoURL: https://git.forteapps.net/Forte/forte-helm
      path: forteapp  # Helm chart templates
      helm:
        valueFiles:
          - $values/mcp10x/values.yaml  # Reference to second source

    - repoURL: git@github.com:fortedigital/helm-prod-values.git
      targetRevision: HEAD
      ref: values  # Named reference
```

**Benefits**:
- Chart templates separated from configuration
- Single chart reused across all apps
- Easy to update all apps by changing the chart
- Environment-specific values isolated in separate repo

### Multi-Cluster Pattern

Kustomize overlays enable deploying the same Applications across clusters with different configurations.

Each component in `infra/base/` and `apps/base/` lives in its own subdirectory. Overlays define **which components to include** and optionally **patch** them:

```yaml
# Option 1: Include ALL components (full cluster)
# infra/overlays/upc-dev/kustomization.yaml
resources:
  - ../../base  # Pulls in every component subdirectory
---
# Option 2: Cherry-pick specific components (lightweight cluster)
# infra/overlays/aks-dev/kustomization.yaml
resources:
  - ../../base/traefik-application
  - ../../base/grafana
  - ../../base/prometheus
  - ../../base/loki
# Only listed components are deployed; others are excluded
```

Per-cluster patches swap Helm value file paths:

```yaml
# infra/overlays/upc-prod/kustomization.yaml
patches:
  - target:
      kind: Application
      name: traefik
    patch: |
      - op: replace
        path: /spec/sources/0/helm/valueFiles/1
        value: $values/infra/values/upc-prod/traefik-values.yaml
```

Cloud-specific values (storage classes, load balancer annotations, cost model) are isolated in per-cluster value files. Base values are fully cloud-agnostic:

| Cloud | Storage Class | Load Balancer | OpenCost Provider |
|-------|--------------|---------------|-------------------|
| **UpCloud** | `upcloud-block-storage-maxiops` | UpCloud LB (ProxyProtocol v2) | Custom pricing |
| **AWS EKS** | `gp3` (EBS CSI) | NLB (ProxyProtocol v2) | AWS CUR |
| **Azure AKS** | `managed-csi-premium` | Standard LB (`externalTrafficPolicy: Local`) | Azure Billing API |
| **GCP GKE** | `premium-rwo` (PD CSI) | L4 passthrough NLB | GCP Cloud Billing |

**Benefits**:
- Single source of truth for Application definitions
- Cluster-specific values isolated per overlay
- Easy to add new clusters by creating a new overlay
- Base values shared across all clusters reduce duplication

---

## CI/CD Pipeline

### Continuous Integration

**Application Repositories** contain GitHub Actions workflows:

```yaml
name: Build and Deploy

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # $VERSION must be exported by an earlier step (e.g. derived from a Git tag);
      # how it is derived is not shown here.
      - name: Build Docker image
        run: docker build -t ghcr.io/fortedigital/app:$VERSION .

      # Assumes the runner is already logged in to ghcr.io
      - name: Push to registry
        run: docker push ghcr.io/fortedigital/app:$VERSION

      # Assumes the runner has SSH credentials with push access to helm-prod-values
      - name: Update Helm values
        run: |
          git clone git@github.com:fortedigital/helm-prod-values.git
          cd helm-prod-values/app
          # A commit identity is required on a fresh runner (placeholder values)
          git config user.name "ci-bot"
          git config user.email "ci-bot@example.com"
          sed -i "s/tag: .*/tag: $VERSION/" values.yaml
          git commit -am "Update app to $VERSION"
          git push
```

### Continuous Deployment

**ArgoCD** automatically syncs when changes are detected:

1. **Config Repo Change**:
   - Developer updates an Application manifest under `apps/` (e.g. in `apps/base/myapp/`)
   - Pushes to `launchpad` repo
   - ArgoCD detects change (60s reconciliation)
   - Syncs application to cluster

2. **Helm Values Change**:
   - CI/CD updates `helm-prod-values/myapp/values.yaml`
   - ArgoCD detects change
   - Re-renders the Helm chart with the updated values
   - Applies to cluster

3. **Sync Policy**:
   ```yaml
   syncPolicy:
     automated:
       prune: true  # Remove deleted resources
       selfHeal: true  # Revert manual changes
     retry:
       limit: 5  # Retry up to 5 times
       backoff:
         duration: 5s
         maxDuration: 3m
   ```

### Deployment Validation

Before a sync is applied, manifests pass through these checks:
- ✅ YAML syntax validation
- ✅ Kubernetes schema check
- ✅ Server-side dry-run
- ✅ Resource quota verification (enforced by the Kubernetes API server at admission)
- ✅ Kyverno policy evaluation (enforced through Kyverno's admission webhooks)

After applying, ArgoCD:
- ✅ Waits for resources to become healthy
- ✅ Sends Slack notification (success/failure)
- ✅ Tracks sync status in UI

---

## Security Model

### Secret Management

**Sealed Secrets** encrypt secrets for safe Git storage:

```bash
# Developer creates plain secret locally
kubectl create secret generic app-creds \
  --from-literal=API_KEY=secret123 \
  --dry-run=client -o yaml > private/app-creds.yaml

# Seal the secret using kubeseal
kubeseal --format=yaml \
  --cert=pub-cert.pem \
  < private/app-creds.yaml \
  > secrets/app-creds-sealed.yaml

# Commit sealed secret to Git
git add secrets/app-creds-sealed.yaml
git commit -m "Add app credentials"
```

**Storage**:
- ✅ Sealed secrets are committed to Git
- ❌ Plain secrets are never committed; they stay in `private/` (Git-ignored) or are discarded
- ⚠️ Secret rotation process not yet established

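For orientation, the file produced by `kubeseal` is a `SealedSecret` resource roughly like the sketch below. The namespace is hypothetical and the ciphertext is a placeholder, not real output. With kubeseal's default (strict) scope, the secret's name and namespace are bound into the encryption, so a sealed file cannot simply be renamed or moved to another namespace.

```yaml
# Sketch of kubeseal output; the ciphertext below is a placeholder, not real data.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: app-creds
  namespace: my-app          # hypothetical namespace
spec:
  encryptedData:
    API_KEY: AgB...          # placeholder; kubeseal emits a long base64 string here
  template:
    metadata:
      name: app-creds        # name of the Secret the controller will create
      namespace: my-app
```

Once ArgoCD syncs this resource, the Sealed Secrets controller in the cluster decrypts it and creates an ordinary `Secret` with the same name.
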
### Kyverno Policies

**Policy Engine** enforces security rules:

1. **Secret Cloning**: Automatically clones secrets to new namespaces
   ```yaml
   # cluster-resources/policies/secret-cloner.yaml
   # Secrets labeled "allowedToBeCloned: true" are synced
   ```

2. **Default Namespace Blocker**: Prevents use of `default` namespace
3. **Bare Pod Cleaner**: Removes pods without controllers (Deployments/StatefulSets)
4. **Deployment Verifier**: Ensures pods have proper controllers
5. **Auth Sidecar Injector**: Injects authentication proxy based on annotations

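To give a feel for what such a policy looks like, here is a minimal sketch of a "default namespace blocker" written as a Kyverno `ClusterPolicy`. It follows the commonly published pattern for this rule and is not the policy stored in `cluster-resources/policies/`; the policy name, rule name, and message are illustrative.

```yaml
# Illustrative sketch; not the repository's actual policy.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-default-namespace     # hypothetical name
spec:
  # Depending on the Kyverno version, the failure action may be set per rule instead.
  validationFailureAction: Enforce
  background: true
  rules:
    - name: validate-namespace
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using the 'default' namespace is not allowed."
        pattern:
          metadata:
            namespace: "!default"      # reject Pods whose namespace is 'default'
```

Switching the action to `Audit` would report violations without rejecting the request, which can be a gentler way to introduce a new rule.
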
### Repository Access

**Private Repository Credentials** stored as SealedSecrets:

```yaml
# cluster-resources/forte10x-repo-credentials-sealed.yaml
```

ArgoCD uses these to access private Helm values repositories.

### Network Security

**Traefik Ingress** with TLS:
- All HTTP traffic redirects to HTTPS
- Let's Encrypt automatic certificate renewal
- Cert-Manager manages certificate lifecycle
- Per-application IngressRoutes with dedicated certificates

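The pairing of a dedicated certificate with a per-application IngressRoute can be sketched as follows. This is not the output of the `forteapp` chart: the resource names, the namespace, the `letsencrypt-prod` issuer, the `websecure` entry point, and the service port are all assumptions chosen for illustration.

```yaml
# Sketch only: names, namespace, issuer, entry point, and port are hypothetical.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mcp10x-tls
  namespace: mcp10x
spec:
  secretName: mcp10x-tls          # cert-manager writes the key pair to this Secret
  dnsNames:
    - mcp10x.forteapps.net
  issuerRef:
    name: letsencrypt-prod        # assumed ClusterIssuer name
    kind: ClusterIssuer
---
apiVersion: traefik.io/v1alpha1   # older Traefik releases use traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: mcp10x
  namespace: mcp10x
spec:
  entryPoints:
    - websecure                   # assumed name of the HTTPS entry point
  routes:
    - match: Host(`mcp10x.forteapps.net`)
      kind: Rule
      services:
        - name: mcp10x
          port: 3000
  tls:
    secretName: mcp10x-tls        # serves the certificate issued above
```

Cert-Manager renews the certificate before expiry and updates the same Secret, so the IngressRoute does not need to change on renewal.
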
### Authentication

**Application-Level Auth** (optional):
- Token-based authentication (static tokens)
- OIDC integration (Keycloak, Okta, etc.)
- Auth sidecar injected via Kyverno policy
- Tokens stored in SealedSecrets

Example:
```yaml
# In deployment.yaml template
annotations:
  policies.forteapps.io/auth: "true"
  policies.forteapps.io/auth-token-secret-name: "app-tokens"
```

---

## Monitoring & Observability

### Stack Components

1. **Prometheus**: Metrics collection and storage
2. **Grafana**: Metrics visualization and dashboards
3. **Loki**: Log aggregation
4. **Tempo**: Distributed tracing (OTLP)
5. **Fluent-Bit**: Log shipping from pods to Loki
6. **Trivy**: Container vulnerability scanning

### Slack Notifications

All ArgoCD applications send notifications to a shared Slack channel:

```yaml
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: ""
    notifications.argoproj.io/subscribe.on-sync-failed.slack: ""
    notifications.argoproj.io/subscribe.on-degraded.slack: ""
```

Notifications include:
- ✅ Sync succeeded
- ❌ Sync failed
- ⚠️ Application degraded

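The annotations above only subscribe an Application to triggers; the triggers, message templates, and Slack service are defined in the ArgoCD Notifications ConfigMap. A minimal sketch of one trigger is shown below, assuming ArgoCD runs in the `argocd` namespace and the Slack token is stored under the key `slack-token` in `argocd-notifications-secret`. The message text is illustrative. In upstream ArgoCD the annotation value usually carries the Slack channel name.

```yaml
# Sketch of a notifications config with a single trigger; values are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token            # resolved from argocd-notifications-secret
  trigger.on-sync-failed: |
    - when: app.status.operationState != nil and app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
  template.app-sync-failed: |
    message: |
      Sync of application {{.app.metadata.name}} has failed.
```

The `on-sync-succeeded` and `on-degraded` triggers follow the same trigger-plus-template shape.
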
---

## Disaster Recovery

### Cluster Rebuild

**Current State**: No backup routines exist yet. The cluster can be rebuilt from Git.

**Rebuild Process**:
1. Provision new Kubernetes cluster
2. Clone `launchpad` repository
3. Run `./bootstrap.sh`
4. ArgoCD installs and syncs all applications
5. Recreate the plain secrets and re-seal them with the new cluster's certificate. A fresh Sealed Secrets controller generates a new key pair, so SealedSecrets already in Git cannot be decrypted unless the old sealing key was backed up and restored.

**Data Loss**:
- Currently: Data loss is acceptable (internal use)
- Future: One stateful application may require backup strategy

### GitOps Advantages for DR

✅ **Infrastructure as Code**: Entire cluster defined in Git
✅ **Reproducible**: Cluster can be rebuilt identically
✅ **Auditable**: All changes tracked in Git history
✅ **Rollback**: Easy to revert to previous Git commit
✅ **Multi-Cluster**: Same config can deploy to multiple clusters

---

## Best Practices

### Repository Organization

✅ **DO**:
- Separate infrastructure (`infra/`) from applications (`apps/`)
- Use sync waves to control deployment order
- Keep secrets in `private/` folder (Git-ignored)
- Commit only sealed secrets to Git
- Use multi-source pattern for chart/values separation

❌ **DON'T**:
- Commit plain secrets to Git
- Mix infrastructure and application configs
- Hard-code environment-specific values in charts
- Manually modify resources in cluster (use Git)

### GitOps Workflow

✅ **DO**:
- All changes through Git (single source of truth)
- Use PR reviews for production changes
- Test changes in isolated namespaces first
- Monitor ArgoCD sync status
- Respond to Slack notifications

❌ **DON'T**:
- Use `kubectl apply` directly (breaks GitOps)
- Ignore sync failures
- Bypass ArgoCD for "quick fixes"
- Edit resources in place (`kubectl edit`)

### Application Development

✅ **DO**:
- Follow the `forteapp` chart pattern
- Use semantic versioning for image tags
- Update helm-prod-values via CI/CD
- Test locally with Docker Compose
- Document environment variables

❌ **DON'T**:
- Use `latest` image tag
- Hard-code configuration in code
- Skip local testing
- Deploy untested images to production

---

## Next Steps

📖 Continue to:
- **[Developer Guide](DEVELOPER-GUIDE.md)** - Learn how to deploy and manage applications
- **[Operations Runbook](OPERATIONS-RUNBOOK.md)** - Common operational tasks
- **[Technical Reference](REFERENCE.md)** - Detailed component documentation

---

**Last Updated**: 2026-04-22
**Maintained By**: Platform Team
**Questions?**: Contact #platform-support on Slack