docs
532
README.md
@@ -1,149 +1,463 @@
|
|||||||
## Overview
|
# Kubernetes Cluster - GitOps Configuration
|
||||||
|
|
||||||
This is a **Kubernetes cluster bootstrapping and GitOps configuration repository** using ArgoCD. It defines the infrastructure-as-code for deploying and managing applications, services, and policies on Kubernetes clusters.
|
> **Kubernetes cluster bootstrapping and GitOps configuration repository** using ArgoCD for UpCloud Managed Kubernetes
|
||||||
|
|
||||||
## Repository Structure
|
[ArgoCD](https://argoproj.github.io/cd/)
|
||||||
|
[UpCloud](https://upcloud.com/)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📚 Complete Documentation
|
||||||
|
|
||||||
|
**New developers and operators**: Start with the documentation below, which covers architecture, development, operations, and reference material:
|
||||||
|
|
||||||
|
### 🎯 [**START HERE: Documentation Index**](docs/README.md)
|
||||||
|
|
||||||
|
| Document | Description | Audience |
|
||||||
|
|----------|-------------|----------|
|
||||||
|
| **[GitOps Architecture](docs/GITOPS-ARCHITECTURE.md)** | System architecture, repository structure, GitOps workflows, security model | Everyone (start here) |
|
||||||
|
| **[Developer Guide](docs/DEVELOPER-GUIDE.md)** | Local setup, deploying apps, managing secrets, troubleshooting | Developers |
|
||||||
|
| **[Operations Runbook](docs/OPERATIONS-RUNBOOK.md)** | Cluster bootstrap, day-to-day operations, incident response, maintenance | Platform Engineers, SREs |
|
||||||
|
| **[Technical Reference](docs/REFERENCE.md)** | Component specs, Helm charts, ArgoCD config, Kyverno policies, API docs | Everyone (reference) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚀 Quick Start
|
||||||
|
|
||||||
|
### For New Developers
|
||||||
|
```bash
|
||||||
|
# 1. Clone repositories
|
||||||
|
git clone https://github.com/snothub/sturdy-adventure.git
|
||||||
|
git clone git@github.com:fortedigital/helm-values.git
|
||||||
|
|
||||||
|
# 2. Read the guides
|
||||||
|
# - Start: docs/GITOPS-ARCHITECTURE.md
|
||||||
|
# - Follow: docs/DEVELOPER-GUIDE.md
|
||||||
|
|
||||||
|
# 3. Deploy your first app (see Developer Guide)
|
||||||
|
```
|
||||||
|
|
||||||
|
### For Operators
|
||||||
|
```bash
|
||||||
|
# 1. Bootstrap new cluster
|
||||||
|
./bootstrap.sh
|
||||||
|
|
||||||
|
# 2. Verify deployment
|
||||||
|
kubectl get applications -n argocd
|
||||||
|
kubectl get pods --all-namespaces
|
||||||
|
|
||||||
|
# 3. Read Operations Runbook for day-to-day tasks
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Overview
|
||||||
|
|
||||||
|
This repository contains the complete GitOps configuration for our Kubernetes cluster, using the **App-of-Apps pattern** with ArgoCD.
|
||||||
|
|
||||||
|
### What's Inside
|
||||||
|
|
||||||
|
- **Infrastructure Applications**: Traefik, Cert-Manager, Kyverno, Prometheus, Grafana, Loki, Sealed Secrets
|
||||||
|
- **Business Applications**: MCP10X, MusicMan, Dot-AI Stack, ArgoCD MCP
|
||||||
|
- **Policies**: Kyverno security policies for secret management, namespace controls, and pod verification
|
||||||
|
- **Monitoring**: Full observability stack with metrics, logs, and alerting
|
||||||
|
- **Secrets**: Sealed Secrets for secure Git storage
|
||||||
|
|
||||||
|
### Key Features
|
||||||
|
|
||||||
|
✅ **GitOps-Native**: Git is the single source of truth
|
||||||
|
✅ **Auto-Sync**: Changes are deployed automatically (60s reconciliation)
|
||||||
|
✅ **Self-Healing**: Manual cluster changes are reverted
|
||||||
|
✅ **Multi-Source**: Separate chart templates from configuration
|
||||||
|
✅ **Policy Enforcement**: Kyverno ensures security and compliance
|
||||||
|
✅ **TLS Everywhere**: Automatic Let's Encrypt certificates
|
||||||
|
✅ **Full Observability**: Prometheus, Grafana, Loki integration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🗂️ Repository Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
.
|
.
|
||||||
├── bootstrap.sh # Main bootstrap script to initialize ArgoCD and cluster
|
├── bootstrap.sh # Cluster initialization script
|
||||||
├── _app-of-apps.yaml # App-of-apps pattern: main ArgoCD Application that manages all other apps
|
├── _app-of-apps.yaml # Root ArgoCD Application (App-of-Apps pattern)
|
||||||
├── apps/ # Business application resources
|
│
|
||||||
│ ├── feedback-hub.yaml # Feedback Hub test app
|
├── infra/ # Infrastructure ArgoCD Applications
|
||||||
│ ├── musicman.yaml # Music Man hackathon app
|
│ ├── enterprise-apps.yaml # Manages all apps in apps/ folder
|
||||||
│ └── dot-ai-stack.yaml # dot-ai AI assistant stack
|
│ ├── traefik-application.yaml
|
||||||
├── infra/ # Individual ArgoCD Application resources for infrastructure
|
│ ├── cert-manager-application.yaml
|
||||||
│ ├── enterprise-apps.yaml # Enterprise apps: parent Application that syncs everything in "apps" folder
|
│ ├── kyverno.yaml
|
||||||
│ ├── traefik-application.yaml # Ingress controller (Traefik)
|
│ ├── prometheus.yaml
|
||||||
│ ├── cert-manager-application.yaml # TLS certificate management
|
│ ├── grafana.yaml
|
||||||
│ ├── kyverno.yaml # Policy engine for security
|
│ ├── loki.yaml
|
||||||
│ ├── kyverno-policies.yaml # Kyverno policy definitions
|
│ ├── fluent-bit.yaml
|
||||||
│ ├── prometheus.yaml # Metrics & monitoring
|
│ ├── trivy.yaml
|
||||||
│ ├── grafana.yaml # Monitoring visualization
|
│ ├── sealedsecrets.yaml
|
||||||
│ ├── loki.yaml # Log aggregation
|
│ └── values/ # Helm value overrides
|
||||||
│ ├── fluent-bit.yaml # Log shipping
|
│
|
||||||
│ ├── trivy.yaml # Container scanning
|
├── apps/ # Business Applications
|
||||||
│ ├── sealedsecrets.yaml # Secret encryption
|
│ ├── mcp10x.yaml
|
||||||
│ ├── cluster-resources-application.yaml # Cluster-wide resources
|
│ ├── musicman.yaml
|
||||||
│ └── values/ # Helm value overrides for ArgoCD and services
|
│ ├── dot-ai-stack.yaml
|
||||||
│ ├── argocd-values.yaml # ArgoCD server configuration
|
│ └── argo-mcp.yaml
|
||||||
│ ├── prometheus-values.yaml
|
│
|
||||||
│ ├── grafana-values.yaml
|
├── cluster-resources/ # Cluster-wide Kubernetes resources
|
||||||
│ ├── loki-values.yaml
|
│ ├── letsencrypt-issuer.yaml
|
||||||
│ └── fluent-bit-values.yaml
|
│ ├── kyverno-config.yaml
|
||||||
└── cluster-resources/ # Cluster-level configurations managed by cluster-resources-application.yaml
|
│ ├── *-sealed.yaml # Sealed secrets
|
||||||
├── cert-manager-namespace.yaml
|
│ └── policies/ # Kyverno policies
|
||||||
├── secrets-namespace.yaml # Namespace for secrets
|
│ ├── secret-cloner.yaml
|
||||||
├── letsencrypt-issuer.yaml # TLS certificate issuer
|
│ ├── default-ns-blocker.yaml
|
||||||
├── kyverno-config.yaml # Security policies and secret syncing
|
│ ├── bare-pod-cleaner.yaml
|
||||||
├── argocd-notifications-secret-sealed.yaml # Sealed secret for ArgoCD notifications
|
│ └── auth-sidecar-injector.yaml
|
||||||
└── policies/ # Kyverno policy definitions
|
│
|
||||||
├── deployment-verifier.yaml # Policy to verify pods have controllers
|
├── secrets/ # Application secrets (sealed)
|
||||||
├── label-checker.yaml # Policy to check labels
|
│ └── *-credentials-sealed.yaml
|
||||||
├── bare-pod-cleaner.yaml # Policy to clean up pods without controllers
|
│
|
||||||
├── replicaset-cleaner.yaml # Policy to clean up orphaned replica sets
|
├── private/ # Local-only files (Git-ignored)
|
||||||
├── default-ns-blocker.yaml # Policy to block use of default namespace
|
│ └── *.yaml # Unsealed secrets (never committed)
|
||||||
└── secret-cloner.yaml # Policy to clone secrets across namespaces
|
│
|
||||||
|
└── docs/ # 📚 Comprehensive documentation
|
||||||
|
├── README.md # Documentation index
|
||||||
|
├── GITOPS-ARCHITECTURE.md # Architecture guide
|
||||||
|
├── DEVELOPER-GUIDE.md # Developer onboarding
|
||||||
|
├── OPERATIONS-RUNBOOK.md # Operations procedures
|
||||||
|
└── REFERENCE.md # Technical reference
|
||||||
```
|
```
|
||||||
|
|
||||||
## Architecture & Key Concepts
|
**See [GitOps Architecture - Repository Structure](docs/GITOPS-ARCHITECTURE.md#repository-structure) for detailed explanation.**
|
||||||
|
|
||||||
### GitOps Model
|
---
|
||||||
- **App-of-Apps Pattern**: `_app-of-apps.yaml` is the root Application that manages all infrastructure applications
|
|
||||||
- **App-of-Apps Pattern**: `infra/enterprise-apps.yaml` is the main Application that manages all custom applications
|
|
||||||
- **Source of Truth**: GitHub repository (`https://github.com/snothub/sturdy-adventure.git`) is the single source of truth
|
|
||||||
- **Auto-sync**: All Applications have automated sync enabled with auto-pruning and self-healing
|
|
||||||
- **Namespace Creation**: `CreateNamespace=true` allows ArgoCD to create namespaces as needed
|
|
||||||
|
|
||||||
### Key Components
|
## 🏗️ Architecture
|
||||||
|
|
||||||
1. **Traefik** - Kubernetes Ingress controller for routing external traffic with HTTP/HTTPS redirect
|
### Three-Repository Pattern
|
||||||
2. **Cert-Manager** - Automates TLS certificate management with Let's Encrypt (see `letsencrypt-issuer.yaml`)
|
|
||||||
3. **Kyverno** - Policy engine that enforces security rules and syncs secrets across namespaces (via `sync-secret-with-multi-clone` policy)
|
|
||||||
4. **Monitoring Stack** - Prometheus (metrics) + Grafana (visualization) + Loki (logs) + Fluent-Bit (log shipping)
|
|
||||||
5. **Trivy** - Container vulnerability scanning
|
|
||||||
6. **Sealed Secrets** - Encrypts secrets for safe storage in Git
|
|
||||||
|
|
||||||
### Secret Management
|
| Repository | Purpose | You Edit |
|
||||||
- **Kyverno ClusterPolicy**: Automatically clones secrets from the `secrets` namespace to new namespaces when they're created
|
|------------|---------|----------|
|
||||||
- Only secrets labeled `allowedToBeCloned: "true"` are cloned
|
| **[sturdy-adventure](https://github.com/snothub/sturdy-adventure.git)** (this repo) | ArgoCD Applications, cluster resources | ✅ Often |
|
||||||
- Syncing happens automatically via `synchronize: true` in the policy
|
| **[forte-helm](https://github.com/snothub/forte-helm)** | Generic Helm chart templates | ❌ Rarely |
|
||||||
|
| **[helm-values](https://github.com/fortedigital/helm-values)** | App-specific configuration & versions | ✅ Sometimes |
|
||||||
|
|
||||||
### Network Configuration
|
### GitOps Workflow
|
||||||
- ArgoCD UI: `argocd.127.0.0.1.nip.io` (local development)
|
|
||||||
- Server runs in insecure mode (`--insecure`, `--disable-auth`) - suitable for local/dev clusters
|
|
||||||
- Traefik routes to multiple services via Kubernetes Ingress
|
|
||||||
|
|
||||||
## Common Commands
|
```
|
||||||
|
Developer commits code → CI/CD builds image → Updates helm-values → ArgoCD syncs → Deployed to cluster
|
||||||
|
```
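The "Updates helm-values" step in the flow above can be sketched as a small shell helper. This is a minimal sketch, not the pipeline's actual script: the `update_image_tag` function, the `tag:` key, and the `myapp/values.yaml` layout are assumptions made for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: rewrite the `tag:` value in a Helm values file.
# Assumes the file contains a single `tag:` key and that the new tag
# contains no `|`, `&`, or backslash characters.
update_image_tag() {
  local values_file="$1" new_tag="$2"
  sed -i.bak -E "s|^([[:space:]]*tag:[[:space:]]*).*|\1\"${new_tag}\"|" "$values_file"
  rm -f "${values_file}.bak"
}

# Demo against a throwaway copy of a values file (paths are illustrative).
work="demo-helm-values"
mkdir -p "$work/myapp"
printf 'image:\n  repository: ghcr.io/example/myapp\n  tag: "1.0.0"\n' > "$work/myapp/values.yaml"

update_image_tag "$work/myapp/values.yaml" "1.1.0"
cat "$work/myapp/values.yaml"
```

In a real pipeline the helper would be followed by `git commit` and `git push` in the helm-values checkout; ArgoCD then picks up the change on its next reconciliation.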
|
||||||
|
|
||||||
|
**Learn more**: [GitOps Architecture - GitOps Workflow](docs/GITOPS-ARCHITECTURE.md#gitops-workflow)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔧 Common Tasks
|
||||||
|
|
||||||
|
### Deploy a New Application
|
||||||
|
|
||||||
|
**See detailed guide**: [Developer Guide - Deploying Your First Application](docs/DEVELOPER-GUIDE.md#deploying-your-first-application)
|
||||||
|
|
||||||
|
**Quick version**:
|
||||||
|
1. Create `apps/myapp.yaml` (ArgoCD Application manifest)
|
||||||
|
2. Create `helm-values/myapp/values.yaml` (configuration)
|
||||||
|
3. Create sealed secrets if needed
|
||||||
|
4. Commit and push - ArgoCD auto-syncs!
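Step 1 can be sketched as follows. The script writes an illustrative multi-source Application manifest to `apps/myapp.yaml`. The chart path `charts/generic`, the `main` revisions, the `default` project, and the namespace name are assumptions; check an existing file in `apps/` for the values this cluster actually uses.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: adjust the app name and output directory to your setup.
APP_NAME="myapp"
OUT_DIR="${OUT_DIR:-apps}"
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/$APP_NAME.yaml" <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: $APP_NAME
  namespace: argocd
spec:
  project: default
  sources:
    # Chart templates (path and revision are assumed)
    - repoURL: https://github.com/snothub/forte-helm
      targetRevision: main
      path: charts/generic
      helm:
        valueFiles:
          - \$values/$APP_NAME/values.yaml
    # Values repository, exposed to the chart source as \$values
    - repoURL: git@github.com:fortedigital/helm-values.git
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: $APP_NAME
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
EOF

echo "wrote $OUT_DIR/$APP_NAME.yaml"
```

The `ref: values` source is what lets the chart source refer to files in the values repository through the `$values/` prefix.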
|
||||||
|
|
||||||
|
### Update an Existing Application
|
||||||
|
|
||||||
|
**See detailed guide**: [Developer Guide - Updating an Existing Application](docs/DEVELOPER-GUIDE.md#updating-an-existing-application)
|
||||||
|
|
||||||
|
**Quick version**:
|
||||||
|
- **Update code**: Push to app repo → CI/CD updates image tag in helm-values
|
||||||
|
- **Update config**: Edit `helm-values/myapp/values.yaml` → commit → push
|
||||||
|
|
||||||
|
### Manage Secrets
|
||||||
|
|
||||||
|
**See detailed guide**: [Developer Guide - Working with Secrets](docs/DEVELOPER-GUIDE.md#working-with-secrets)
|
||||||
|
|
||||||
### Bootstrap the Cluster
|
|
||||||
```bash
|
```bash
|
||||||
|
# Create plain secret
|
||||||
|
kubectl create secret generic myapp-creds \
|
||||||
|
--from-literal=KEY=value \
|
||||||
|
--dry-run=client -o yaml > private/myapp-creds.yaml
|
||||||
|
|
||||||
|
# Seal it
|
||||||
|
kubeseal --format=yaml --cert=pub-cert.pem \
|
||||||
|
< private/myapp-creds.yaml > secrets/myapp-creds-sealed.yaml
|
||||||
|
|
||||||
|
# Commit sealed version
|
||||||
|
git add secrets/myapp-creds-sealed.yaml
|
||||||
|
git commit -m "Add myapp credentials"
|
||||||
|
git push
|
||||||
|
```
|
||||||
|
|
||||||
|
### Bootstrap Cluster
|
||||||
|
|
||||||
|
**See detailed guide**: [Operations Runbook - Cluster Bootstrap](docs/OPERATIONS-RUNBOOK.md#cluster-bootstrap)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Initialize new cluster
|
||||||
./bootstrap.sh
|
./bootstrap.sh
|
||||||
```
|
|
||||||
This runs the `Bootstrap()` function, which calls `ArgoCd()` to install ArgoCD with Helm.
|
|
||||||
|
|
||||||
### Monitor ArgoCD Applications
|
# Verify
|
||||||
|
kubectl get applications -n argocd
|
||||||
|
kubectl get pods --all-namespaces
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🛠️ Quick Reference
|
||||||
|
|
||||||
|
### Monitor Applications
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# View all ArgoCD applications
|
# List all ArgoCD applications
|
||||||
kubectl get applications -n argocd
|
kubectl get applications -n argocd
|
||||||
|
|
||||||
# Watch sync status
|
# Watch sync status
|
||||||
kubectl get applications -n argocd -w
|
kubectl get applications -n argocd -w
|
||||||
|
|
||||||
# Describe a specific application
|
# Check specific application
|
||||||
kubectl describe app <app-name> -n argocd
|
kubectl describe application myapp -n argocd
|
||||||
|
|
||||||
|
# View application logs
|
||||||
|
kubectl logs -n myapp <pod-name>
|
||||||
```
|
```
|
||||||
|
|
||||||
### Manage ArgoCD
|
### Access UIs
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Port forward to access UI
|
# ArgoCD UI
|
||||||
kubectl port-forward svc/argocd-server -n argocd 8080:443
|
kubectl port-forward svc/argocd-server -n argocd 8080:443
|
||||||
|
# Access: https://localhost:8080 (no auth required)
|
||||||
|
|
||||||
# Access at: https://localhost:8080 (admin auth disabled in dev)
|
# Grafana
|
||||||
|
kubectl port-forward -n monitoring svc/grafana 3000:80
|
||||||
|
# Access: http://localhost:3000
|
||||||
|
|
||||||
|
# Prometheus
|
||||||
|
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
|
||||||
|
# Access: http://localhost:9090
|
||||||
```
|
```
|
||||||
|
|
||||||
### Check Secret Syncing
|
### Troubleshooting
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Verify Kyverno policy is applied
|
# Check pod status
|
||||||
kubectl get clusterpolicy sync-secret-with-multi-clone
|
kubectl get pods -n myapp
|
||||||
|
|
||||||
# Check if secrets are synced to a namespace
|
# View pod logs
|
||||||
kubectl get secrets -n <namespace>
|
kubectl logs -n myapp <pod-name>
|
||||||
|
|
||||||
|
# Check pod events
|
||||||
|
kubectl describe pod -n myapp <pod-name>
|
||||||
|
|
||||||
|
# Check ArgoCD sync errors
|
||||||
|
kubectl describe application myapp -n argocd
|
||||||
|
|
||||||
|
# Force a hard refresh (auto-sync then applies any drift)
|
||||||
|
kubectl patch application myapp -n argocd \
|
||||||
|
--type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
|
||||||
```
|
```
|
||||||
|
|
||||||
### Deploy Changes
|
**Full troubleshooting guide**: [Developer Guide - Troubleshooting](docs/DEVELOPER-GUIDE.md#troubleshooting)
|
||||||
- Changes to YAML files in `apps/`, `infra/`, `**/values/`, or `cluster-resources/` are automatically synced by ArgoCD
|
|
||||||
- Push changes to the GitHub repository for them to be reflected
|
|
||||||
- ArgoCD reconciliation happens every 60s (`timeout.reconciliation: 60s`)
|
|
||||||
- Each application has a 5-minute sync timeout to prevent stalled deployments
|
|
||||||
|
|
||||||
### Review Helm Values
|
---
|
||||||
Application-specific Helm value overrides are in `**/values/` and referenced within each Application's Helm configuration. Each application manifest uses both external value files and inline overrides where needed.
|
|
||||||
|
|
||||||
### Application Organization & Sync Ordering
|
## 🔐 Security
|
||||||
- Infrastructure applications use `argocd.argoproj.io/sync-wave` annotations for ordered deployment
|
|
||||||
- Kyverno (sync-wave: 0) deploys before cluster-resources (sync-wave: 1) to ensure policies are ready
|
|
||||||
- All applications have resource requests and limits configured to prevent resource starvation
|
|
||||||
- Applications are labeled with `app.kubernetes.io/part-of` to indicate their component type (platform, monitoring-stack, application)
|
|
||||||
|
|
||||||
## Important Notes
|
### Secret Management
|
||||||
|
- ✅ Sealed Secrets for Git storage
|
||||||
|
- ✅ Kyverno auto-clones secrets to namespaces
|
||||||
|
- ❌ Never commit plain secrets
|
||||||
|
|
||||||
- **No admin auth in development**: ArgoCD has `admin.enabled: "false"` - suitable for local/dev only
|
### Network Security
|
||||||
- **Insecure server mode**: `--insecure` and `--disable-auth` flags are set - not for production
|
- ✅ All traffic TLS-encrypted (Let's Encrypt)
|
||||||
- **Folder organization**:
|
- ✅ HTTP → HTTPS redirect
|
||||||
- `infra/` contains infrastructure/platform components (Traefik, Cert-Manager, Prometheus, Grafana, Loki, etc.)
|
- ✅ Traefik IngressRoute per application
|
||||||
- `apps/` is reserved for business applications (currently empty)
|
|
||||||
- **Replica counts**: Traefik runs 2 replicas; other services run 1 replica
|
|
||||||
- **Retry policy**: All applications retry up to 5 times with exponential backoff (max 3m timeout per application)
|
|
||||||
- **Ignore replica scaling**: Deployments ignore replica count differences to allow HPA/manual scaling
|
|
||||||
- **Sync validation**: All applications validate manifests before applying (`Validate=true`)
|
|
||||||
- **Server-side apply**: All applications use `ServerSideApply=true` for safer field ownership tracking
|
|
||||||
|
|
||||||
## Development Tips
|
### Policy Enforcement
|
||||||
|
- ✅ Kyverno policies for security
|
||||||
|
- ✅ Default namespace blocked
|
||||||
|
- ✅ Bare pods not allowed
|
||||||
|
- ✅ Optional authentication sidecar injection
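As an illustration of the "default namespace blocked" rule, the script below writes a Kyverno ClusterPolicy modeled on Kyverno's published disallow-default-namespace sample. It is a sketch only: the repository's own `default-ns-blocker.yaml` may differ, and the policy name, rule name, and output path here are assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Written to a scratch directory so the repository's real policy is untouched.
OUT_DIR="${OUT_DIR:-sketch}"
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/default-ns-blocker.yaml" <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-default-namespace   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: validate-namespace
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using the 'default' namespace is not allowed."
        pattern:
          metadata:
            namespace: "!default"
EOF

echo "wrote $OUT_DIR/default-ns-blocker.yaml"
```

With `Enforce`, a Pod created in `default` is rejected at admission; switching to `Audit` would only report the violation.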
|
||||||
|
|
||||||
- **Check ArgoCD logs**: `kubectl logs -n argocd deployment/argocd-application-controller`
|
**Learn more**: [GitOps Architecture - Security Model](docs/GITOPS-ARCHITECTURE.md#security-model)
|
||||||
- **Validate YAML**: Files are validated server-side (`Validate=true`) before applying
|
|
||||||
- **Resource tracking**: Uses annotation-based method (`application.resourceTrackingMethod: annotation`)
|
---
|
||||||
- **Modify applications**: Edit the corresponding YAML in `infra/` and push to trigger sync
|
|
||||||
- **Add new services**: Create a new Application YAML in `apps/`, following the pattern of the existing ones; the app-of-apps then discovers it automatically
|
## 📊 Infrastructure Components
|
||||||
- **Application folder naming**: Infrastructure components are in `infra/`; `apps/` is reserved for business applications
|
|
||||||
|
| Component | Purpose | Namespace | Replicas |
|
||||||
|
|-----------|---------|-----------|----------|
|
||||||
|
| **ArgoCD** | GitOps controller | `argocd` | 1 |
|
||||||
|
| **Traefik** | Ingress controller | `traefik` | 2 |
|
||||||
|
| **Cert-Manager** | TLS certificates | `cert-manager` | 1 |
|
||||||
|
| **Kyverno** | Policy engine | `kyverno` | 1 |
|
||||||
|
| **Sealed Secrets** | Secret encryption | `kube-system` | 1 |
|
||||||
|
| **Prometheus** | Metrics | `monitoring` | 1 |
|
||||||
|
| **Grafana** | Dashboards | `monitoring` | 1 |
|
||||||
|
| **Loki** | Logs | `monitoring` | 1 |
|
||||||
|
| **Fluent-Bit** | Log shipping | `monitoring` | DaemonSet |
|
||||||
|
| **Trivy** | Vulnerability scanning | `trivy-system` | 1 |
|
||||||
|
|
||||||
|
**Full specs**: [Technical Reference - Infrastructure Components](docs/REFERENCE.md#infrastructure-components)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🌐 Domains & Networking
|
||||||
|
|
||||||
|
- **Local development**: `*.127.0.0.1.nip.io`
|
||||||
|
- **Production**: `*.forteapps.net`
|
||||||
|
- **DNS**: Manual configuration (contact platform team)
|
||||||
|
- **TLS**: Automatic via Let's Encrypt
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📖 Key Concepts
|
||||||
|
|
||||||
|
### App-of-Apps Pattern
|
||||||
|
`_app-of-apps.yaml` is the root Application that manages all other Applications in `infra/`. Each YAML in `infra/` becomes a child Application managed by ArgoCD.
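A root Application of this kind typically uses a plain directory source. The sketch below writes one to a scratch path rather than overwriting the real `_app-of-apps.yaml`; the Application name, `HEAD` revision, and `default` project are assumptions, not values read from this repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

OUT_DIR="${OUT_DIR:-sketch}"
mkdir -p "$OUT_DIR"

cat > "$OUT_DIR/_app-of-apps.yaml" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps          # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/snothub/sturdy-adventure.git
    targetRevision: HEAD
    path: infra              # every Application manifest here becomes a child
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

echo "wrote $OUT_DIR/_app-of-apps.yaml"
```

A directory source does not recurse into subdirectories unless asked to, so in this sketch files under `infra/values/` would not be applied as manifests.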
|
||||||
|
|
||||||
|
### Multi-Source Pattern
|
||||||
|
Applications reference both:
|
||||||
|
1. **Helm charts** from `forte-helm` (templates)
|
||||||
|
2. **Values** from `helm-values` (configuration)
|
||||||
|
|
||||||
|
This separates reusable templates from environment-specific config.
|
||||||
|
|
||||||
|
### Sync Waves
|
||||||
|
Applications deploy in order using `argocd.argoproj.io/sync-wave`:
|
||||||
|
- Wave `-1`: Namespaces
|
||||||
|
- Wave `0`: Kyverno (policies)
|
||||||
|
- Wave `1`: Infrastructure
|
||||||
|
- Wave `2+`: Applications
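To check the ordering locally, a small helper can list manifests by their `sync-wave` annotation. This is a sketch under stated assumptions: `list_sync_waves` is a hypothetical function, the annotation value is expected to be double-quoted, and the demo files and their wave numbers are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print "wave<TAB>file" for each manifest in a directory, lowest wave first.
# Files without the annotation are listed as wave 0, ArgoCD's default.
list_sync_waves() {
  local dir="$1" file wave
  for file in "$dir"/*.yaml; do
    [ -e "$file" ] || continue
    # Take the text between the first pair of double quotes on the annotation line.
    wave="$(awk -F'"' '/argocd.argoproj.io\/sync-wave:/ { print $2; exit }' "$file")"
    printf '%s\t%s\n' "${wave:-0}" "$(basename "$file")"
  done | sort -k1,1n -k2,2
}

# Demo with throwaway manifests (file names and waves are illustrative).
demo_dir="sync-wave-demo"
mkdir -p "$demo_dir"
write_demo() {
  printf 'metadata:\n  annotations:\n    argocd.argoproj.io/sync-wave: "%s"\n' "$2" > "$demo_dir/$1"
}
write_demo traefik-application.yaml 1
write_demo kyverno.yaml 0
write_demo cert-manager-namespace.yaml -1

list_sync_waves "$demo_dir"
```

Running the same function against `infra/` gives a quick view of which Applications ArgoCD applies first.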
|
||||||
|
|
||||||
|
### Auto-Sync & Self-Heal
|
||||||
|
- **Auto-Sync**: ArgoCD automatically deploys Git changes (60s polling)
|
||||||
|
- **Self-Heal**: Manual cluster changes are reverted to match Git
|
||||||
|
- **Prune**: Resources deleted from Git are removed from the cluster
|
||||||
|
|
||||||
|
**Learn more**: [GitOps Architecture - GitOps Workflow](docs/GITOPS-ARCHITECTURE.md#gitops-workflow)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ⚙️ Configuration
|
||||||
|
|
||||||
|
### ArgoCD Settings
|
||||||
|
- **Reconciliation**: Every 60 seconds
|
||||||
|
- **Sync timeout**: 5 minutes per application
|
||||||
|
- **Retry policy**: 5 attempts with exponential backoff
|
||||||
|
- **Authentication**: Disabled (internal use only)
|
||||||
|
|
||||||
|
### Application Defaults
|
||||||
|
- **Auto-sync**: Enabled
|
||||||
|
- **Self-heal**: Enabled
|
||||||
|
- **Prune**: Enabled
|
||||||
|
- **Validation**: Server-side validation enabled
|
||||||
|
- **Server-side apply**: Enabled
|
||||||
|
|
||||||
|
**Full configuration**: [Technical Reference - ArgoCD Configuration](docs/REFERENCE.md#argocd-configuration)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🆘 Getting Help
|
||||||
|
|
||||||
|
### Documentation
|
||||||
|
1. **Start here**: [Documentation Index](docs/README.md)
|
||||||
|
2. **For development**: [Developer Guide](docs/DEVELOPER-GUIDE.md)
|
||||||
|
3. **For operations**: [Operations Runbook](docs/OPERATIONS-RUNBOOK.md)
|
||||||
|
4. **For reference**: [Technical Reference](docs/REFERENCE.md)
|
||||||
|
|
||||||
|
### Support
|
||||||
|
- **Slack**: #platform-support
|
||||||
|
- **Issues**: Contact platform team
|
||||||
|
- **Emergencies**: Escalate via Slack
|
||||||
|
|
||||||
|
### Common Questions
|
||||||
|
|
||||||
|
| Question | Answer |
|
||||||
|
|----------|--------|
|
||||||
|
| How do I deploy an app? | [Developer Guide - Deploying Your First Application](docs/DEVELOPER-GUIDE.md#deploying-your-first-application) |
|
||||||
|
| How do I manage secrets? | [Developer Guide - Working with Secrets](docs/DEVELOPER-GUIDE.md#working-with-secrets) |
|
||||||
|
| App won't sync? | [Developer Guide - Troubleshooting](docs/DEVELOPER-GUIDE.md#troubleshooting) |
|
||||||
|
| How do I bootstrap a cluster? | [Operations Runbook - Cluster Bootstrap](docs/OPERATIONS-RUNBOOK.md#cluster-bootstrap) |
|
||||||
|
| Where are the logs? | [Operations Runbook - Monitoring & Alerting](docs/OPERATIONS-RUNBOOK.md#monitoring--alerting) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🤝 Contributing
|
||||||
|
|
||||||
|
### Adding a New Application
|
||||||
|
1. Read [Developer Guide - Deploying Your First Application](docs/DEVELOPER-GUIDE.md#deploying-your-first-application)
|
||||||
|
2. Create ArgoCD Application manifest in `apps/`
|
||||||
|
3. Create Helm values in `helm-values/`
|
||||||
|
4. Create sealed secrets if needed
|
||||||
|
5. Commit and push - ArgoCD handles the rest!
|
||||||
|
|
||||||
|
### Modifying Infrastructure
|
||||||
|
1. Read [Operations Runbook](docs/OPERATIONS-RUNBOOK.md)
|
||||||
|
2. Update relevant files in `infra/` or `cluster-resources/`
|
||||||
|
3. Test changes in an isolated namespace if possible
|
||||||
|
4. Commit and push
|
||||||
|
5. Monitor sync status in Slack or the ArgoCD UI
|
||||||
|
|
||||||
|
### Updating Documentation
|
||||||
|
Documentation lives in `docs/`. To update:
|
||||||
|
1. Edit the relevant Markdown file
|
||||||
|
2. Update the "Last Updated" date
|
||||||
|
3. Submit PR or push directly
|
||||||
|
4. Notify team of significant changes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📝 Notes
|
||||||
|
|
||||||
|
### Current Environment
|
||||||
|
- **Provider**: UpCloud Managed Kubernetes
|
||||||
|
- **Environment**: Production (internal use only)
|
||||||
|
- **Cluster**: Single cluster
|
||||||
|
- **Auth**: Disabled for ArgoCD (internal access)
|
||||||
|
- **Backup**: None (cluster rebuildable via GitOps)
|
||||||
|
|
||||||
|
### Known Limitations
|
||||||
|
- No automated backups (yet)
|
||||||
|
- Secret rotation not automated
|
||||||
|
- Single cluster (no multi-cluster setup)
|
||||||
|
- DNS management is manual
|
||||||
|
|
||||||
|
**Future improvements**: See [Operations Runbook - Disaster Recovery](docs/OPERATIONS-RUNBOOK.md#disaster-recovery)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📚 Additional Resources
|
||||||
|
|
||||||
|
### External Documentation
|
||||||
|
- [ArgoCD Documentation](https://argo-cd.readthedocs.io/)
|
||||||
|
- [Kyverno Documentation](https://kyverno.io/docs/)
|
||||||
|
- [Traefik Documentation](https://doc.traefik.io/traefik/)
|
||||||
|
- [Cert-Manager Documentation](https://cert-manager.io/docs/)
|
||||||
|
- [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets)
|
||||||
|
|
||||||
|
### Related Repositories
|
||||||
|
- [forte-helm](https://github.com/snothub/forte-helm) - Helm chart templates
|
||||||
|
- [helm-values](https://github.com/fortedigital/helm-values) - Application values
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📄 License
|
||||||
|
|
||||||
|
Internal use only. Not for public distribution.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 👥 Maintainers
|
||||||
|
|
||||||
|
**Platform Team**
|
||||||
|
- Contact: #platform-support on Slack
|
||||||
|
- Issues: Create issue in repository or contact team directly
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated**: 2026-03-16
|
||||||
|
**Documentation Version**: 1.0.0
|
||||||
|
|
||||||
|
**🚀 Ready to get started? Check out the [Documentation Index](docs/README.md)!**
|
||||||
|
|||||||
1089
docs/DEVELOPER-GUIDE.md
Normal file
File diff suppressed because it is too large
640
docs/GITOPS-ARCHITECTURE.md
Normal file
@@ -0,0 +1,640 @@
|
|||||||
|
# GitOps Architecture & Repository Guide
|
||||||
|
|
||||||
|
## Table of Contents
|
||||||
|
- [Overview](#overview)
|
||||||
|
- [Architecture Diagram](#architecture-diagram)
|
||||||
|
- [Repository Structure](#repository-structure)
|
||||||
|
- [GitOps Workflow](#gitops-workflow)
|
||||||
|
- [CI/CD Pipeline](#cicd-pipeline)
|
||||||
|
- [Security Model](#security-model)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where Git repositories serve as the single source of truth for both infrastructure and application deployments. The cluster runs on **UpCloud Managed Kubernetes**, but the configuration is designed to be cloud-agnostic.
|
||||||
|
|
||||||
|
### Key Characteristics
|
||||||
|
- **Environment**: Production (internal use only)
|
||||||
|
- **Cluster Type**: Single cluster, single environment
|
||||||
|
- **GitOps Tool**: ArgoCD
|
||||||
|
- **Deployment Pattern**: App-of-Apps
|
||||||
|
- **Secret Management**: Sealed Secrets (kubeseal)
|
||||||
|
- **Ingress**: Traefik with Let's Encrypt TLS
|
||||||
|
- **Monitoring**: Prometheus + Grafana + Loki + Fluent-Bit
|
||||||
|
- **Policy Engine**: Kyverno
|
||||||
|
- **Notifications**: Slack integration for sync status
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Diagram
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Developer Workflow │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────┐ ┌──────────────────┐ ┌─────────────────┐
|
||||||
|
│ Application Code │ │ Helm Charts │ │ Helm Values │
|
||||||
|
│ Repositories │──────│ Repository │──────│ Repository │
|
||||||
|
│ (Source Code) │ │ (Templates) │ │ (Config/Env) │
|
||||||
|
└─────────────────────┘ └──────────────────┘ └─────────────────┘
|
||||||
|
│ │ │
|
||||||
|
│ │ │
|
||||||
|
GitHub Actions │ │
|
||||||
|
Build & Push Image │ │
|
||||||
|
│ │ │
|
||||||
|
│ │ │
|
||||||
|
└────────► Update image tag ─┴──────────────────────────┘
|
||||||
|
in helm-values │
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌────────────────────────────────┐
|
||||||
|
│ Config Repository │
|
||||||
|
│ (ArgoCD Applications) │
|
||||||
|
│ github.com/snothub/ │
|
||||||
|
│ sturdy-adventure │
|
||||||
|
└────────────────────────────────┘
|
||||||
|
│
|
||||||
|
│
|
||||||
|
ArgoCD monitors & syncs
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌────────────────────────────────┐
|
||||||
|
│ Kubernetes Cluster │
|
||||||
|
│ (UpCloud Managed) │
|
||||||
|
│ │
|
||||||
|
│ ┌──────────────────────────┐ │
|
||||||
|
│ │ ArgoCD │ │
|
||||||
|
│ │ (GitOps Controller) │ │
|
||||||
|
│ └──────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌──────────────────────────┐ │
|
||||||
|
│ │ Infrastructure Layer │ │
|
||||||
|
│ │ - Traefik (Ingress) │ │
|
||||||
|
│ │ - Cert-Manager (TLS) │ │
|
||||||
|
│ │ - Kyverno (Policies) │ │
|
||||||
|
│ │ - Sealed Secrets │ │
|
||||||
|
│ └──────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌──────────────────────────┐ │
|
||||||
|
│ │ Monitoring Stack │ │
|
||||||
|
│ │ - Prometheus │ │
|
||||||
|
│ │ - Grafana │ │
|
||||||
|
│ │ - Loki │ │
|
||||||
|
│ │ - Fluent-Bit │ │
|
||||||
|
│ └──────────────────────────┘ │
|
||||||
|
│ │
|
||||||
|
│ ┌──────────────────────────┐ │
|
||||||
|
│ │ Application Layer │ │
|
||||||
|
│ │ - mcp10x │ │
|
||||||
|
│ │ - musicman │ │
|
||||||
|
│ │ - dot-ai-stack │ │
|
||||||
|
│ │ - argo-mcp │ │
|
||||||
|
│ └──────────────────────────┘ │
|
||||||
|
└────────────────────────────────┘
|
||||||
|
│
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌──────────────────┐
|
||||||
|
│ Slack Channel │
|
||||||
|
│ (Notifications) │
|
||||||
|
└──────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Repository Structure
|
||||||
|
|
||||||
|
### 1. **Config Repository** (Current Repo)
|
||||||
|
**Repository**: `https://github.com/snothub/sturdy-adventure.git`
|
||||||
|
**Purpose**: GitOps configuration - ArgoCD Applications and cluster resources
|
||||||
|
**Location**: `C:\dev\k8s\launchpad`
|
||||||
|
|
||||||
|
```
sturdy-adventure/
├── bootstrap.sh                        # Cluster initialization script
├── _app-of-apps.yaml                   # Root ArgoCD Application (App-of-Apps pattern)
│
├── infra/                              # Infrastructure ArgoCD Applications
│   ├── enterprise-apps.yaml            # Parent app managing all apps in apps/
│   ├── cluster-resources-application.yaml
│   ├── traefik-application.yaml
│   ├── cert-manager-application.yaml
│   ├── kyverno.yaml
│   ├── kyverno-policies.yaml
│   ├── prometheus.yaml
│   ├── grafana.yaml
│   ├── loki.yaml
│   ├── fluent-bit.yaml
│   ├── trivy.yaml
│   ├── sealedsecrets.yaml
│   ├── secrets.yaml
│   └── values/                         # Helm value overrides for infra
│       ├── argocd-values.yaml
│       ├── prometheus-values.yaml
│       ├── grafana-values.yaml
│       ├── loki-values.yaml
│       └── fluent-bit-values.yaml
│
├── apps/                               # Business Application ArgoCD manifests
│   ├── mcp10x.yaml                     # MCP 10X application
│   ├── musicman.yaml                   # Music Man application
│   ├── dot-ai-stack.yaml               # Dot AI Stack
│   └── argo-mcp.yaml                   # ArgoCD MCP server
│
├── cluster-resources/                  # Cluster-wide Kubernetes resources
│   ├── cert-manager-namespace.yaml
│   ├── secrets-namespace.yaml
│   ├── letsencrypt-issuer.yaml         # Let's Encrypt ClusterIssuer
│   ├── kyverno-config.yaml
│   ├── argocd-notifications-secret-sealed.yaml
│   ├── snothub-repo-credentials-sealed.yaml
│   ├── forte10x-repo-credentials-sealed.yaml
│   ├── mcp10x-repo-credentials-sealed.yaml
│   └── policies/                       # Kyverno policies
│       ├── deployment-verifier.yaml
│       ├── label-checker.yaml
│       ├── bare-pod-cleaner.yaml
│       ├── replicaset-cleaner.yaml
│       ├── default-ns-blocker.yaml
│       ├── secret-cloner.yaml
│       └── auth-sidecar-injector.yaml
│
├── secrets/                            # Application secrets (sealed)
│   ├── argocd-mcp-credentials.yaml
│   ├── dot-ai-secrets.yaml
│   ├── mcp10x-credentials-sealed.yaml
│   └── musicman-credentials.yaml
│
├── private/                            # Local-only files (NOT in Git)
│   ├── *.yaml                          # Unsealed secrets
│   └── *.sh                            # Helper scripts
│
└── docs/                               # Documentation
    ├── GITOPS-ARCHITECTURE.md          # This file
    ├── DEVELOPER-GUIDE.md
    ├── OPERATIONS-RUNBOOK.md
    └── REFERENCE.md
```
|
||||||
|
|
||||||
|
**Key Points**:
|
||||||
|
- `_app-of-apps.yaml` is the root Application that ArgoCD monitors
|
||||||
|
- `infra/enterprise-apps.yaml` auto-discovers every Application manifest in the `apps/` folder
|
||||||
|
- Changes pushed to this repo trigger automatic syncs in ArgoCD
|
||||||
|
- `private/` folder contains local-only files (Git-ignored)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. **Helm Charts Repository**
|
||||||
|
**Repository**: `https://github.com/snothub/forte-helm`
|
||||||
|
**Purpose**: Reusable Helm chart templates for Forte applications
|
||||||
|
**Location**: `C:\dev\k8s\forte-helm`
|
||||||
|
|
||||||
|
```
forte-helm/
└── forteapp/                           # Generic Forte application chart
    ├── Chart.yaml                      # Chart metadata (v0.1.0)
    ├── values.yaml                     # Default values (base template)
    ├── templates/
    │   ├── _helpers.tpl                # Template helpers
    │   ├── namespace.yaml
    │   ├── deployment.yaml             # Main app deployment
    │   ├── service.yaml
    │   ├── ingressroute.yaml           # Traefik IngressRoute
    │   ├── certificate.yaml            # Cert-Manager Certificate
    │   ├── configmap.yaml
    │   ├── secret-auth-tokens.yaml
    │   ├── hpa.yaml                    # Horizontal Pod Autoscaler
    │   ├── database-statefulset.yaml   # Optional PostgreSQL DB
    │   └── database-service.yaml
    └── README.md
```
|
||||||
|
|
||||||
|
**Key Points**:
|
||||||
|
- Single generic chart (`forteapp`) used by all Forte applications
|
||||||
|
- Supports optional PostgreSQL database (StatefulSet)
|
||||||
|
- Configurable authentication (token-based or OIDC)
|
||||||
|
- Traefik IngressRoute with automatic TLS via Cert-Manager
|
||||||
|
- Designed for microservices with similar patterns
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. **Helm Values Repository**
|
||||||
|
**Repository**: `git@github.com:fortedigital/helm-values.git`
|
||||||
|
**Purpose**: Environment-specific configuration for each application
|
||||||
|
**Location**: `C:\dev\k8s\helm-prod-values`
|
||||||
|
|
||||||
|
```
helm-prod-values/
├── mcp10x/
│   └── values.yaml         # MCP 10X configuration
├── musicman/
│   └── values.yaml         # Music Man configuration
├── mcpcoder/
│   └── values.yaml         # MCP Coder configuration
└── argocd-mcp/
    └── values.yaml         # ArgoCD MCP configuration
```
|
||||||
|
|
||||||
|
**Key Points**:
|
||||||
|
- Each app has its own folder with `values.yaml`
|
||||||
|
- Contains environment-specific settings (image tags, env vars, resources, etc.)
|
||||||
|
- Referenced by ArgoCD Applications using multi-source pattern
|
||||||
|
- Image tags are updated here by CI/CD pipelines
|
||||||
|
- Secrets are referenced by name (actual secrets stored as SealedSecrets)
|
||||||
|
|
||||||
|
**Example** (`mcp10x/values.yaml`):
|
||||||
|
```yaml
app:
  image:
    repository: ghcr.io/fortedigital/10x
    tag: 2.0.4                        # Updated by CI/CD
  extraEnv:
    - name: PORT
      value: "3000"
  envSecretName: "app-credentials"    # References SealedSecret

ingress:
  enabled: true
  host: mcp10x.forteapps.net          # Public domain
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. **Application Source Code Repositories**
|
||||||
|
**Purpose**: Application source code with CI/CD pipelines
|
||||||
|
**Examples**: Various private repositories
|
||||||
|
|
||||||
|
**Typical Structure**:
|
||||||
|
```
app-repository/
├── src/                                # Application source code
├── Dockerfile                          # Container build definition
├── .github/
│   └── workflows/
│       └── build-and-deploy.yml        # GitHub Actions workflow
└── package.json / requirements.txt     # Dependencies
```
|
||||||
|
|
||||||
|
**CI/CD Workflow** (GitHub Actions):
|
||||||
|
1. Trigger on push to `main` branch
|
||||||
|
2. Build Docker image
|
||||||
|
3. Tag with version (e.g., `2.0.4`)
|
||||||
|
4. Push to container registry (GHCR, Docker Hub, etc.)
|
||||||
|
5. Update image tag in `helm-values` repository
|
||||||
|
6. ArgoCD detects change and syncs automatically
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## GitOps Workflow
|
||||||
|
|
||||||
|
### The App-of-Apps Pattern
|
||||||
|
|
||||||
|
```
_app-of-apps.yaml (Root)
│
├── infrastructure-apps (manages infra/)
│   ├── cluster-resources-application
│   ├── traefik-application
│   ├── cert-manager-application
│   ├── kyverno
│   ├── prometheus
│   ├── grafana
│   └── ... (other infra apps)
│
└── enterprise-apps (manages apps/)
    ├── mcp10x
    ├── musicman
    ├── dot-ai-stack
    └── argo-mcp
```
|
||||||
|
|
||||||
|
**How It Works**:
|
||||||
|
1. Bootstrap script installs ArgoCD and applies `_app-of-apps.yaml`
|
||||||
|
2. ArgoCD creates the root Application, which monitors the `infra/` folder
|
||||||
|
3. Each YAML in `infra/` becomes a child Application
|
||||||
|
4. `enterprise-apps.yaml` monitors the `apps/` folder and auto-discovers applications
|
||||||
|
5. ArgoCD continuously syncs (every 60s) and auto-heals drift
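The steps above could be expressed by a root Application roughly like the following. This is a minimal sketch, not the actual contents of `_app-of-apps.yaml`: the Application name, the `default` project, and the `argocd` namespace are assumptions made for illustration.

```yaml
# Hypothetical root Application (App-of-Apps); names are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps            # assumed name
  namespace: argocd            # assumed ArgoCD install namespace
spec:
  project: default             # assumed project
  source:
    repoURL: https://github.com/snothub/sturdy-adventure.git
    targetRevision: HEAD
    path: infra                # every manifest here becomes a child Application
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true              # remove children deleted from Git
      selfHeal: true           # revert manual drift
```

Because the children are themselves `Application` resources, the destination namespace is the one ArgoCD watches for Applications, not the namespace of the workloads they deploy.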
|
||||||
|
|
||||||
|
### Sync Waves & Ordering
|
||||||
|
|
||||||
|
Applications deploy in order using `argocd.argoproj.io/sync-wave` annotations:
|
||||||
|
|
||||||
|
```
|
||||||
|
Wave -1: Namespaces (created first)
|
||||||
|
Wave 0: Kyverno (policies ready before resources)
|
||||||
|
Wave 1: Cluster resources, infrastructure apps
|
||||||
|
Wave 2+: Business applications
|
||||||
|
```
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```yaml
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
```
|
||||||
|
|
||||||
|
### Multi-Source Pattern
|
||||||
|
|
||||||
|
Applications like `mcp10x` and `musicman` use multiple sources:
|
||||||
|
|
||||||
|
```yaml
spec:
  sources:
    - repoURL: https://github.com/snothub/forte-helm
      path: forteapp                      # Helm chart templates
      helm:
        valueFiles:
          - $values/mcp10x/values.yaml    # Reference to second source

    - repoURL: git@github.com:fortedigital/helm-values.git
      targetRevision: HEAD
      ref: values                         # Named reference
```
|
||||||
|
|
||||||
|
**Benefits**:
|
||||||
|
- Chart templates separated from configuration
|
||||||
|
- Single chart reused across all apps
|
||||||
|
- Easy to update all apps by changing the chart
|
||||||
|
- Environment-specific values isolated in separate repo
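Put together, a complete multi-source Application for `mcp10x` might look like the sketch below. The repository URLs, chart path, and values path come from this document; the project, destination namespace, sync wave, and the `targetRevision` on the chart source are assumptions, so compare against the real `apps/mcp10x.yaml` before reusing it.

```yaml
# Illustrative multi-source Application; fields marked "assumed" are not from the repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mcp10x
  namespace: argocd                        # assumed ArgoCD namespace
  annotations:
    argocd.argoproj.io/sync-wave: "2"      # assumed: business apps deploy in wave 2+
spec:
  project: default                         # assumed project
  sources:
    - repoURL: https://github.com/snothub/forte-helm
      targetRevision: HEAD                 # assumed revision
      path: forteapp
      helm:
        valueFiles:
          - $values/mcp10x/values.yaml     # resolved against the source with ref: values
    - repoURL: git@github.com:fortedigital/helm-values.git
      targetRevision: HEAD
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: mcp10x                      # assumed target namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

The second source carries no `path` or `chart`, so it produces no manifests of its own; it only exposes its files under the `$values` prefix.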
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## CI/CD Pipeline
|
||||||
|
|
||||||
|
### Continuous Integration
|
||||||
|
|
||||||
|
**Application Repositories** contain GitHub Actions workflows:
|
||||||
|
|
||||||
|
```yaml
name: Build and Deploy

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: docker build -t ghcr.io/fortedigital/app:$VERSION .

      - name: Push to registry
        run: docker push ghcr.io/fortedigital/app:$VERSION

      - name: Update Helm values
        run: |
          git clone git@github.com:fortedigital/helm-values.git
          cd helm-values/app
          sed -i "s/tag: .*/tag: $VERSION/" values.yaml
          git commit -am "Update app to $VERSION"
          git push
```
|
||||||
|
|
||||||
|
### Continuous Deployment
|
||||||
|
|
||||||
|
**ArgoCD** automatically syncs when changes are detected:
|
||||||
|
|
||||||
|
1. **Config Repo Change**:
|
||||||
|
- Developer updates `apps/myapp.yaml`
|
||||||
|
- Pushes to `sturdy-adventure` repo
|
||||||
|
- ArgoCD detects change (60s reconciliation)
|
||||||
|
- Syncs application to cluster
|
||||||
|
|
||||||
|
2. **Helm Values Change**:
|
||||||
|
- CI/CD updates `helm-values/myapp/values.yaml`
|
||||||
|
- ArgoCD detects change
|
||||||
|
- Re-renders the Helm chart with the updated values
|
||||||
|
- Applies to cluster
|
||||||
|
|
||||||
|
3. **Sync Policy**:
|
||||||
|
   ```yaml
   syncPolicy:
     automated:
       prune: true        # Remove deleted resources
       selfHeal: true     # Revert manual changes
     retry:
       limit: 5           # Retry up to 5 times
       backoff:
         duration: 5s
         maxDuration: 3m
   ```
|
||||||
|
|
||||||
|
### Deployment Validation
|
||||||
|
|
||||||
|
Before applying, ArgoCD:
|
||||||
|
- ✅ Validates YAML syntax
|
||||||
|
- ✅ Checks Kubernetes schema
|
||||||
|
- ✅ Runs server-side dry-run
|
||||||
|
- ✅ Verifies resource quotas
|
||||||
|
- ✅ Subjects resources to Kyverno policies at admission (enforced by Kyverno, not by ArgoCD itself)
|
||||||
|
|
||||||
|
After applying:
|
||||||
|
- ✅ Waits for resources to become healthy
|
||||||
|
- ✅ Sends Slack notification (success/failure)
|
||||||
|
- ✅ Tracks sync status in UI
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Security Model
|
||||||
|
|
||||||
|
### Secret Management
|
||||||
|
|
||||||
|
**Sealed Secrets** encrypt secrets for safe Git storage:
|
||||||
|
|
||||||
|
```bash
# Developer creates plain secret locally
kubectl create secret generic app-creds \
  --from-literal=API_KEY=secret123 \
  --dry-run=client -o yaml > private/app-creds.yaml

# Seal the secret using kubeseal
kubeseal --format=yaml \
  --cert=pub-cert.pem \
  < private/app-creds.yaml \
  > secrets/app-creds-sealed.yaml

# Commit sealed secret to Git
git add secrets/app-creds-sealed.yaml
git commit -m "Add app credentials"
```
|
||||||
|
|
||||||
|
**Storage**:
|
||||||
|
- ✅ Sealed secrets committed to Git
|
||||||
|
- ❌ Plain secrets kept in `private/` (Git-ignored) or discarded
|
||||||
|
- ⚠️ Secret rotation process not yet established
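For orientation, the file written to `secrets/app-creds-sealed.yaml` by the `kubeseal` command above would have roughly the following shape. This is a sketch only: the namespace is an assumption and the ciphertext is a placeholder, since real values are long base64 strings that only the in-cluster controller can decrypt.

```yaml
# Illustrative SealedSecret; namespace and ciphertext are placeholders.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: app-creds
  namespace: myapp                       # assumed namespace
spec:
  encryptedData:
    API_KEY: AgB...placeholder...        # not real ciphertext
  template:
    metadata:
      name: app-creds                    # name of the Secret the controller creates
      namespace: myapp
```

By default a SealedSecret is bound to its name and namespace, so a file sealed for one namespace will not unseal if it is moved to another.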
|
||||||
|
|
||||||
|
### Kyverno Policies
|
||||||
|
|
||||||
|
**Policy Engine** enforces security rules:
|
||||||
|
|
||||||
|
1. **Secret Cloning**: Automatically clones secrets to new namespaces
|
||||||
|
```yaml
|
||||||
|
# cluster-resources/policies/secret-cloner.yaml
|
||||||
|
# Secrets labeled "allowedToBeCloned: true" are synced
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Default Namespace Blocker**: Prevents use of `default` namespace
|
||||||
|
3. **Bare Pod Cleaner**: Removes pods without controllers (Deployments/StatefulSets)
|
||||||
|
4. **Deployment Verifier**: Ensures pods have proper controllers
|
||||||
|
5. **Auth Sidecar Injector**: Injects authentication proxy based on annotations
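As an illustration of what one of these policies can look like, here is a minimal sketch of a default-namespace blocker. It is modeled on the commonly published Kyverno sample policy rather than taken from `cluster-resources/policies/default-ns-blocker.yaml`, so the policy name, rule name, and matched kinds are assumptions. Newer Kyverno releases also prefer a per-rule `failureAction` over the `validationFailureAction` field shown here.

```yaml
# Hypothetical policy; the repository's own default-ns-blocker.yaml may differ.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-default-namespace       # assumed name
spec:
  validationFailureAction: Enforce       # reject instead of only auditing
  background: true
  rules:
    - name: validate-namespace           # assumed rule name
      match:
        any:
          - resources:
              kinds:
                - Pod                    # assumed scope; real policy may cover more kinds
      validate:
        message: "Using the 'default' namespace is not allowed."
        pattern:
          metadata:
            namespace: "!default"        # any namespace except default
```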
|
||||||
|
|
||||||
|
### Repository Access
|
||||||
|
|
||||||
|
**Private Repository Credentials** stored as SealedSecrets:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# cluster-resources/snothub-repo-credentials-sealed.yaml
|
||||||
|
# cluster-resources/forte10x-repo-credentials-sealed.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
ArgoCD uses these to access private Helm values repositories.
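Before sealing, a repository credential of this kind is typically a plain Secret carrying ArgoCD's repository label. The sketch below assumes SSH key authentication and a hypothetical Secret name; the actual sealed files in `cluster-resources/` may use a different type or authentication method.

```yaml
# Illustrative unsealed repository Secret; keep files like this in private/ only.
apiVersion: v1
kind: Secret
metadata:
  name: helm-values-repo                 # assumed name
  namespace: argocd                      # assumed ArgoCD namespace
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: git@github.com:fortedigital/helm-values.git
  sshPrivateKey: |
    <private key placeholder>
```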
|
||||||
|
|
||||||
|
### Network Security
|
||||||
|
|
||||||
|
**Traefik Ingress** with TLS:
|
||||||
|
- All HTTP traffic redirects to HTTPS
|
||||||
|
- Let's Encrypt automatic certificate renewal
|
||||||
|
- Cert-Manager manages certificate lifecycle
|
||||||
|
- Per-application IngressRoutes with dedicated certificates
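A per-application certificate could be requested with a cert-manager `Certificate` along these lines. This is a hedged sketch: the resource name, namespace, Secret name, and the `letsencrypt` ClusterIssuer name are assumptions, and the `forteapp` chart's own `certificate.yaml` template is the authoritative version.

```yaml
# Illustrative Certificate; issuer and Secret names are assumed.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mcp10x-tls                       # assumed name
  namespace: mcp10x                      # assumed namespace
spec:
  secretName: mcp10x-tls                 # TLS Secret referenced by the IngressRoute
  dnsNames:
    - mcp10x.forteapps.net
  issuerRef:
    name: letsencrypt                    # assumed ClusterIssuer name
    kind: ClusterIssuer
```

cert-manager renews the certificate before expiry and updates the same Secret, so the IngressRoute does not need to change on renewal.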
|
||||||
|
|
||||||
|
### Authentication
|
||||||
|
|
||||||
|
**Application-Level Auth** (optional):
|
||||||
|
- Token-based authentication (static tokens)
|
||||||
|
- OIDC integration (Keycloak, Okta, etc.)
|
||||||
|
- Auth sidecar injected via Kyverno policy
|
||||||
|
- Tokens stored in SealedSecrets
|
||||||
|
|
||||||
|
Example:
|
||||||
|
```yaml
# In deployment.yaml template
annotations:
  policies.forteapps.io/auth: "true"
  policies.forteapps.io/auth-token-secret-name: "app-tokens"
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Monitoring & Observability
|
||||||
|
|
||||||
|
### Stack Components
|
||||||
|
|
||||||
|
1. **Prometheus**: Metrics collection and storage
|
||||||
|
2. **Grafana**: Metrics visualization and dashboards
|
||||||
|
3. **Loki**: Log aggregation
|
||||||
|
4. **Fluent-Bit**: Log shipping from pods to Loki
|
||||||
|
5. **Trivy**: Container vulnerability scanning
|
||||||
|
|
||||||
|
### Slack Notifications
|
||||||
|
|
||||||
|
All ArgoCD applications send notifications to a shared Slack channel:
|
||||||
|
|
||||||
|
```yaml
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: ""
    notifications.argoproj.io/subscribe.on-sync-failed.slack: ""
    notifications.argoproj.io/subscribe.on-degraded.slack: ""
```
|
||||||
|
|
||||||
|
Notifications include:
|
||||||
|
- ✅ Sync succeeded
|
||||||
|
- ❌ Sync failed
|
||||||
|
- ⚠️ Application degraded
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Disaster Recovery
|
||||||
|
|
||||||
|
### Cluster Rebuild
|
||||||
|
|
||||||
|
**Current State**: No backup routines exist yet. Cluster can be rebuilt from Git.
|
||||||
|
|
||||||
|
**Rebuild Process**:
|
||||||
|
1. Provision new Kubernetes cluster
|
||||||
|
2. Clone `sturdy-adventure` repository
|
||||||
|
3. Run `./bootstrap.sh`
|
||||||
|
4. ArgoCD installs and syncs all applications
|
||||||
|
5. Manually recreate unsealed secrets and seal them
|
||||||
|
|
||||||
|
**Data Loss**:
|
||||||
|
- Currently: Data loss is acceptable (internal use)
|
||||||
|
- Future: One stateful application may require backup strategy
|
||||||
|
|
||||||
|
### GitOps Advantages for DR
|
||||||
|
|
||||||
|
✅ **Infrastructure as Code**: Entire cluster defined in Git
|
||||||
|
✅ **Reproducible**: Cluster can be rebuilt identically
|
||||||
|
✅ **Auditable**: All changes tracked in Git history
|
||||||
|
✅ **Rollback**: Easy to revert to previous Git commit
|
||||||
|
✅ **Multi-Cluster**: Same config can deploy to multiple clusters
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
### Repository Organization
|
||||||
|
|
||||||
|
✅ **DO**:
|
||||||
|
- Separate infrastructure (`infra/`) from applications (`apps/`)
|
||||||
|
- Use sync waves to control deployment order
|
||||||
|
- Keep secrets in `private/` folder (Git-ignored)
|
||||||
|
- Commit only sealed secrets to Git
|
||||||
|
- Use multi-source pattern for chart/values separation
|
||||||
|
|
||||||
|
❌ **DON'T**:
|
||||||
|
- Commit plain secrets to Git
|
||||||
|
- Mix infrastructure and application configs
|
||||||
|
- Hard-code environment-specific values in charts
|
||||||
|
- Manually modify resources in cluster (use Git)
|
||||||
|
|
||||||
|
### GitOps Workflow
|
||||||
|
|
||||||
|
✅ **DO**:
|
||||||
|
- Make all changes through Git (single source of truth)
|
||||||
|
- Use PR reviews for production changes
|
||||||
|
- Test changes in isolated namespaces first
|
||||||
|
- Monitor ArgoCD sync status
|
||||||
|
- Respond to Slack notifications
|
||||||
|
|
||||||
|
❌ **DON'T**:
|
||||||
|
- Use `kubectl apply` directly (breaks GitOps)
|
||||||
|
- Ignore sync failures
|
||||||
|
- Bypass ArgoCD for "quick fixes"
|
||||||
|
- Edit resources in place (`kubectl edit`)
|
||||||
|
|
||||||
|
### Application Development
|
||||||
|
|
||||||
|
✅ **DO**:
|
||||||
|
- Follow the `forteapp` chart pattern
|
||||||
|
- Use semantic versioning for image tags
|
||||||
|
- Update helm-values via CI/CD
|
||||||
|
- Test locally with Docker Compose
|
||||||
|
- Document environment variables
|
||||||
|
|
||||||
|
❌ **DON'T**:
|
||||||
|
- Use `latest` image tag
|
||||||
|
- Hard-code configuration in code
|
||||||
|
- Skip local testing
|
||||||
|
- Deploy untested images to production
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
📖 Continue to:
|
||||||
|
- **[Developer Guide](DEVELOPER-GUIDE.md)** - Learn how to deploy and manage applications
|
||||||
|
- **[Operations Runbook](OPERATIONS-RUNBOOK.md)** - Common operational tasks
|
||||||
|
- **[Technical Reference](REFERENCE.md)** - Detailed component documentation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Last Updated**: 2026-03-16
|
||||||
|
**Maintained By**: Platform Team
|
||||||
|
**Questions?**: Contact #platform-support on Slack
|
||||||
1217
docs/OPERATIONS-RUNBOOK.md
Normal file
File diff suppressed because it is too large
327
docs/README.md
Normal file
@@ -0,0 +1,327 @@
|
|||||||
|
# Kubernetes Cluster Documentation
|
||||||
|
|
||||||
|
Welcome to the comprehensive documentation for our Kubernetes cluster GitOps setup. This documentation covers architecture, development workflows, operations, and technical references.
|
||||||
|
|
||||||
|
## 📚 Documentation Index
|
||||||
|
|
||||||
|
### 1. [GitOps Architecture & Repository Guide](GITOPS-ARCHITECTURE.md)
|
||||||
|
**Start here to understand the system**
|
||||||
|
|
||||||
|
Learn about:
|
||||||
|
- Overall architecture and design decisions
|
||||||
|
- Repository structure and relationships
|
||||||
|
- GitOps workflow and deployment patterns
|
||||||
|
- CI/CD pipeline integration
|
||||||
|
- Security model and best practices
|
||||||
|
|
||||||
|
**Best for**: Understanding how everything fits together, architectural decisions, and the big picture.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. [Developer Onboarding Guide](DEVELOPER-GUIDE.md)
|
||||||
|
**For developers deploying and maintaining applications**
|
||||||
|
|
||||||
|
Learn how to:
|
||||||
|
- Set up your local development environment
|
||||||
|
- Deploy your first application
|
||||||
|
- Update existing applications
|
||||||
|
- Manage secrets securely
|
||||||
|
- Troubleshoot common issues
|
||||||
|
- Follow development best practices
|
||||||
|
|
||||||
|
**Best for**: New developers joining the team, deploying applications, day-to-day development workflows.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. [Operations Runbook](OPERATIONS-RUNBOOK.md)
|
||||||
|
**For platform engineers and operators**
|
||||||
|
|
||||||
|
Learn how to:
|
||||||
|
- Bootstrap a new cluster
|
||||||
|
- Monitor and maintain applications
|
||||||
|
- Manage infrastructure components
|
||||||
|
- Handle secrets and credentials
|
||||||
|
- Troubleshoot production issues
|
||||||
|
- Perform disaster recovery
|
||||||
|
- Execute maintenance procedures
|
||||||
|
|
||||||
|
**Best for**: Platform team members, SRE tasks, incident response, cluster maintenance.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. [Technical Reference](REFERENCE.md)
|
||||||
|
**Detailed technical specifications**
|
||||||
|
|
||||||
|
Reference for:
|
||||||
|
- Component specifications and versions
|
||||||
|
- Helm chart templates and values
|
||||||
|
- ArgoCD configuration options
|
||||||
|
- Kyverno policy definitions
|
||||||
|
- API endpoints and interfaces
|
||||||
|
- Configuration schemas
|
||||||
|
- Complete glossary
|
||||||
|
|
||||||
|
**Best for**: Looking up specific configuration options, understanding component details, API references.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚀 Quick Start
|
||||||
|
|
||||||
|
### For New Developers
|
||||||
|
1. Read [GitOps Architecture](GITOPS-ARCHITECTURE.md#overview) to understand the system
|
||||||
|
2. Follow [Developer Guide - Prerequisites](DEVELOPER-GUIDE.md#prerequisites) to set up your environment
|
||||||
|
3. Deploy your first application using [Deploying Your First Application](DEVELOPER-GUIDE.md#deploying-your-first-application)
|
||||||
|
|
||||||
|
### For Platform Engineers
|
||||||
|
1. Understand the architecture in [GitOps Architecture](GITOPS-ARCHITECTURE.md)
|
||||||
|
2. Learn cluster bootstrap in [Operations Runbook - Cluster Bootstrap](OPERATIONS-RUNBOOK.md#cluster-bootstrap)
|
||||||
|
3. Review [Day-to-Day Operations](OPERATIONS-RUNBOOK.md#day-to-day-operations) procedures
|
||||||
|
|
||||||
|
### For Troubleshooting
|
||||||
|
1. Check [Developer Guide - Troubleshooting](DEVELOPER-GUIDE.md#troubleshooting) for common developer issues
|
||||||
|
2. Check [Operations Runbook - Troubleshooting](OPERATIONS-RUNBOOK.md#troubleshooting) for operational issues
|
||||||
|
3. Consult [Technical Reference](REFERENCE.md) for configuration details
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🗺️ Documentation Map
|
||||||
|
|
||||||
|
```
┌─────────────────────────────────────────────────────────────┐
│                     GITOPS ARCHITECTURE                     │
│    (System Overview, Repositories, Workflows, Security)     │
└─────────────────────────────────────────────────────────────┘
                               │
                   ┌───────────┴───────────┐
                   │                       │
                   ▼                       ▼
          ┌──────────────────┐  ┌──────────────────────┐
          │ DEVELOPER GUIDE  │  │  OPERATIONS RUNBOOK  │
          │  (Development)   │  │     (Operations)     │
          └──────────────────┘  └──────────────────────┘
                   │                       │
                   └───────────┬───────────┘
                               ▼
                    ┌────────────────────┐
                    │ TECHNICAL REFERENCE│
                    │  (Specifications)  │
                    └────────────────────┘
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📖 Reading Paths
|
||||||
|
|
||||||
|
### Path 1: New Developer (No K8s Experience)
|
||||||
|
1. [GitOps Architecture - Overview](GITOPS-ARCHITECTURE.md#overview)
|
||||||
|
2. [GitOps Architecture - GitOps Workflow](GITOPS-ARCHITECTURE.md#gitops-workflow)
|
||||||
|
3. [Developer Guide - Understanding the Workflow](DEVELOPER-GUIDE.md#understanding-the-workflow)
|
||||||
|
4. [Developer Guide - Deploying Your First Application](DEVELOPER-GUIDE.md#deploying-your-first-application)
|
||||||
|
5. [Developer Guide - Troubleshooting](DEVELOPER-GUIDE.md#troubleshooting)
|
||||||
|
|
||||||
|
### Path 2: Experienced Developer (Has K8s Experience)
|
||||||
|
1. [GitOps Architecture - Repository Structure](GITOPS-ARCHITECTURE.md#repository-structure)
|
||||||
|
2. [Developer Guide - Local Development Setup](DEVELOPER-GUIDE.md#local-development-setup)
|
||||||
|
3. [Developer Guide - Deploying Your First Application](DEVELOPER-GUIDE.md#deploying-your-first-application)
|
||||||
|
4. [Technical Reference - Helm Chart Reference](REFERENCE.md#helm-chart-reference)
|
||||||
|
|
||||||
|
### Path 3: Platform Engineer / SRE
|
||||||
|
1. [GitOps Architecture](GITOPS-ARCHITECTURE.md) (entire document)
|
||||||
|
2. [Operations Runbook - Cluster Bootstrap](OPERATIONS-RUNBOOK.md#cluster-bootstrap)
|
||||||
|
3. [Operations Runbook - Day-to-Day Operations](OPERATIONS-RUNBOOK.md#day-to-day-operations)
|
||||||
|
4. [Operations Runbook - Troubleshooting](OPERATIONS-RUNBOOK.md#troubleshooting)
|
||||||
|
5. [Technical Reference](REFERENCE.md) (as needed)
|
||||||
|
|
||||||
|
### Path 4: Quick Reference
|
||||||
|
1. [Developer Guide - Quick Reference](DEVELOPER-GUIDE.md#quick-reference)
|
||||||
|
2. [Technical Reference - Configuration Reference](REFERENCE.md#configuration-reference)
|
||||||
|
3. [Technical Reference - Glossary](REFERENCE.md#glossary)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔍 Finding Information
|
||||||
|
|
||||||
|
### How do I...?
|
||||||
|
|
||||||
|
| Task | Documentation |
|
||||||
|
|------|---------------|
|
||||||
|
| **Deploy a new application** | [Developer Guide - Deploying Your First Application](DEVELOPER-GUIDE.md#deploying-your-first-application) |
|
||||||
|
| **Update an existing application** | [Developer Guide - Updating an Existing Application](DEVELOPER-GUIDE.md#updating-an-existing-application) |
|
||||||
|
| **Create and seal secrets** | [Developer Guide - Working with Secrets](DEVELOPER-GUIDE.md#working-with-secrets) |
|
||||||
|
| **Troubleshoot deployment issues** | [Developer Guide - Troubleshooting](DEVELOPER-GUIDE.md#troubleshooting) |
|
||||||
|
| **Bootstrap a new cluster** | [Operations Runbook - Cluster Bootstrap](OPERATIONS-RUNBOOK.md#cluster-bootstrap) |
|
||||||
|
| **Scale an application** | [Operations Runbook - Scaling Applications](OPERATIONS-RUNBOOK.md#scaling-applications) |
|
||||||
|
| **Roll back a deployment** | [Operations Runbook - Rolling Back Deployments](OPERATIONS-RUNBOOK.md#rolling-back-deployments) |
|
||||||
|
| **Manage monitoring** | [Operations Runbook - Monitoring & Alerting](OPERATIONS-RUNBOOK.md#monitoring--alerting) |
|
||||||
|
| **Understand ArgoCD config** | [Technical Reference - ArgoCD Configuration](REFERENCE.md#argocd-configuration) |
|
||||||
|
| **Look up Helm values** | [Technical Reference - Helm Chart Reference](REFERENCE.md#helm-chart-reference) |
|
||||||
|
| **Find component versions** | [Technical Reference - Version Matrix](REFERENCE.md#version-matrix) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📊 System Overview
|
||||||
|
|
||||||
|
### Cluster Architecture
|
||||||
|
|
||||||
|
```
┌──────────────────────────────────────────────────────────────┐
│                     GitHub Repositories                      │
│  ┌────────────┐  ┌────────────┐  ┌────────────────────────┐  │
│  │   Config   │  │   Charts   │  │         Values         │  │
│  │  (ArgoCD)  │  │ (Templates)│  │  (Environment Config)  │  │
│  └────────────┘  └────────────┘  └────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    ArgoCD (GitOps Engine)                    │
│                    Sync every 60 seconds                     │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                 Kubernetes Cluster (UpCloud)                 │
│   ┌──────────────────────────────────────────────────────┐   │
│   │ Infrastructure: Traefik, Cert-Manager, Kyverno       │   │
│   ├──────────────────────────────────────────────────────┤   │
│   │ Monitoring: Prometheus, Grafana, Loki, Fluent-Bit    │   │
│   ├──────────────────────────────────────────────────────┤   │
│   │ Applications: mcp10x, musicman, dot-ai-stack         │   │
│   └──────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
```
|
||||||
|
|
||||||
|
### Key Technologies
|
||||||
|
|
||||||
|
- **GitOps**: ArgoCD
|
||||||
|
- **Kubernetes**: UpCloud Managed Kubernetes
|
||||||
|
- **Ingress**: Traefik v2
|
||||||
|
- **Certificates**: Cert-Manager + Let's Encrypt
|
||||||
|
- **Policies**: Kyverno
|
||||||
|
- **Secrets**: Sealed Secrets
|
||||||
|
- **Monitoring**: Prometheus + Grafana
|
||||||
|
- **Logging**: Loki + Fluent-Bit
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🛠️ Common Tasks
|
||||||
|
|
||||||
|
### Development Tasks
|
||||||
|
|
||||||
|
```bash
# Deploy new application
cd ~/dev/k8s/launchpad
# Create apps/myapp.yaml and helm-prod-values/myapp/values.yaml
git add apps/myapp.yaml
git commit -m "Add myapp"
git push

# Update application
cd ~/dev/k8s/helm-prod-values
vim myapp/values.yaml
git commit -am "Update myapp config"
git push

# Create secret
kubeseal --format=yaml --cert=pub-cert.pem \
  < private/secret.yaml > secrets/secret-sealed.yaml
git add secrets/secret-sealed.yaml
git commit -m "Add sealed secret"
git push
```
|
||||||
|
|
||||||
|
### Operations Tasks
|
||||||
|
|
||||||
|
```bash
# Check application status
kubectl get applications -n argocd

# View application details
kubectl describe application myapp -n argocd

# Force a hard refresh (re-reads Git; automated sync then applies any diff)
kubectl patch application myapp -n argocd \
  --type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'

# Check pod logs
kubectl logs -n myapp <pod-name>

# Restart deployment
kubectl rollout restart deployment myapp -n myapp
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🆘 Getting Help
|
||||||
|
|
||||||
|
### Documentation Search Order
|
||||||
|
|
||||||
|
1. **Quick Reference**: [Developer Guide - Quick Reference](DEVELOPER-GUIDE.md#quick-reference)
|
||||||
|
2. **Troubleshooting**: [Developer Guide - Troubleshooting](DEVELOPER-GUIDE.md#troubleshooting) or [Operations Runbook - Troubleshooting](OPERATIONS-RUNBOOK.md#troubleshooting)
|
||||||
|
3. **Technical Details**: [Technical Reference](REFERENCE.md)
|
||||||
|
4. **Architecture Context**: [GitOps Architecture](GITOPS-ARCHITECTURE.md)
|
||||||
|
|
||||||
|
### Support Channels
|
||||||
|
|
||||||
|
- **Slack**: #platform-support
|
||||||
|
- **Issues**: Platform team
|
||||||
|
- **Emergencies**: Escalate via Slack
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📝 Document Maintenance
|
||||||
|
|
||||||
|
### Updating Documentation
|
||||||
|
|
||||||
|
If you find:
|
||||||
|
- Outdated information
|
||||||
|
- Missing procedures
|
||||||
|
- Errors or typos
|
||||||
|
- Areas needing clarification
|
||||||
|
|
||||||
|
Please:
|
||||||
|
1. Create an issue or PR in the repository
|
||||||
|
2. Notify the platform team
|
||||||
|
3. Update the relevant documentation file
|
||||||
|
|
||||||
|
### Documentation Structure
|
||||||
|
|
||||||
|
```
docs/
├── README.md                   # This file (index)
├── GITOPS-ARCHITECTURE.md      # Architecture overview
├── DEVELOPER-GUIDE.md          # Developer workflows
├── OPERATIONS-RUNBOOK.md       # Operations procedures
└── REFERENCE.md                # Technical specifications
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔄 Documentation Versions
|
||||||
|
|
||||||
|
**Current Version**: 1.0.0
|
||||||
|
**Last Updated**: 2026-03-16
|
||||||
|
**Maintained By**: Platform Team
|
||||||
|
|
||||||
|
### Changelog
|
||||||
|
|
||||||
|
- **v1.0.0 (2026-03-16)**: Initial comprehensive documentation release
|
||||||
|
- GitOps Architecture guide
|
||||||
|
- Developer Onboarding guide
|
||||||
|
- Operations Runbook
|
||||||
|
- Technical Reference
|
||||||
|
- Documentation index
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 Next Steps
|
||||||
|
|
||||||
|
Choose your path:
|
||||||
|
|
||||||
|
- 👨💻 **New Developer?** Start with [Developer Guide](DEVELOPER-GUIDE.md)
|
||||||
|
- 🔧 **Platform Engineer?** Read [Operations Runbook](OPERATIONS-RUNBOOK.md)
|
||||||
|
- 🏗️ **Architect?** Explore [GitOps Architecture](GITOPS-ARCHITECTURE.md)
|
||||||
|
- 🔍 **Need Details?** Check [Technical Reference](REFERENCE.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Welcome to the team! 🚀**
|
||||||
1070
docs/REFERENCE.md
Normal file
File diff suppressed because it is too large