# Kubernetes Cluster - GitOps Configuration

> **Kubernetes cluster bootstrapping and GitOps configuration repository** using ArgoCD for UpCloud Managed Kubernetes

[ArgoCD](https://argoproj.github.io/cd/) · [UpCloud](https://upcloud.com/) · [Documentation](https://git.forteapps.net/Forte/launchpad/pages/)

---
## 📚 Complete Documentation

**New developers and operators**: Please refer to our comprehensive documentation for detailed guides and references:

### 🌐 [**Live Documentation Site**](https://git.forteapps.net/Forte/launchpad/pages/) (Gitea Pages)

### 🎯 [**START HERE: Documentation Index**](docs/README.md)

| Document | Description | Audience |
|----------|-------------|----------|
| **[GitOps Architecture](docs/GITOPS-ARCHITECTURE.md)** | System architecture, repository structure, GitOps workflows, security model | Everyone (start here) |
| **[Developer Guide](docs/DEVELOPER-GUIDE.md)** | Local setup, deploying apps, managing secrets, troubleshooting | Developers |
| **[Operations Runbook](docs/OPERATIONS-RUNBOOK.md)** | Cluster bootstrap, day-to-day operations, incident response, maintenance | Platform Engineers, SREs |
| **[Technical Reference](docs/REFERENCE.md)** | Component specs, Helm charts, ArgoCD config, Kyverno policies, API docs | Everyone (reference) |

---
## 🚀 Quick Start

### For New Developers

```bash
# 1. Clone repositories
git clone https://git.forteapps.net/Forte/launchpad.git
git clone ssh://git@git.forteapps.net:2222/Forte/helm-prod-values.git

# 2. Read the guides
# - Start: docs/GITOPS-ARCHITECTURE.md
# - Follow: docs/DEVELOPER-GUIDE.md

# 3. Deploy your first app (see Developer Guide)
```

### For Operators

```bash
# 1. Bootstrap new cluster
./bootstrap.sh

# 2. Verify deployment
kubectl get applications -n argocd
kubectl get pods --all-namespaces

# 3. Read Operations Runbook for day-to-day tasks
```

---
## 📋 Overview

This repository contains the complete GitOps configuration for our Kubernetes clusters (`upc-dev` and `upc-prod`), using the **App-of-Apps pattern** with ArgoCD.

### What's Inside

- **Infrastructure Applications**: Traefik, Cert-Manager, Kyverno, Prometheus, Grafana, Loki, Tempo, Sealed Secrets
- **Business Applications**: MCP10X, MusicMan, Dot-AI Stack, ArgoCD MCP
- **Policies**: Kyverno security policies for secret management, namespace controls, pod verification
- **Monitoring**: Full observability stack with metrics, logs, traces, and alerting
- **Secrets**: Sealed Secrets for secure Git storage

### Key Features

- ✅ **GitOps-Native**: Git is the single source of truth
- ✅ **Auto-Sync**: Changes are deployed automatically (60s reconciliation)
- ✅ **Self-Healing**: Manual cluster changes are reverted
- ✅ **Multi-Source**: Chart templates are kept separate from configuration
- ✅ **Policy Enforcement**: Kyverno ensures security and compliance
- ✅ **Authentication**: Automatic sidecar injection (token & OIDC support)
- ✅ **TLS Everywhere**: Automatic Let's Encrypt certificates
- ✅ **Full Observability**: Prometheus, Grafana, Loki, Tempo integration

---
## 🗂️ Repository Structure

```
.
├── bootstrap.sh                # Cluster initialization script
├── _app-of-apps.yaml           # Root ArgoCD Application (App-of-Apps pattern)
├── mkdocs.yml                  # MkDocs configuration (Gitea Pages)
│
├── .gitea/workflows/           # Gitea Actions CI workflows
│   └── docs.yaml               # Build & deploy MkDocs to Gitea Pages
│
├── infra/                      # Infrastructure ArgoCD Applications (Kustomize multi-cluster)
│   ├── base/                   # Base ArgoCD Application manifests (EU defaults)
│   │   ├── kustomization.yaml
│   │   ├── traefik-application.yaml
│   │   ├── keycloak.yaml
│   │   ├── grafana.yaml
│   │   ├── gitea.yaml
│   │   ├── gitea-actions.yaml
│   │   ├── tempo.yaml
│   │   ├── renovate.yaml
│   │   ├── ...                 # All other Application manifests
│   │   └── secrets.yaml
│   ├── overlays/               # Per-cluster overrides
│   │   ├── upc-dev/            # UpCloud Dev cluster (uses base as-is)
│   │   └── upc-prod/           # UpCloud Prod cluster (patches value paths)
│   ├── dashboards/             # Grafana dashboard ConfigMaps
│   └── values/                 # Helm value overrides
│       ├── base/               # Shared values (all clusters)
│       ├── upc-dev/            # UpCloud Dev-specific values
│       └── upc-prod/           # UpCloud Prod-specific values
│
├── apps/                       # Business Applications
│   ├── mcp10x.yaml
│   ├── musicman.yaml
│   ├── dot-ai-stack.yaml
│   └── argo-mcp.yaml
│
├── cluster-resources/          # Cluster-wide Kubernetes resources
│   ├── letsencrypt-issuer.yaml
│   ├── kyverno-config.yaml
│   ├── *-sealed.yaml           # Sealed secrets
│   └── policies/               # Kyverno policies
│       ├── secret-cloner.yaml
│       ├── default-ns-blocker.yaml
│       ├── bare-pod-cleaner.yaml
│       └── auth-sidecar-injector.yaml
│
├── secrets/                    # Application secrets (sealed)
│   └── *-credentials-sealed.yaml
│
├── private/                    # Local-only files (Git-ignored)
│   └── *.yaml                  # Unsealed secrets (never committed)
│
└── docs/                       # 📚 Comprehensive documentation
    ├── README.md               # Documentation index
    ├── GITOPS-ARCHITECTURE.md  # Architecture guide
    ├── DEVELOPER-GUIDE.md      # Developer onboarding
    ├── OPERATIONS-RUNBOOK.md   # Operations procedures
    └── REFERENCE.md            # Technical reference
```

**See [GitOps Architecture - Repository Structure](docs/GITOPS-ARCHITECTURE.md#repository-structure) for a detailed explanation.**
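
How an overlay "patches value paths" can be sketched with a hypothetical `infra/overlays/upc-prod/kustomization.yaml`. This assumes the base Applications are multi-source Applications whose Helm `valueFiles` reference `infra/values/upc-dev/...` through a `$values` ref; the Application name `grafana`, the source and value-file indexes, and the file names are illustrative assumptions, not taken from the real manifests.

```yaml
# infra/overlays/upc-prod/kustomization.yaml (hypothetical sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Start from the shared base Application manifests
resources:
  - ../../base

# Swap a dev value-file path for its prod counterpart on one Application.
# The JSON Patch path (sources[0], valueFiles[1]) is an assumption about
# how the base manifest is laid out.
patches:
  - target:
      group: argoproj.io
      version: v1alpha1
      kind: Application
      name: grafana
    patch: |-
      - op: replace
        path: /spec/sources/0/helm/valueFiles/1
        value: $values/infra/values/upc-prod/grafana.yaml
```

Running `kubectl kustomize infra/overlays/upc-prod` renders the overlay locally, which is a cheap way to check the patched paths before pushing.
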

---

## 🏗️ Architecture

### Three-Repository Pattern

| Repository | Purpose | Who Edits | How Often |
|------------|---------|-----------|-----------|
| **[launchpad](https://git.forteapps.net/Forte/launchpad)** (this repo) | ArgoCD Applications, cluster resources | Platform / DevOps engineers | ✅ Often |
| **[forte-helm](https://git.forteapps.net/Forte/forte-helm)** | Generic Helm chart templates | Platform engineers | ❌ Rarely |
| **[helm-values](ssh://git@git.forteapps.net:2222/Forte/helm-prod-values.git)** | App-specific configuration & versions | Developers / CI pipelines | ✅ Sometimes |

### GitOps Workflow

```
Developer commits code → CI/CD builds image → Updates helm-values → ArgoCD syncs → Deployed to cluster
```

**Learn more**: [GitOps Architecture - GitOps Workflow](docs/GITOPS-ARCHITECTURE.md#gitops-workflow)

---
## 🔧 Common Tasks

### Deploy a New Application

**See detailed guide**: [Developer Guide - Deploying Your First Application](docs/DEVELOPER-GUIDE.md#deploying-your-first-application)

**Quick version**:

1. Create `apps/myapp.yaml` (ArgoCD Application manifest)
2. Create `helm-values/myapp/values.yaml` (configuration)
3. Create sealed secrets if needed
4. Commit and push - ArgoCD auto-syncs!
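
For step 1, a minimal `apps/myapp.yaml` might look like the sketch below. It uses ArgoCD multi-source Applications (`spec.sources` plus a `ref` source, available since ArgoCD 2.6) to combine a chart from `forte-helm` with values from `helm-values`. The chart path `charts/app`, the `main` branch, the ref name `values`, the `default` project, and the namespace are assumptions; an existing manifest in `apps/` is the authoritative template.

```yaml
# apps/myapp.yaml (hypothetical sketch)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  sources:
    # Source 1: generic chart templates (chart path is an assumption)
    - repoURL: https://git.forteapps.net/Forte/forte-helm.git
      targetRevision: main
      path: charts/app
      helm:
        valueFiles:
          # "$values" resolves to the root of the source with ref: values
          - $values/myapp/values.yaml
    # Source 2: app-specific configuration, referenced but not rendered
    - repoURL: ssh://git@git.forteapps.net:2222/Forte/helm-prod-values.git
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Private repositories (such as the SSH-only `helm-prod-values`) must be registered with ArgoCD credentials, otherwise the Application reports a repository error instead of syncing.
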

### Update an Existing Application

**See detailed guide**: [Developer Guide - Updating an Existing Application](docs/DEVELOPER-GUIDE.md#updating-an-existing-application)

**Quick version**:

- **Update code**: Push to app repo → CI/CD updates image tag in helm-values
- **Update config**: Edit `helm-values/myapp/values.yaml` → commit → push

### Manage Secrets

**See detailed guide**: [Developer Guide - Working with Secrets](docs/DEVELOPER-GUIDE.md#working-with-secrets)

```bash
# (Once) fetch the controller's public certificate; add --controller-name /
# --controller-namespace if the controller is not at kubeseal's defaults
kubeseal --fetch-cert > pub-cert.pem

# Render a plain Secret locally (not applied). Set the target namespace:
# SealedSecrets are bound to their namespace by default.
kubectl create secret generic myapp-creds \
  --namespace=myapp \
  --from-literal=KEY=value \
  --dry-run=client -o yaml > private/myapp-creds.yaml

# Seal it
kubeseal --format=yaml --cert=pub-cert.pem \
  < private/myapp-creds.yaml > secrets/myapp-creds-sealed.yaml

# Commit the sealed version only
git add secrets/myapp-creds-sealed.yaml
git commit -m "Add myapp credentials"
git push
```

### Enable Authentication

**See detailed guide**: [Developer Guide - Enabling Authentication](docs/DEVELOPER-GUIDE.md#enabling-authentication-for-applications)

**Quick version**:

```yaml
# In helm-values/myapp/values.yaml
# Use ONE of the two variants: a values file can hold only one `auth:` key.

# Variant 1: token-based auth (simple)
auth:
  enabled: true
  type: token
  tokens:
    - your-secret-token-here

# Variant 2: OIDC auth (SSO); use this block instead of Variant 1
# auth:
#   enabled: true
#   type: oidc
#   oidc:
#     authority: https://auth.example.com/realms/master
#     clientId: myapp
```

If you use OIDC, also create the OIDC secret:

```bash
# --dry-run=client -o yaml renders the Secret locally instead of creating it
# in the cluster, so only the sealed version is ever applied
kubectl create secret generic auth-oidc \
  --from-literal=client-secret=your-oidc-secret \
  --from-literal=cookie-secret=$(openssl rand -hex 32) \
  --namespace=myapp \
  --dry-run=client -o yaml | \
  kubeseal --format=yaml --cert=pub-cert.pem --namespace=myapp | \
  kubectl apply -f -
```

### Bootstrap Cluster

**See detailed guide**: [Operations Runbook - Cluster Bootstrap](docs/OPERATIONS-RUNBOOK.md#cluster-bootstrap)

```bash
# Initialize new cluster
./bootstrap.sh

# Verify
kubectl get applications -n argocd
kubectl get pods --all-namespaces
```

---
## 🛠️ Quick Reference

### Monitor Applications

```bash
# List all ArgoCD applications
kubectl get applications -n argocd

# Watch sync status
kubectl get applications -n argocd -w

# Check a specific application
kubectl describe application myapp -n argocd

# View application logs
kubectl logs -n myapp <pod-name>
```

### Access UIs

```bash
# ArgoCD UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Access: https://localhost:8080 (no auth required)

# Grafana
kubectl port-forward -n monitoring svc/grafana 3000:80
# Access: http://localhost:3000

# Prometheus
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
# Access: http://localhost:9090
```

### Troubleshooting

```bash
# Check pod status
kubectl get pods -n myapp

# View pod logs
kubectl logs -n myapp <pod-name>

# Check pod events
kubectl describe pod -n myapp <pod-name>

# Check ArgoCD sync errors
kubectl describe application myapp -n argocd

# Force a hard refresh (re-read Git and bypass the manifest cache);
# with auto-sync enabled, ArgoCD then syncs any detected drift
kubectl patch application myapp -n argocd \
  --type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
```

**Full troubleshooting guide**: [Developer Guide - Troubleshooting](docs/DEVELOPER-GUIDE.md#troubleshooting)

---
## 🔐 Security

### Secret Management

- ✅ Sealed Secrets for Git storage
- ✅ Kyverno auto-clones secrets to namespaces
- ❌ Never commit plain secrets
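
Cloning a Secret into namespaces is typically done with a Kyverno `generate` rule using `clone`. A minimal sketch in the spirit of `secret-cloner.yaml`, assuming a source Secret named `regcred` in `kube-system` (both names, and the policy name, are hypothetical; the real policy may differ):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: secret-cloner-example      # hypothetical name
spec:
  rules:
    - name: clone-regcred
      # Fire whenever a Namespace is created
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: v1
        kind: Secret
        name: regcred
        # Target the namespace that triggered the rule
        namespace: "{{request.object.metadata.name}}"
        # Keep each copy in sync with the source Secret
        synchronize: true
        clone:
          namespace: kube-system
          name: regcred
```

With `synchronize: true`, updating the source Secret (for example after re-sealing it) propagates to the generated copies.
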

### Network Security

- ✅ All traffic TLS-encrypted (Let's Encrypt)
- ✅ HTTP → HTTPS redirect
- ✅ Traefik IngressRoute per application
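
A per-application route with a Let's Encrypt certificate could look like this sketch. It assumes the `traefik.io/v1alpha1` CRDs (Traefik v2.10+ and v3; older releases use `traefik.containo.us/v1alpha1`), a `websecure` entry point, a Service `myapp` on port 80, and a ClusterIssuer named `letsencrypt`. The hostname and all resource names are illustrative.

```yaml
# cert-manager obtains the certificate and stores it in Secret "myapp-tls"
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
  namespace: myapp
spec:
  secretName: myapp-tls
  dnsNames:
    - myapp.forteapps.net
  issuerRef:
    name: letsencrypt        # assumed ClusterIssuer name
    kind: ClusterIssuer
---
# Traefik terminates HTTPS for the host using that Secret
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: myapp
  namespace: myapp
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`myapp.forteapps.net`)
      kind: Rule
      services:
        - name: myapp
          port: 80
  tls:
    secretName: myapp-tls
```
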

### Policy Enforcement

- ✅ Kyverno policies for security
- ✅ Default namespace blocked
- ✅ Bare pods not allowed
- ✅ Optional authentication sidecar injection
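
Blocking the `default` namespace is commonly expressed as a Kyverno `validate` rule with a negated pattern. A minimal sketch in the spirit of `default-ns-blocker.yaml`; the policy name, the matched kinds, and the message are assumptions:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: default-ns-blocker-example   # hypothetical name
spec:
  # Reject (rather than only report) violating requests
  validationFailureAction: Enforce
  background: true
  rules:
    - name: block-default-namespace
      match:
        any:
          - resources:
              kinds:
                - Pod
                - Deployment
                - StatefulSet
                - Service
      validate:
        message: "Using the 'default' namespace is not allowed."
        pattern:
          metadata:
            # "!default" matches any namespace except "default"
            namespace: "!default"
```
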

**Learn more**: [GitOps Architecture - Security Model](docs/GITOPS-ARCHITECTURE.md#security-model)

---

## 📊 Infrastructure Components

| Component | Purpose | Namespace | Replicas |
|-----------|---------|-----------|----------|
| **ArgoCD** | GitOps controller | `argocd` | 1 |
| **Traefik** | Ingress controller | `traefik` | 2 |
| **Cert-Manager** | TLS certificates | `cert-manager` | 1 |
| **Kyverno** | Policy engine | `kyverno` | 1 |
| **Sealed Secrets** | Secret encryption | `kube-system` | 1 |
| **Prometheus** | Metrics | `monitoring` | 1 |
| **Grafana** | Dashboards | `monitoring` | 1 |
| **Loki** | Logs | `monitoring` | 1 |
| **Tempo** | Distributed tracing | `monitoring` | 1 |
| **Fluent-Bit** | Log shipping | `monitoring` | DaemonSet |
| **OpenCost** | Cost monitoring | `monitoring` | 1 |
| **Renovate** | Dependency updates | `renovate` | CronJob |
| **Trivy** | Vulnerability scanning | `trivy-system` | 1 |
| **Gitea Pages** | Documentation hosting | N/A (Gitea built-in) | N/A |

**Full specs**: [Technical Reference - Infrastructure Components](docs/REFERENCE.md#infrastructure-components)

---

## 🌐 Domains & Networking

- **Local development**: `*.127.0.0.1.nip.io`
- **Production**: `*.forteapps.net`
- **DNS**: Manual configuration (contact platform team)
- **TLS**: Automatic via Let's Encrypt
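
Automatic Let's Encrypt certificates rely on a cert-manager `ClusterIssuer` (this repository keeps one in `cluster-resources/letsencrypt-issuer.yaml`). Below is a sketch of an ACME HTTP-01 issuer solved through Traefik; the issuer name, e-mail address, and account-key Secret name are placeholders, and the real file may use a different solver.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt                  # placeholder name
spec:
  acme:
    # Let's Encrypt production endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com      # placeholder contact address
    # Secret where cert-manager stores the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      # Answer HTTP-01 challenges through an Ingress handled by Traefik
      - http01:
          ingress:
            class: traefik
```

While testing, pointing `server` at the staging endpoint `https://acme-staging-v02.api.letsencrypt.org/directory` avoids hitting Let's Encrypt production rate limits.
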

---

## 📖 Key Concepts

### App-of-Apps Pattern

`_app-of-apps.yaml` is the root Application that manages all other Applications in `infra/`. Kustomize overlays in `infra/overlays/{upc-dev,upc-prod}/` render the base Applications with per-cluster patches (e.g., swapping value file paths from `upc-dev` to `upc-prod`).
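
A root Application of this shape is sketched below: it points ArgoCD at one overlay directory and lets Kustomize render the child Applications. The Application name, the `main` branch, and the `default` project are assumptions; `_app-of-apps.yaml` in this repository is authoritative.

```yaml
# Hypothetical sketch of a root App-of-Apps Application for upc-dev
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.forteapps.net/Forte/launchpad.git
    targetRevision: main
    # ArgoCD detects kustomization.yaml here and renders the overlay
    path: infra/overlays/upc-dev
  destination:
    # Child Application resources are created in the argocd namespace
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

On `upc-prod` the same manifest would differ only in `path: infra/overlays/upc-prod`.
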

### Multi-Source Pattern

Applications reference both:

1. **Helm charts** from `forte-helm` (templates)
2. **Values** from `helm-values` (configuration)

This separates reusable templates from environment-specific config.

### Sync Waves

Applications deploy in order of their `argocd.argoproj.io/sync-wave` annotation (lowest wave first):

- Wave `-1`: Namespaces
- Wave `0`: Kyverno (policies)
- Wave `1`: Infrastructure
- Wave `2+`: Applications
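
A wave is set as a string-valued annotation on each resource; here it is shown on a child Application (the name is illustrative). One caveat worth checking: since ArgoCD 1.8, the `Application` kind has no built-in health assessment, so waves between child Applications only wait for earlier waves to become healthy if a custom health check for `argoproj.io/Application` is configured in `argocd-cm`.

```yaml
# Fragment of a child Application manifest (name is illustrative)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kyverno
  namespace: argocd
  annotations:
    # Lower waves sync first; the value must be a quoted string
    argocd.argoproj.io/sync-wave: "0"
```
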

### Auto-Sync & Self-Heal

- **Auto-Sync**: ArgoCD automatically deploys Git changes (60s polling)
- **Self-Heal**: Manual cluster changes are reverted to match Git
- **Prune**: Resources deleted from Git are removed from the cluster

**Learn more**: [GitOps Architecture - GitOps Workflow](docs/GITOPS-ARCHITECTURE.md#gitops-workflow)

---

## ⚙️ Configuration

### ArgoCD Settings

- **Reconciliation**: Every 60 seconds
- **Sync timeout**: 5 minutes per application
- **Retry policy**: 5 attempts with exponential backoff
- **Authentication**: Disabled (internal use only)
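
Several of these settings map to keys in ArgoCD's ConfigMaps. The sketch below uses documented keys: `timeout.reconciliation` in `argocd-cm` sets the Git polling interval, `server.disable.auth` in `argocd-cmd-params-cm` turns off UI/API login, and the optional Lua health check (from the ArgoCD upgrade notes) restores health reporting for child Applications so sync waves between them take effect. How this cluster actually sets them (for example via `bootstrap.sh` or Helm values) is not stated here, and the sync-timeout setting is not shown; the retry policy is configured per Application in `syncPolicy.retry`.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # How often ArgoCD re-checks Git for changes
  timeout.reconciliation: 60s
  # Optional: report child Application health so sync waves wait on it
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Disables login for the UI and API (internal use only)
  server.disable.auth: "true"
```
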

### Application Defaults

- **Auto-sync**: Enabled
- **Self-heal**: Enabled
- **Prune**: Enabled
- **Validation**: Server-side validation enabled
- **Server-side apply**: Enabled
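
In an Application manifest, these defaults and the retry policy map onto `spec.syncPolicy` roughly as follows. The backoff durations are assumptions (this README only states 5 attempts with exponential backoff), and validation is not shown.

```yaml
# Fragment of an Application spec; backoff values are illustrative
spec:
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
      - ServerSideApply=true
    retry:
      limit: 5         # retry a failed sync up to 5 times
      backoff:
        duration: 5s   # delay before the first retry
        factor: 2      # double the delay after each attempt
        maxDuration: 3m
```
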

**Full configuration**: [Technical Reference - ArgoCD Configuration](docs/REFERENCE.md#argocd-configuration)

---

## 🆘 Getting Help

### Documentation

1. **Start here**: [Documentation Index](docs/README.md)
2. **For development**: [Developer Guide](docs/DEVELOPER-GUIDE.md)
3. **For operations**: [Operations Runbook](docs/OPERATIONS-RUNBOOK.md)
4. **For reference**: [Technical Reference](docs/REFERENCE.md)

### Support

- **Slack**: #platform-support
- **Issues**: Contact platform team
- **Emergencies**: Escalate via Slack

### Common Questions

| Question | Answer |
|----------|--------|
| How do I deploy an app? | [Developer Guide - Deploying Your First Application](docs/DEVELOPER-GUIDE.md#deploying-your-first-application) |
| How do I manage secrets? | [Developer Guide - Working with Secrets](docs/DEVELOPER-GUIDE.md#working-with-secrets) |
| App won't sync? | [Developer Guide - Troubleshooting](docs/DEVELOPER-GUIDE.md#troubleshooting) |
| How do I bootstrap a cluster? | [Operations Runbook - Cluster Bootstrap](docs/OPERATIONS-RUNBOOK.md#cluster-bootstrap) |
| Where are the logs? | [Operations Runbook - Monitoring & Alerting](docs/OPERATIONS-RUNBOOK.md#monitoring--alerting) |

---
## 🤝 Contributing

### Adding a New Application

1. Read [Developer Guide - Deploying Your First Application](docs/DEVELOPER-GUIDE.md#deploying-your-first-application)
2. Create an ArgoCD Application manifest in `apps/`
3. Create Helm values in `helm-values/`
4. Create sealed secrets if needed
5. Commit and push - ArgoCD handles the rest!

### Modifying Infrastructure

1. Read the [Operations Runbook](docs/OPERATIONS-RUNBOOK.md)
2. Update the relevant files in `infra/` or `cluster-resources/`
3. Test changes in an isolated namespace if possible
4. Commit and push
5. Monitor sync status in Slack or the ArgoCD UI

### Updating Documentation

Documentation lives in `docs/`. To update:

1. Edit the relevant Markdown file
2. Update the "Last Updated" date
3. Submit a PR or push directly
4. Notify the team of significant changes

---
## 📝 Notes

### Current Environment

- **Provider**: UpCloud Managed Kubernetes
- **Environment**: Production (internal use only)
- **Clusters**: Multi-cluster (upc-dev, upc-prod) via Kustomize overlays
- **Auth**: Disabled for ArgoCD (internal access)
- **Backup**: None (cluster rebuildable via GitOps)

### Known Limitations

- No automated backups (yet)
- Secret rotation not automated
- Multi-cluster limited to upc-dev and upc-prod environments
- DNS management is manual

**Future improvements**: See [Operations Runbook - Disaster Recovery](docs/OPERATIONS-RUNBOOK.md#disaster-recovery)

---
## 📚 Additional Resources

### External Documentation

- [ArgoCD Documentation](https://argo-cd.readthedocs.io/)
- [Kyverno Documentation](https://kyverno.io/docs/)
- [Traefik Documentation](https://doc.traefik.io/traefik/)
- [Cert-Manager Documentation](https://cert-manager.io/docs/)
- [Grafana Tempo Documentation](https://grafana.com/docs/tempo/)
- [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets)

### Related Repositories

- [forte-helm](https://git.forteapps.net/Forte/forte-helm) - Helm chart templates
- [helm-values](ssh://git@git.forteapps.net:2222/Forte/helm-prod-values.git) - Application values

---
## 📄 License

Internal use only. Not for public distribution.

---

## 👥 Maintainers

**Platform Team**

- Contact: #platform-support on Slack
- Issues: Create an issue in this repository or contact the team directly

---

**Last Updated**: 2026-04-18

**Documentation Version**: 1.0.0

**🚀 Ready to get started? Check out the [Documentation Index](docs/README.md)!**