tofu+tools

2026-05-31 19:53:26 +02:00
parent e319295f62
commit 24c59256c9
3 changed files with 322 additions and 6 deletions


@@ -19,7 +19,9 @@
"dotnet-sdk@latest", "dotnet-sdk@latest",
"opentofu@1.11.6", "opentofu@1.11.6",
"_1password@latest", "_1password@latest",
"github-cli@latest" "github-cli@latest",
"upcloud-cli@3.29.0",
"awscli2@2.34.24"
], ],
"shell": { "shell": {
"init_hook": [ "init_hook": [


@@ -2,6 +2,12 @@
## Table of Contents ## Table of Contents
- [Overview](#overview) - [Overview](#overview)
- [Infrastructure Provisioning (OpenTofu)](#infrastructure-provisioning-opentofu)
- [Prerequisites](#provisioning-prerequisites)
- [Provisioning a Cluster](#provisioning-a-cluster)
- [Tearing Down a Cluster](#tearing-down-a-cluster)
- [Retrieving Kubeconfig](#retrieving-kubeconfig)
- [Platform Credentials](#platform-credentials)
- [Cluster Bootstrap](#cluster-bootstrap) - [Cluster Bootstrap](#cluster-bootstrap)
- [Initial Cluster Setup](#initial-cluster-setup) - [Initial Cluster Setup](#initial-cluster-setup)
- [ArgoCD Repository Access Setup](#argocd-repository-access-setup) - [ArgoCD Repository Access Setup](#argocd-repository-access-setup)
@@ -29,6 +35,120 @@ This runbook provides operational procedures for maintaining the Kubernetes clus
--- ---
## Infrastructure Provisioning (OpenTofu)
The `.tofu/` directory contains multi-cloud Kubernetes infrastructure-as-code using [OpenTofu](https://opentofu.org/). It provisions clusters on four cloud platforms (AKS, EKS, GKE, UpCloud), each with three environment tiers: **dev**, **prod**, and **workload**.
### Provisioning Prerequisites {#provisioning-prerequisites}
- **OpenTofu** (`tofu`) installed
- **kubectl** installed
- **helm** installed
- **yq** (optional — loads cluster config from `clusters/<cluster>.yaml`)
- Platform CLI tools:
- **AKS**: `az` (Azure CLI)
- **EKS**: `aws` (AWS CLI)
- **GKE**: `gcloud` (Google Cloud SDK)
- **UPC**: `upctl` (UpCloud CLI)
### Provisioning a Cluster
```bash
# Navigate to the scripts directory
cd .tofu/scripts
# 1. Copy and fill in credentials for your platform
cp ../configs/aks.env.example ../configs/aks.env
# Edit ../configs/aks.env with your credentials
# 2. Provision cluster (interactive — prompts before applying)
./setup-cluster.sh aks-dev
# 3. Dry-run only (plan without applying)
./setup-cluster.sh aks-dev --plan
# 4. Non-interactive (skip confirmations)
./setup-cluster.sh aks-dev --auto
```
**Cluster name format**: `<platform>-<env>` — e.g., `aks-dev`, `eks-prod`, `gke-workload`, `upc-dev`
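The name format above can be checked with a few lines of shell. This is a minimal sketch, not the code in `setup-cluster.sh`: the function name `parse_cluster_name` is hypothetical, and the accepted platforms and tiers are taken only from the lists in this runbook.

```bash
# Hypothetical helper (not from setup-cluster.sh): split "<platform>-<env>"
# and accept only the platforms and tiers this runbook documents.
parse_cluster_name() {
  local name="$1"
  local platform="${name%%-*}"   # text before the first hyphen
  local env="${name#*-}"         # text after the first hyphen

  case "$platform" in
    aks|eks|gke|upc) ;;
    *) echo "unknown platform in '$name'" >&2; return 1 ;;
  esac
  case "$env" in
    dev|prod|workload) ;;
    *) echo "unknown environment in '$name'" >&2; return 1 ;;
  esac

  echo "$platform $env"
}
```

For example, `parse_cluster_name gke-workload` prints `gke workload`, while `parse_cluster_name aks` or `parse_cluster_name foo-dev` returns a non-zero status.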
**What `setup-cluster.sh` does**:
1. Validates cluster name, extracts platform and environment
2. Checks prerequisites (tofu, kubectl, helm)
3. Loads credentials from `configs/<platform>.env`
4. Optionally loads cluster config from `clusters/<cluster>.yaml` (via yq)
5. Runs `tofu init` → `tofu plan` → prompts → `tofu apply`
6. Fetches and caches kubeconfig to `private/<cluster>/kubeconfig`
7. Waits for all nodes to reach Ready state (300s timeout)
8. Outputs next steps: `export KUBECONFIG` + `./bootstrap.sh`
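Step 7 is essentially a poll-with-timeout loop. The sketch below shows one way to write it, assuming hypothetical helper names `wait_until` and `all_nodes_ready`; the real script may do this differently (for instance with `kubectl wait --for=condition=Ready nodes --all --timeout=300s`).

```bash
# Hypothetical helper: retry a command until it succeeds or the timeout passes.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1                      # gave up: timeout reached
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Succeeds only if at least one node exists and every node's STATUS column
# is exactly "Ready" (so "Ready,SchedulingDisabled" counts as not ready).
all_nodes_ready() {
  kubectl get nodes --no-headers 2>/dev/null |
    awk '{ n++; if ($2 != "Ready") bad++ }
         END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}
```

With these in place, `wait_until 300 10 all_nodes_ready` would mirror the 300-second timeout described above; the 10-second interval is an arbitrary choice for illustration.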
### Tearing Down a Cluster
```bash
# Destroy cluster infrastructure
./teardown-cluster.sh aks-dev
# Equivalent to:
./setup-cluster.sh aks-dev --destroy
```
### Retrieving Kubeconfig
```bash
# Get kubeconfig for an existing cluster (uses cache or platform CLI)
./get-kubeconfig.sh aks-dev
# Cached kubeconfigs stored in: private/<cluster>/kubeconfig
```
Platform-specific retrieval fallbacks:
- **AKS**: `az aks get-credentials`
- **EKS**: `aws eks update-kubeconfig`
- **GKE**: `gcloud container clusters get-credentials`
- **UPC**: `upctl kubernetes config`
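The cache-then-fallback behaviour can be sketched as follows. The function names are hypothetical, and the CLI commands are printed without arguments because the flags each one needs (cluster name, region, resource group) are not shown in this section.

```bash
# Hypothetical helper: print the fallback CLI subcommand listed above.
# Arguments and flags are omitted on purpose; this section does not show them.
kubeconfig_fallback_cmd() {
  case "$1" in
    aks) echo "az aks get-credentials" ;;
    eks) echo "aws eks update-kubeconfig" ;;
    gke) echo "gcloud container clusters get-credentials" ;;
    upc) echo "upctl kubernetes config" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: report whether a non-empty cached kubeconfig exists,
# otherwise name the CLI that would have to be called.
# Usage: kubeconfig_source <cluster> [cache_root]   (cache_root defaults to "private")
kubeconfig_source() {
  local cluster="$1"
  local cache_root="${2:-private}"
  local cached="$cache_root/$cluster/kubeconfig"
  local cmd
  if [ -s "$cached" ]; then
    echo "cache: $cached"
  else
    cmd="$(kubeconfig_fallback_cmd "${cluster%%-*}")" || return 1
    echo "cli: $cmd"
  fi
}
```

Checking the cache first keeps repeated calls offline and avoids requiring the platform CLI to be logged in when a kubeconfig has already been fetched.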
### Platform Credentials
Each platform has a `configs/<platform>.env.example` template. Copy it to `configs/<platform>.env` and populate it:
| Platform | Required Variables | Optional |
|----------|--------------------|----------|
| **AKS** | `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID` | `ARM_RESOURCE_GROUP` (defaults to cluster name) |
| **EKS** | `AWS_PROFILE` (default: "default"), `AWS_REGION` (default: "eu-west-1") | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| **GKE** | `GCP_PROJECT_ID`, `GCP_REGION` (default: "europe-west4") | `GOOGLE_APPLICATION_CREDENTIALS` (SA JSON path) |
| **UPC** | `UPCLOUD_TOKEN` | `UPCLOUD_CLUSTER_ID` (set after creation) |
> **Note**: `.env` files are git-ignored. Never commit credentials.
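A quick pre-flight check can catch a missing credential before `tofu` runs. The sketch below is an assumption about how such a check might look, not the script's actual logic; `load_env_file`, `required_vars_for`, and `check_credentials` are hypothetical names, and only variables the table lists as required without a default are enforced.

```bash
# Hypothetical helper: export every assignment found in an env file.
# Pass a path containing a slash so "." does not search PATH.
load_env_file() {
  [ -f "$1" ] || { echo "env file not found: $1" >&2; return 1; }
  set -a        # auto-export variables assigned while sourcing
  . "$1"
  set +a
}

# Variables the table above marks as required and gives no default for.
required_vars_for() {
  case "$1" in
    aks) echo "AZURE_TENANT_ID AZURE_SUBSCRIPTION_ID" ;;
    eks) echo "" ;;                 # AWS_PROFILE and AWS_REGION have defaults
    gke) echo "GCP_PROJECT_ID" ;;   # GCP_REGION has a default
    upc) echo "UPCLOUD_TOKEN" ;;
    *) return 1 ;;
  esac
}

# Return non-zero and list what is missing if any required variable is empty.
check_credentials() {
  local platform="$1"
  local vars missing name value
  vars="$(required_vars_for "$platform")" || {
    echo "unknown platform: $platform" >&2; return 2; }
  missing=""
  for name in $vars; do
    eval "value=\${$name:-}"        # indirect lookup, POSIX-compatible
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing for $platform:$missing" >&2
    return 1
  fi
  return 0
}
```

A typical call sequence would be `load_env_file ../configs/upc.env && check_credentials upc`.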
### End-to-End Workflow
The full cluster lifecycle runs provision → bootstrap → operate → teardown:
```bash
# 1. Provision infrastructure
cd .tofu/scripts
./setup-cluster.sh aks-dev
# 2. Export kubeconfig (printed by setup-cluster.sh)
export KUBECONFIG=$(pwd)/../../private/aks-dev/kubeconfig
# 3. Bootstrap GitOps (ArgoCD + App-of-Apps)
cd ../..
./bootstrap.sh aks-dev
# 4. Verify
kubectl get applications -n argocd
# ... operate cluster ...
# 5. Teardown when done
cd .tofu/scripts
./teardown-cluster.sh aks-dev
```
---
## Cluster Bootstrap ## Cluster Bootstrap
### Initial Cluster Setup ### Initial Cluster Setup
@@ -37,7 +157,7 @@ Bootstrap a new cluster from scratch:
#### Prerequisites #### Prerequisites
1. **Kubernetes cluster running** (UpCloud, AWS EKS, Azure AKS, GCP GKE, or any K8s cluster) 1. **Kubernetes cluster running** (provisioned via `.tofu/scripts/setup-cluster.sh` or manually on UpCloud, AWS EKS, Azure AKS, GCP GKE)
2. **kubectl configured** with admin access 2. **kubectl configured** with admin access
3. **Repositories cloned** locally 3. **Repositories cloned** locally
@@ -1286,14 +1406,17 @@ spec:
```bash ```bash
# 1. Provision new Kubernetes cluster # 1. Provision new Kubernetes cluster
cd .tofu/scripts
./setup-cluster.sh upc-dev # or aks-dev, eks-prod, etc.
export KUBECONFIG=$(pwd)/../../private/upc-dev/kubeconfig
# 2. Configure kubectl # 2. Verify cluster access
kubectl config use-context new-cluster
kubectl cluster-info kubectl cluster-info
kubectl get nodes
# 3. Bootstrap cluster # 3. Bootstrap cluster
cd ~/dev/k8s/launchpad cd ../..
./bootstrap.sh ./bootstrap.sh upc-dev
# 4. Wait for ArgoCD to sync all applications # 4. Wait for ArgoCD to sync all applications
kubectl get applications -n argocd -w kubectl get applications -n argocd -w


@@ -3,6 +3,7 @@
## Table of Contents ## Table of Contents
- [Architecture Components](#architecture-components) - [Architecture Components](#architecture-components)
- [Repository Reference](#repository-reference) - [Repository Reference](#repository-reference)
- [OpenTofu Infrastructure Reference](#opentofu-infrastructure-reference)
- [Helm Chart Reference](#helm-chart-reference) - [Helm Chart Reference](#helm-chart-reference)
- [ArgoCD Configuration](#argocd-configuration) - [ArgoCD Configuration](#argocd-configuration)
- [Infrastructure Components](#infrastructure-components) - [Infrastructure Components](#infrastructure-components)
@@ -207,6 +208,196 @@ launchpad/
└── REFERENCE.md └── REFERENCE.md
``` ```
---
## OpenTofu Infrastructure Reference
The `.tofu/` directory provides multi-cloud Kubernetes cluster provisioning using OpenTofu.
### Directory Structure
```
.tofu/
├── configs/ # Platform credential templates (git-ignored .env files)
│ ├── aks.env.example
│ ├── eks.env.example
│ ├── gke.env.example
│ └── upc.env.example
├── platforms/ # OpenTofu modules per cloud provider
│ ├── aks/ # Azure AKS
│ │ ├── modules/cluster/ # Reusable AKS module
│ │ │ ├── main.tf # Resource group, VNet, subnet, AKS cluster
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── providers.tf
│ │ ├── dev/ # Dev environment root
│ │ ├── prod/ # Prod environment root
│ │ └── workload/ # Workload cluster (+ external-dns identity)
│ ├── eks/ # AWS EKS (same structure)
│ ├── gke/ # GCP GKE
│ └── upc/ # UpCloud Kubernetes
└── scripts/
├── setup-cluster.sh # Provision cluster
├── teardown-cluster.sh # Destroy cluster
└── get-kubeconfig.sh # Retrieve/cache kubeconfig
```
### Three-Tier Cluster Strategy
Each platform defines three environment tiers:
| Tier | Purpose | Typical Sizing | Notes |
|------|---------|---------------|-------|
| **dev** | Development/testing | Small, economical nodes (2 nodes) | No delete locks, minimal HA |
| **prod** | Production workloads | Larger nodes, multiple AZs (3 nodes) | Delete locks, HA networking |
| **workload** | Application-only cluster | Medium nodes (2 nodes) | Includes external-DNS integration, no platform services |
### Platform Specifications
#### AKS (Azure Kubernetes Service)
| Resource | Description |
|----------|-------------|
| `azurerm_resource_group` | Container for all Azure resources |
| `azurerm_management_lock` | Optional CanNotDelete lock (prod) |
| `azurerm_virtual_network` | VNet (Azure virtual network), default `10.100.0.0/16` |
| `azurerm_subnet` | Node subnet, default `10.100.0.0/22` |
| `azurerm_kubernetes_cluster` | AKS with Azure CNI, OIDC issuer, Workload Identity |
**Dev**: Standard_B2s, 2 nodes, norwayeast, no delete lock
**Prod**: Standard_D4s_v3, 3 nodes, westeurope, delete lock enabled
**Workload**: Adds `azurerm_user_assigned_identity` + federated credential for external-dns with DNS Zone Contributor role
**Variables** (`modules/cluster/variables.tf`):
- `prefix` — resource name prefix
- `location` — Azure region
- `vnet_address_space` — default `10.100.0.0/16`
- `aks_subnet_cidr` — default `10.100.0.0/22`
- `aks_node_vm_size` — VM size (e.g., `Standard_B2s`)
- `aks_node_count` — number of nodes
- `aks_kubernetes_version` — `null` = latest
- `enable_delete_lock` — default `false`
#### EKS (Amazon Elastic Kubernetes Service)
| Resource | Description |
|----------|-------------|
| `aws_vpc` | VPC with DNS enabled, default `10.100.0.0/16` |
| `aws_subnet` (public) | Per-AZ, tagged `kubernetes.io/role/elb=1` |
| `aws_subnet` (private) | Per-AZ, tagged `kubernetes.io/role/internal-elb=1` |
| `aws_nat_gateway` | Single NAT (dev); prod should use one per AZ |
| `aws_eks_cluster` | EKS with public+private endpoints, OIDC issuer |
| `aws_iam_openid_connect_provider` | IRSA (IAM Roles for Service Accounts) |
| `aws_eks_node_group` | Managed nodes with auto-scaling |
**Dev**: t3.medium, 2 nodes (min 1, max 4), eu-west-1a/b, K8s 1.30
**Prod**: m5.xlarge, 3 nodes (min 3, max 6), eu-west-1a/b/c
**Workload**: Adds IRSA role for external-dns with Route53 permissions (ChangeResourceRecordSets, ListHostedZones, ListResourceRecordSets, ListTagsForResource)
**Variables**:
- `region` — AWS region
- `vpc_cidr` — default `10.100.0.0/16`
- `availability_zones` — list of AZs (2–3 recommended)
- `node_instance_type`, `node_count`, `node_min_count`, `node_max_count`
- `kubernetes_version` — default `1.30`
#### GKE (Google Kubernetes Engine)
| Resource | Description |
|----------|-------------|
| `google_project_service` | Enables compute and container APIs |
| `google_compute_network` | Custom VPC (no auto subnets) |
| `google_compute_subnetwork` | Primary `10.100.0.0/22`, pods `10.200.0.0/14`, services `10.204.0.0/20` |
| `google_container_cluster` | Regional cluster, VPC-native, Workload Identity |
| `google_container_node_pool` | Auto-repair, auto-upgrade, GKE_METADATA mode |
**Dev**: e2-standard-2, 2 nodes/zone, no deletion protection
**Prod**: e2-standard-4, 3 nodes/zone, deletion protection enabled
**Workload**: Adds Google SA for external-dns with `dns.admin` role + Workload Identity binding
**Variables**:
- `project_id` — GCP project (required)
- `region` — GCP region
- `node_machine_type`, `node_count`
- `kubernetes_version` — `null` = STABLE release channel
- `deletion_protection` — default `false`
#### UPC (UpCloud Kubernetes)
| Resource | Description |
|----------|-------------|
| `upcloud_router` | Private router for cluster network |
| `upcloud_gateway` | NAT gateway for outbound internet |
| `upcloud_network` | Private network, DHCP, default `10.100.0.0/24` |
| `upcloud_kubernetes_cluster` | Managed K8s, private node groups |
| `upcloud_kubernetes_node_group` | Anti-affinity if node_count > 1 |
**Dev**: DEV-1xCPU-2GB, 2 nodes, no-svg1
**Prod**: 4xCPU-8GB, 3 nodes, de-fra1
**Workload**: 2xCPU-4GB, 2 nodes, fi-hel1, CIDR `10.110.0.0/24`
> **Note**: UpCloud has no native workload identity — external-DNS integration not available.
### Workload Identity & External-DNS
Workload clusters include keyless cloud access for external-DNS:
| Platform | Identity Mechanism | DNS Permissions |
|----------|--------------------|-----------------|
| **AKS** | Azure Workload Identity (federated credential) | DNS Zone Contributor |
| **EKS** | IRSA (OIDC federation) | Route53 ChangeResourceRecordSets, ListHostedZones |
| **GKE** | Workload Identity (K8s SA → Google SA) | dns.admin role |
| **UPC** | N/A | N/A |
### Naming Conventions
- Cluster: `<prefix>-aks` / `-eks` / `-gke` (derived from platform)
- Resource groups: `<prefix>-rg` (Azure only)
- VPCs/Networks: `<prefix>-vpc`
- Node groups: `<prefix>-nodes`
- Dev prefix: `clst-dev`, Prod prefix: `clst`, Workload prefix: `clst-workload`
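To make the conventions concrete, the sketch below derives the names for a given platform and tier. It is illustrative only: `prefix_for_env` and `resource_names` are hypothetical helpers (the actual names are produced inside the OpenTofu modules), and UpCloud is rejected because the list above gives no cluster suffix for it.

```bash
# Hypothetical helper: map an environment tier to its documented prefix.
prefix_for_env() {
  case "$1" in
    dev) echo "clst-dev" ;;
    prod) echo "clst" ;;
    workload) echo "clst-workload" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the names implied by the conventions above.
resource_names() {
  local platform="$1"
  local env="$2"
  local prefix
  case "$platform" in
    aks|eks|gke) ;;
    *) echo "no documented cluster suffix for: $platform" >&2; return 1 ;;
  esac
  prefix="$(prefix_for_env "$env")" || return 1
  echo "cluster=$prefix-$platform"
  echo "network=$prefix-vpc"
  echo "node_group=$prefix-nodes"
  if [ "$platform" = "aks" ]; then
    echo "resource_group=$prefix-rg"   # resource groups exist on Azure only
  fi
}
```

For instance, `resource_names aks dev` lists `cluster=clst-dev-aks` first and `resource_group=clst-dev-rg` last.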
### Provider Authentication
| Platform | Auth Method | Config Source |
|----------|-------------|---------------|
| **AKS** | Azure CLI or env vars (`ARM_SUBSCRIPTION_ID`, `ARM_TENANT_ID`) | `configs/aks.env` |
| **EKS** | AWS CLI profile or explicit credentials | `configs/eks.env` |
| **GKE** | Application Default Credentials or SA JSON | `configs/gke.env` |
| **UPC** | API token (`UPCLOUD_TOKEN`) | `configs/upc.env` |
### Scripts Reference
#### `setup-cluster.sh`
```bash
./setup-cluster.sh <platform>-<env> [--plan] [--destroy] [--auto]
```
| Flag | Effect |
|------|--------|
| (none) | Interactive: plan → prompt → apply |
| `--plan` | Dry-run only (tofu plan) |
| `--destroy` | Destroy infrastructure |
| `--auto` | Skip confirmation prompts |
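The flag table maps naturally onto a small `case` loop. The following is a sketch under assumptions rather than the script's real parser: `parse_flags`, `MODE`, and `AUTO` are hypothetical names, and because the table does not say how `--plan` and `--destroy` interact, this version simply lets the last one win.

```bash
# Hypothetical parser for the flags in the table above.
# Sets MODE (apply | plan | destroy) and AUTO (true | false).
parse_flags() {
  MODE="apply"    # no flag: interactive plan, prompt, apply
  AUTO="false"
  local arg
  for arg in "$@"; do
    case "$arg" in
      --plan) MODE="plan" ;;
      --destroy) MODE="destroy" ;;
      --auto) AUTO="true" ;;
      *) echo "unknown flag: $arg" >&2; return 1 ;;
    esac
  done
}
```

After `parse_flags --destroy --auto`, `MODE` is `destroy` and `AUTO` is `true`; calling it with no arguments leaves the interactive defaults.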
#### `teardown-cluster.sh`
```bash
./teardown-cluster.sh <platform>-<env>
# Delegates to: setup-cluster.sh "$@" --destroy
```
#### `get-kubeconfig.sh`
```bash
./get-kubeconfig.sh <platform>-<env>
# Checks cache: private/<cluster>/kubeconfig
# Falls back to platform CLI if no cache
```
---
#### Key Files #### Key Files
**`bootstrap.sh`** **`bootstrap.sh`**