tofu+tools
@@ -2,6 +2,12 @@

## Table of Contents
- [Overview](#overview)
- [Infrastructure Provisioning (OpenTofu)](#infrastructure-provisioning-opentofu)
  - [Prerequisites](#provisioning-prerequisites)
  - [Provisioning a Cluster](#provisioning-a-cluster)
  - [Tearing Down a Cluster](#tearing-down-a-cluster)
  - [Retrieving Kubeconfig](#retrieving-kubeconfig)
  - [Platform Credentials](#platform-credentials)
- [Cluster Bootstrap](#cluster-bootstrap)
  - [Initial Cluster Setup](#initial-cluster-setup)
  - [ArgoCD Repository Access Setup](#argocd-repository-access-setup)
@@ -29,6 +35,120 @@ This runbook provides operational procedures for maintaining the Kubernetes clus

---

## Infrastructure Provisioning (OpenTofu)

The `.tofu/` directory contains multi-cloud Kubernetes infrastructure-as-code written for [OpenTofu](https://opentofu.org/). It provisions clusters on four cloud platforms (AKS, EKS, GKE, UpCloud), each with three environment tiers: **dev**, **prod**, and **workload**.

### Provisioning Prerequisites {#provisioning-prerequisites}

- **OpenTofu** (`tofu`) installed
- **kubectl** installed
- **helm** installed
- **yq** (optional; used to load cluster config from `clusters/<cluster>.yaml`)
- Platform CLI tools:
  - **AKS**: `az` (Azure CLI)
  - **EKS**: `aws` (AWS CLI)
  - **GKE**: `gcloud` (Google Cloud SDK)
  - **UPC**: `upctl` (UpCloud CLI)

### Provisioning a Cluster

```bash
# Navigate to the scripts directory
cd .tofu/scripts

# 1. Copy and fill in credentials for your platform
cp ../configs/aks.env.example ../configs/aks.env
# Edit ../configs/aks.env with your credentials

# 2. Provision cluster (interactive: prompts before applying)
./setup-cluster.sh aks-dev

# 3. Dry-run only (plan without applying)
./setup-cluster.sh aks-dev --plan

# 4. Non-interactive (skip confirmations)
./setup-cluster.sh aks-dev --auto
```

**Cluster name format**: `<platform>-<env>`, for example `aks-dev`, `eks-prod`, `gke-workload`, or `upc-dev`.

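As an illustration of the name format, here is a minimal sketch of how a `<platform>-<env>` name could be split and validated. The function name `parse_cluster_name` is hypothetical and is not taken from `setup-cluster.sh`; the accepted platforms and tiers are the ones listed in this runbook.

```bash
# Hypothetical helper (not from setup-cluster.sh): split "<platform>-<env>"
# and reject anything outside the platforms and tiers named in this runbook.
parse_cluster_name() {
  local name="$1"
  local platform="${name%%-*}"   # text before the first hyphen
  local env="${name#*-}"         # text after the first hyphen

  case "$platform" in
    aks|eks|gke|upc) ;;
    *) echo "unknown platform in '$name'" >&2; return 1 ;;
  esac
  case "$env" in
    dev|prod|workload) ;;
    *) echo "unknown environment in '$name'" >&2; return 1 ;;
  esac

  echo "$platform $env"
}

parse_cluster_name "gke-workload"   # prints: gke workload
```

A name without a hyphen, such as `aks`, fails the environment check because the suffix expansion leaves the string unchanged.
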
**What `setup-cluster.sh` does**:
1. Validates the cluster name and extracts the platform and environment
2. Checks prerequisites (`tofu`, `kubectl`, `helm`)
3. Loads credentials from `configs/<platform>.env`
4. Optionally loads cluster config from `clusters/<cluster>.yaml` (via `yq`)
5. Runs `tofu init` → `tofu plan` → confirmation prompt → `tofu apply`
6. Fetches the kubeconfig and caches it at `private/<cluster>/kubeconfig`
7. Waits for all nodes to reach the Ready state (300s timeout)
8. Prints next steps: `export KUBECONFIG` and `./bootstrap.sh`

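Step 7 in the list above can be pictured as a polling loop with a deadline. The sketch below is an assumption about the shape of that logic, not the script's actual code: `wait_until` and `all_nodes_ready` are hypothetical helpers, and only the 300-second timeout comes from this runbook.

```bash
# Hypothetical helper: retry a command until it succeeds or the timeout expires.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Assumed readiness check: succeeds only when at least one node is listed and
# every node's STATUS column (second field) is exactly "Ready".
all_nodes_ready() {
  local statuses
  statuses="$(kubectl get nodes --no-headers 2>/dev/null | awk '{print $2}')"
  [ -n "$statuses" ] && ! printf '%s\n' "$statuses" | grep -qv '^Ready$'
}

# Example (requires a reachable cluster):
#   wait_until 300 10 all_nodes_ready
```

kubectl also ships a built-in equivalent, `kubectl wait --for=condition=Ready nodes --all --timeout=300s`, which may be simpler when no custom check is needed.
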
### Tearing Down a Cluster

```bash
# Destroy cluster infrastructure (run from .tofu/scripts)
./teardown-cluster.sh aks-dev

# Equivalent to: ./setup-cluster.sh aks-dev --destroy
```

### Retrieving Kubeconfig

```bash
# Get kubeconfig for an existing cluster (uses cache or platform CLI)
./get-kubeconfig.sh aks-dev

# Cached kubeconfigs are stored in: private/<cluster>/kubeconfig
```

If no cached kubeconfig exists, the script falls back to the platform CLI:
- **AKS**: `az aks get-credentials`
- **EKS**: `aws eks update-kubeconfig`
- **GKE**: `gcloud container clusters get-credentials`
- **UPC**: `upctl kubernetes config`

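The cache-then-fallback behaviour described above could look roughly like the following. This is a sketch under stated assumptions, not the contents of `get-kubeconfig.sh`: the function `cached_kubeconfig` and its calling convention (a fetch command that receives the target path as its last argument) are invented for illustration.

```bash
# Hypothetical sketch: print the cached kubeconfig path, fetching the file
# first when the cache is empty. Everything after the first two arguments is
# the fetch command; it is called with the target path appended.
cached_kubeconfig() {
  local cluster="$1" cache_root="$2"
  shift 2
  local path="$cache_root/$cluster/kubeconfig"

  if [ ! -s "$path" ]; then
    mkdir -p "$(dirname "$path")"
    # Remove a partial file if the fetch fails, so the next run retries.
    "$@" "$path" || { rm -f "$path"; return 1; }
    chmod 600 "$path"   # kubeconfigs contain credentials
  fi
  echo "$path"
}

# Example for AKS (the resource group and cluster name are placeholders):
#   cached_kubeconfig aks-dev private \
#     az aks get-credentials --resource-group "$RG" --name "$NAME" --file
```

Keeping the fetch command as a parameter lets one function serve all four platforms, at the cost of requiring each CLI invocation to accept the output path last.
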
### Platform Credentials

Each platform has a `configs/<platform>.env.example` template. Copy it to `configs/<platform>.env` and fill in the values:

| Platform | Required Variables | Optional Variables |
|----------|--------------------|--------------------|
| **AKS** | `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID` | `ARM_RESOURCE_GROUP` (defaults to cluster name) |
| **EKS** | `AWS_PROFILE` (default: "default"), `AWS_REGION` (default: "eu-west-1") | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| **GKE** | `GCP_PROJECT_ID`, `GCP_REGION` (default: "europe-west4") | `GOOGLE_APPLICATION_CREDENTIALS` (SA JSON path) |
| **UPC** | `UPCLOUD_TOKEN` | `UPCLOUD_CLUSTER_ID` (set after creation) |

> **Note**: `.env` files are git-ignored. Never commit credentials.

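To make the required/optional split concrete, the sketch below loads an `.env` file and fails early when a required variable is missing. It assumes the files contain plain `KEY=value` lines that a POSIX shell can source; `require_vars` and `load_platform_env` are hypothetical names, not functions from the repository's scripts.

```bash
# Hypothetical helper: fail if any named variable is unset or empty.
require_vars() {
  local missing="" name value
  for name in "$@"; do
    # Indirect lookup; names are trusted literals supplied by the caller.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi
}

# Hypothetical loader: export every assignment in the file, then validate.
# Assumes the .env file holds shell-compatible KEY=value lines.
load_platform_env() {
  local env_file="$1"
  shift
  [ -f "$env_file" ] || { echo "not found: $env_file" >&2; return 1; }
  set -a            # auto-export variables assigned while sourcing
  . "$env_file"
  set +a
  require_vars "$@"
}

# Example:
#   load_platform_env ../configs/upc.env UPCLOUD_TOKEN
```
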
### End-to-End Workflow

The full cluster lifecycle runs provision → bootstrap → operate → teardown:

```bash
# 1. Provision infrastructure
cd .tofu/scripts
./setup-cluster.sh aks-dev

# 2. Export kubeconfig (printed by setup-cluster.sh)
export KUBECONFIG="$(pwd)/../../private/aks-dev/kubeconfig"

# 3. Bootstrap GitOps (ArgoCD + App-of-Apps)
cd ../..
./bootstrap.sh aks-dev

# 4. Verify
kubectl get applications -n argocd

# ... operate cluster ...

# 5. Teardown when done
cd .tofu/scripts
./teardown-cluster.sh aks-dev
```

---

## Cluster Bootstrap

### Initial Cluster Setup
@@ -37,7 +157,7 @@ Bootstrap a new cluster from scratch:

#### Prerequisites

-1. **Kubernetes cluster running** (UpCloud, AWS EKS, Azure AKS, GCP GKE, or any K8s cluster)
1. **Kubernetes cluster running** (provisioned via `.tofu/scripts/setup-cluster.sh` or manually on UpCloud, AWS EKS, Azure AKS, GCP GKE)
2. **kubectl configured** with admin access
3. **Repositories cloned** locally

@@ -1286,14 +1406,17 @@ spec:

```bash
# 1. Provision new Kubernetes cluster
cd .tofu/scripts
./setup-cluster.sh upc-dev  # or aks-dev, eks-prod, etc.
export KUBECONFIG="$(pwd)/../../private/upc-dev/kubeconfig"

-# 2. Configure kubectl
-kubectl config use-context new-cluster
# 2. Verify cluster access
kubectl cluster-info
kubectl get nodes

# 3. Bootstrap cluster
-cd ~/dev/k8s/launchpad
-./bootstrap.sh
cd ../..
./bootstrap.sh upc-dev

# 4. Wait for ArgoCD to sync all applications
kubectl get applications -n argocd -w

@@ -3,6 +3,7 @@
## Table of Contents
- [Architecture Components](#architecture-components)
- [Repository Reference](#repository-reference)
- [OpenTofu Infrastructure Reference](#opentofu-infrastructure-reference)
- [Helm Chart Reference](#helm-chart-reference)
- [ArgoCD Configuration](#argocd-configuration)
- [Infrastructure Components](#infrastructure-components)
@@ -207,6 +208,196 @@ launchpad/
└── REFERENCE.md
```

---

## OpenTofu Infrastructure Reference

The `.tofu/` directory provides multi-cloud Kubernetes cluster provisioning using OpenTofu.

### Directory Structure

```
.tofu/
├── configs/                    # Platform credential templates (git-ignored .env files)
│   ├── aks.env.example
│   ├── eks.env.example
│   ├── gke.env.example
│   └── upc.env.example
├── platforms/                  # OpenTofu modules per cloud provider
│   ├── aks/                    # Azure AKS
│   │   ├── modules/cluster/    # Reusable AKS module
│   │   │   ├── main.tf         # Resource group, VNet, subnet, AKS cluster
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   │   └── providers.tf
│   │   ├── dev/                # Dev environment root
│   │   ├── prod/               # Prod environment root
│   │   └── workload/           # Workload cluster (+ external-dns identity)
│   ├── eks/                    # AWS EKS (same structure)
│   ├── gke/                    # GCP GKE
│   └── upc/                    # UpCloud Kubernetes
└── scripts/
    ├── setup-cluster.sh        # Provision cluster
    ├── teardown-cluster.sh     # Destroy cluster
    └── get-kubeconfig.sh       # Retrieve/cache kubeconfig
```

### Three-Tier Cluster Strategy

Each platform defines three environment tiers:

| Tier | Purpose | Typical Sizing | Notes |
|------|---------|----------------|-------|
| **dev** | Development/testing | Small, economical nodes (2 nodes) | No delete locks, minimal HA |
| **prod** | Production workloads | Larger nodes, multiple AZs (3 nodes) | Delete locks, HA networking |
| **workload** | Application-only cluster | Medium nodes (2 nodes) | Includes external-DNS integration, no platform services |

### Platform Specifications

#### AKS (Azure Kubernetes Service)

| Resource | Description |
|----------|-------------|
| `azurerm_resource_group` | Container for all Azure resources |
| `azurerm_management_lock` | Optional CanNotDelete lock (prod) |
| `azurerm_virtual_network` | VNet, default `10.100.0.0/16` |
| `azurerm_subnet` | Node subnet, default `10.100.0.0/22` |
| `azurerm_kubernetes_cluster` | AKS with Azure CNI, OIDC issuer, Workload Identity |

- **Dev**: Standard_B2s, 2 nodes, norwayeast, no delete lock
- **Prod**: Standard_D4s_v3, 3 nodes, westeurope, delete lock enabled
- **Workload**: Adds `azurerm_user_assigned_identity` + federated credential for external-dns with DNS Zone Contributor role

**Variables** (`modules/cluster/variables.tf`):
- `prefix`: resource name prefix
- `location`: Azure region
- `vnet_address_space`: default `10.100.0.0/16`
- `aks_subnet_cidr`: default `10.100.0.0/22`
- `aks_node_vm_size`: VM size (e.g., `Standard_B2s`)
- `aks_node_count`: number of nodes
- `aks_kubernetes_version`: `null` = latest
- `enable_delete_lock`: default `false`

#### EKS (Amazon Elastic Kubernetes Service)

| Resource | Description |
|----------|-------------|
| `aws_vpc` | VPC with DNS enabled, default `10.100.0.0/16` |
| `aws_subnet` (public) | Per-AZ, tagged `kubernetes.io/role/elb=1` |
| `aws_subnet` (private) | Per-AZ, tagged `kubernetes.io/role/internal-elb=1` |
| `aws_nat_gateway` | Single NAT (dev); prod should use one per AZ |
| `aws_eks_cluster` | EKS with public+private endpoints, OIDC issuer |
| `aws_iam_openid_connect_provider` | IRSA (IAM Roles for Service Accounts) |
| `aws_eks_node_group` | Managed nodes with auto-scaling |

- **Dev**: t3.medium, 2 nodes (min 1, max 4), eu-west-1a/b, K8s 1.30
- **Prod**: m5.xlarge, 3 nodes (min 3, max 6), eu-west-1a/b/c
- **Workload**: Adds IRSA role for external-dns with Route53 permissions (ChangeResourceRecordSets, ListHostedZones, ListResourceRecordSets, ListTagsForResource)

**Variables**:
- `region`: AWS region
- `vpc_cidr`: default `10.100.0.0/16`
- `availability_zones`: list of AZs (2–3 recommended)
- `node_instance_type`, `node_count`, `node_min_count`, `node_max_count`
- `kubernetes_version`: default `1.30`

#### GKE (Google Kubernetes Engine)

| Resource | Description |
|----------|-------------|
| `google_project_service` | Enables compute and container APIs |
| `google_compute_network` | Custom VPC (no auto subnets) |
| `google_compute_subnetwork` | Primary `10.100.0.0/22`, pods `10.200.0.0/14`, services `10.204.0.0/20` |
| `google_container_cluster` | Regional cluster, VPC-native, Workload Identity |
| `google_container_node_pool` | Auto-repair, auto-upgrade, GKE_METADATA mode |

- **Dev**: e2-standard-2, 2 nodes/zone, no deletion protection
- **Prod**: e2-standard-4, 3 nodes/zone, deletion protection enabled
- **Workload**: Adds Google SA for external-dns with `dns.admin` role + Workload Identity binding

**Variables**:
- `project_id`: GCP project (required)
- `region`: GCP region
- `node_machine_type`, `node_count`
- `kubernetes_version`: `null` = STABLE release channel
- `deletion_protection`: default `false`

#### UPC (UpCloud Kubernetes)

| Resource | Description |
|----------|-------------|
| `upcloud_router` | Private router for cluster network |
| `upcloud_gateway` | NAT gateway for outbound internet |
| `upcloud_network` | Private network, DHCP, default `10.100.0.0/24` |
| `upcloud_kubernetes_cluster` | Managed K8s, private node groups |
| `upcloud_kubernetes_node_group` | Anti-affinity if `node_count` > 1 |

- **Dev**: DEV-1xCPU-2GB, 2 nodes, no-svg1
- **Prod**: 4xCPU-8GB, 3 nodes, de-fra1
- **Workload**: 2xCPU-4GB, 2 nodes, fi-hel1, CIDR `10.110.0.0/24`

> **Note**: UpCloud has no native workload identity, so the external-DNS integration is not available on UPC workload clusters.

### Workload Identity & External-DNS

On AKS, EKS, and GKE, workload clusters include keyless cloud access for external-DNS:

| Platform | Identity Mechanism | DNS Permissions |
|----------|--------------------|-----------------|
| **AKS** | Azure Workload Identity (federated credential) | DNS Zone Contributor |
| **EKS** | IRSA (OIDC federation) | Route53 ChangeResourceRecordSets, ListHostedZones, ListResourceRecordSets, ListTagsForResource |
| **GKE** | Workload Identity (K8s SA → Google SA) | dns.admin role |
| **UPC** | N/A | N/A |

### Naming Conventions

- Cluster: `<prefix>-aks` / `-eks` / `-gke` (derived from platform)
- Resource groups: `<prefix>-rg` (Azure only)
- VPCs/Networks: `<prefix>-vpc`
- Node groups: `<prefix>-nodes`
- Dev prefix: `clst-dev`, Prod prefix: `clst`, Workload prefix: `clst-workload`

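The conventions above compose mechanically, which the following sketch illustrates. In the real modules these names are presumably derived in OpenTofu rather than in shell; `tier_prefix` and `resource_names` are hypothetical helpers written only to show the pattern, and they cover just the three platforms whose cluster suffix is listed.

```bash
# Hypothetical helper: map an environment tier to its resource prefix.
tier_prefix() {
  case "$1" in
    dev)      echo "clst-dev" ;;
    prod)     echo "clst" ;;
    workload) echo "clst-workload" ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the derived names for a prefix and platform.
resource_names() {
  local prefix="$1" platform="$2"
  case "$platform" in
    aks|eks|gke) ;;
    *) echo "no documented cluster suffix for: $platform" >&2; return 1 ;;
  esac
  echo "cluster=${prefix}-${platform}"
  echo "network=${prefix}-vpc"
  echo "nodes=${prefix}-nodes"
  # Resource groups exist only on Azure.
  if [ "$platform" = "aks" ]; then
    echo "resource_group=${prefix}-rg"
  fi
}

resource_names "$(tier_prefix prod)" aks
# prints:
#   cluster=clst-aks
#   network=clst-vpc
#   nodes=clst-nodes
#   resource_group=clst-rg
```
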
### Provider Authentication

| Platform | Auth Method | Config Source |
|----------|-------------|---------------|
| **AKS** | Azure CLI or env vars (`ARM_SUBSCRIPTION_ID`, `ARM_TENANT_ID`) | `configs/aks.env` |
| **EKS** | AWS CLI profile or explicit credentials | `configs/eks.env` |
| **GKE** | Application Default Credentials or SA JSON | `configs/gke.env` |
| **UPC** | API token (`UPCLOUD_TOKEN`) | `configs/upc.env` |

### Scripts Reference

#### `setup-cluster.sh`

```bash
./setup-cluster.sh <platform>-<env> [--plan] [--destroy] [--auto]
```

| Flag | Effect |
|------|--------|
| (none) | Interactive: plan → prompt → apply |
| `--plan` | Dry-run only (`tofu plan`) |
| `--destroy` | Destroy infrastructure |
| `--auto` | Skip confirmation prompts |

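The flag table can be read as a small state machine: one action (apply, plan, or destroy) plus an independent auto-confirm switch. The sketch below shows one way to parse it. It is not the script's actual parser; `parse_flags` is a hypothetical name, and the choice that a later action flag overrides an earlier one is an assumption, since the table does not say how `--plan` and `--destroy` combine.

```bash
# Hypothetical sketch of the flag handling; prints the resulting settings.
# Assumption: if both --plan and --destroy are given, the last one wins.
parse_flags() {
  local action="apply" auto="false" arg
  for arg in "$@"; do
    case "$arg" in
      --plan)    action="plan" ;;
      --destroy) action="destroy" ;;
      --auto)    auto="true" ;;
      *) echo "unknown flag: $arg" >&2; return 2 ;;
    esac
  done
  echo "action=$action auto=$auto"
}

parse_flags --destroy --auto   # prints: action=destroy auto=true
```
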
#### `teardown-cluster.sh`

```bash
./teardown-cluster.sh <platform>-<env>
# Delegates to: setup-cluster.sh "$@" --destroy
```

#### `get-kubeconfig.sh`

```bash
./get-kubeconfig.sh <platform>-<env>
# Checks cache: private/<cluster>/kubeconfig
# Falls back to platform CLI if no cache
```

---

#### Key Files

**`bootstrap.sh`**
