tofu+tools
@@ -3,6 +3,7 @@
## Table of Contents

- [Architecture Components](#architecture-components)
- [Repository Reference](#repository-reference)
- [OpenTofu Infrastructure Reference](#opentofu-infrastructure-reference)
- [Helm Chart Reference](#helm-chart-reference)
- [ArgoCD Configuration](#argocd-configuration)
- [Infrastructure Components](#infrastructure-components)
@@ -207,6 +208,196 @@ launchpad/
└── REFERENCE.md
```

---

## OpenTofu Infrastructure Reference

The `.tofu/` directory provides multi-cloud Kubernetes cluster provisioning using OpenTofu.


### Directory Structure

```
.tofu/
├── configs/                     # Platform credential templates (git-ignored .env files)
│   ├── aks.env.example
│   ├── eks.env.example
│   ├── gke.env.example
│   └── upc.env.example
├── platforms/                   # OpenTofu modules per cloud provider
│   ├── aks/                     # Azure AKS
│   │   ├── modules/cluster/     # Reusable AKS module
│   │   │   ├── main.tf          # Resource group, VNet, subnet, AKS cluster
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   │   └── providers.tf
│   │   ├── dev/                 # Dev environment root
│   │   ├── prod/                # Prod environment root
│   │   └── workload/            # Workload cluster (+ external-dns identity)
│   ├── eks/                     # AWS EKS (same structure)
│   ├── gke/                     # GCP GKE
│   └── upc/                     # UpCloud Kubernetes
└── scripts/
    ├── setup-cluster.sh         # Provision cluster
    ├── teardown-cluster.sh      # Destroy cluster
    └── get-kubeconfig.sh        # Retrieve/cache kubeconfig
```


### Three-Tier Cluster Strategy

Each platform defines three environment tiers:

| Tier | Purpose | Typical Sizing | Notes |
|------|---------|----------------|-------|
| **dev** | Development/testing | Small, economical nodes (2 nodes) | No delete locks, minimal HA |
| **prod** | Production workloads | Larger nodes, multiple AZs (3 nodes) | Delete locks, HA networking |
| **workload** | Application-only cluster | Medium nodes (2 nodes) | Includes external-DNS integration, no platform services |


### Platform Specifications

#### AKS (Azure Kubernetes Service)

| Resource | Description |
|----------|-------------|
| `azurerm_resource_group` | Container for all Azure resources |
| `azurerm_management_lock` | Optional CanNotDelete lock (prod) |
| `azurerm_virtual_network` | VNet, default `10.100.0.0/16` |
| `azurerm_subnet` | Node subnet, default `10.100.0.0/22` |
| `azurerm_kubernetes_cluster` | AKS with Azure CNI, OIDC issuer, Workload Identity |

- **Dev**: Standard_B2s, 2 nodes, norwayeast, no delete lock
- **Prod**: Standard_D4s_v3, 3 nodes, westeurope, delete lock enabled
- **Workload**: Adds an `azurerm_user_assigned_identity` and a federated credential for external-dns, granted the DNS Zone Contributor role

**Variables** (`modules/cluster/variables.tf`):

- `prefix`: resource name prefix
- `location`: Azure region
- `vnet_address_space`: default `10.100.0.0/16`
- `aks_subnet_cidr`: default `10.100.0.0/22`
- `aks_node_vm_size`: VM size (e.g., `Standard_B2s`)
- `aks_node_count`: number of nodes
- `aks_kubernetes_version`: `null` selects the latest version
- `enable_delete_lock`: default `false`


#### EKS (Amazon Elastic Kubernetes Service)

| Resource | Description |
|----------|-------------|
| `aws_vpc` | VPC with DNS enabled, default `10.100.0.0/16` |
| `aws_subnet` (public) | Per-AZ, tagged `kubernetes.io/role/elb=1` |
| `aws_subnet` (private) | Per-AZ, tagged `kubernetes.io/role/internal-elb=1` |
| `aws_nat_gateway` | Single NAT (dev); prod should use one per AZ |
| `aws_eks_cluster` | EKS with public+private endpoints, OIDC issuer |
| `aws_iam_openid_connect_provider` | IRSA (IAM Roles for Service Accounts) |
| `aws_eks_node_group` | Managed nodes with auto-scaling |

- **Dev**: t3.medium, 2 nodes (min 1, max 4), eu-west-1a/b, K8s 1.30
- **Prod**: m5.xlarge, 3 nodes (min 3, max 6), eu-west-1a/b/c
- **Workload**: Adds an IRSA role for external-dns with Route53 permissions (ChangeResourceRecordSets, ListHostedZones, ListResourceRecordSets, ListTagsForResource)

**Variables**:

- `region`: AWS region
- `vpc_cidr`: default `10.100.0.0/16`
- `availability_zones`: list of AZs (2–3 recommended)
- `node_instance_type`, `node_count`, `node_min_count`, `node_max_count`
- `kubernetes_version`: default `1.30`

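
The dev and prod sizings above only make sense when the desired node count lies between the minimum and maximum. The following is a minimal sketch of that sanity check, assuming it would run before invoking OpenTofu; `check_node_counts` is a hypothetical helper, not something the repository is known to contain.

```bash
# Sketch: verify node_min_count <= node_count <= node_max_count.
# check_node_counts is a hypothetical helper name.
check_node_counts() {
  # $1 = node_min_count, $2 = node_count, $3 = node_max_count
  if [ "$1" -le "$2" ] && [ "$2" -le "$3" ]; then
    echo "ok"
  else
    echo "invalid: expected min <= desired <= max, got $1/$2/$3" >&2
    return 1
  fi
}

check_node_counts 1 2 4   # dev values listed above
check_node_counts 3 3 6   # prod values listed above
```

Both calls print `ok`; a combination such as min 3, desired 2 returns a non-zero status instead.
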

#### GKE (Google Kubernetes Engine)

| Resource | Description |
|----------|-------------|
| `google_project_service` | Enables compute and container APIs |
| `google_compute_network` | Custom VPC (no auto subnets) |
| `google_compute_subnetwork` | Primary `10.100.0.0/22`, pods `10.200.0.0/14`, services `10.204.0.0/20` |
| `google_container_cluster` | Regional cluster, VPC-native, Workload Identity |
| `google_container_node_pool` | Auto-repair, auto-upgrade, GKE_METADATA mode |

- **Dev**: e2-standard-2, 2 nodes/zone, no deletion protection
- **Prod**: e2-standard-4, 3 nodes/zone, deletion protection enabled
- **Workload**: Adds a Google service account for external-dns with the `dns.admin` role and a Workload Identity binding

**Variables**:

- `project_id`: GCP project (required)
- `region`: GCP region
- `node_machine_type`, `node_count`
- `kubernetes_version`: `null` uses the STABLE release channel
- `deletion_protection`: default `false`

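
To make the subnetwork ranges in the GKE table concrete, the sketch below computes how many IPv4 addresses each CIDR block holds, using the standard formula 2^(32 - prefix length). `cidr_size` is a hypothetical helper written for this illustration.

```bash
# Number of addresses in an IPv4 CIDR block: 2^(32 - prefix length).
# cidr_size is a hypothetical helper name.
cidr_size() {
  prefix_len="${1##*/}"                 # text after the slash, e.g. 22
  echo $(( 1 << (32 - prefix_len) ))
}

cidr_size 10.100.0.0/22   # primary (node) range  -> 1024
cidr_size 10.200.0.0/14   # pod range             -> 262144
cidr_size 10.204.0.0/20   # service range         -> 4096
```

The `/14` pod range spans `10.200.0.0` through `10.203.255.255`, so the service range at `10.204.0.0/20` begins immediately after it without overlap.
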

#### UPC (UpCloud Kubernetes)

| Resource | Description |
|----------|-------------|
| `upcloud_router` | Private router for cluster network |
| `upcloud_gateway` | NAT gateway for outbound internet |
| `upcloud_network` | Private network, DHCP, default `10.100.0.0/24` |
| `upcloud_kubernetes_cluster` | Managed K8s, private node groups |
| `upcloud_kubernetes_node_group` | Anti-affinity when `node_count` > 1 |

- **Dev**: DEV-1xCPU-2GB, 2 nodes, no-svg1
- **Prod**: 4xCPU-8GB, 3 nodes, de-fra1
- **Workload**: 2xCPU-4GB, 2 nodes, fi-hel1, CIDR `10.110.0.0/24`

> **Note**: UpCloud has no native workload identity, so external-DNS integration is not available.


### Workload Identity & External-DNS

Workload clusters include keyless cloud access for external-DNS:

| Platform | Identity Mechanism | DNS Permissions |
|----------|--------------------|-----------------|
| **AKS** | Azure Workload Identity (federated credential) | DNS Zone Contributor |
| **EKS** | IRSA (OIDC federation) | Route53 ChangeResourceRecordSets, ListHostedZones, ListResourceRecordSets, ListTagsForResource |
| **GKE** | Workload Identity (K8s SA → Google SA) | `dns.admin` role |
| **UPC** | N/A | N/A |


### Naming Conventions

- Cluster: `<prefix>-aks` / `-eks` / `-gke` (derived from platform)
- Resource groups: `<prefix>-rg` (Azure only)
- VPCs/Networks: `<prefix>-vpc`
- Node groups: `<prefix>-nodes`
- Dev prefix: `clst-dev`, Prod prefix: `clst`, Workload prefix: `clst-workload`

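
The naming rules above can be sketched as a small function. This is an illustration only: `resource_names` is a hypothetical helper, and it rejects any platform other than the three suffixes listed, since no cluster suffix is stated for UpCloud.

```bash
# resource_names PREFIX PLATFORM prints the derived names, one per line.
# Hypothetical helper; only the suffixes listed above are accepted.
resource_names() {
  prefix="$1"
  platform="$2"
  case "$platform" in
    aks|eks|gke) ;;
    *) echo "no documented cluster suffix for: $platform" >&2; return 1 ;;
  esac
  echo "cluster=${prefix}-${platform}"
  # Resource groups exist only on Azure.
  if [ "$platform" = "aks" ]; then
    echo "resource_group=${prefix}-rg"
  fi
  echo "network=${prefix}-vpc"
  echo "node_group=${prefix}-nodes"
}

resource_names clst-dev aks
```

For the dev prefix on AKS this prints `cluster=clst-dev-aks`, `resource_group=clst-dev-rg`, `network=clst-dev-vpc`, and `node_group=clst-dev-nodes`.
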

### Provider Authentication

| Platform | Auth Method | Config Source |
|----------|-------------|---------------|
| **AKS** | Azure CLI or env vars (`ARM_SUBSCRIPTION_ID`, `ARM_TENANT_ID`) | `configs/aks.env` |
| **EKS** | AWS CLI profile or explicit credentials | `configs/eks.env` |
| **GKE** | Application Default Credentials or SA JSON | `configs/gke.env` |
| **UPC** | API token (`UPCLOUD_TOKEN`) | `configs/upc.env` |

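
When authentication relies on environment variables rather than a CLI login, a missing variable is easier to diagnose before OpenTofu runs. The sketch below checks the variables named in the table; `require_env` and `required_for` are hypothetical helpers, and EKS and GKE are left out because the table does not name specific variables for them.

```bash
# require_env VAR... reports each variable that is unset or empty
# and returns non-zero if any are missing. Hypothetical helper.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Variables named in the table above, per platform.
required_for() {
  case "$1" in
    aks) echo "ARM_SUBSCRIPTION_ID ARM_TENANT_ID" ;;
    upc) echo "UPCLOUD_TOKEN" ;;
  esac
}
```

A caller might load `configs/upc.env` into the environment and then run `require_env $(required_for upc)`; how the real scripts load the `.env` files is not shown here.
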

### Scripts Reference

#### `setup-cluster.sh`

```bash
./setup-cluster.sh <platform>-<env> [--plan] [--destroy] [--auto]
```

| Flag | Effect |
|------|--------|
| (none) | Interactive: plan → prompt → apply |
| `--plan` | Dry-run only (tofu plan) |
| `--destroy` | Destroy infrastructure |
| `--auto` | Skip confirmation prompts |

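
As an illustration of how a `<platform>-<env>` target and the flags above could be interpreted, here is a minimal sketch. It assumes the target maps to `.tofu/platforms/<platform>/<env>` as in the directory structure, and that a later flag overrides an earlier one; `parse_target` and `select_mode` are hypothetical names, and the real script may behave differently.

```bash
# parse_target splits "<platform>-<env>" and prints the assumed root directory.
parse_target() {
  target="$1"
  platform="${target%%-*}"      # text before the first dash, e.g. "aks"
  environment="${target#*-}"    # text after the first dash, e.g. "dev"
  case "$platform" in
    aks|eks|gke|upc) ;;
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  case "$environment" in
    dev|prod|workload) ;;
    *) echo "unknown environment: $environment" >&2; return 1 ;;
  esac
  echo ".tofu/platforms/${platform}/${environment}"
}

# select_mode maps the documented flags to "<action> <confirmation>".
select_mode() {
  mode="apply"
  confirm="prompt"
  for arg in "$@"; do
    case "$arg" in
      --plan)    mode="plan" ;;
      --destroy) mode="destroy" ;;
      --auto)    confirm="auto" ;;
      *) echo "unknown flag: $arg" >&2; return 1 ;;
    esac
  done
  echo "${mode} ${confirm}"
}

parse_target aks-dev            # prints .tofu/platforms/aks/dev
select_mode --destroy --auto    # prints destroy auto
```
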

#### `teardown-cluster.sh`

```bash
./teardown-cluster.sh <platform>-<env>
# Delegates to: setup-cluster.sh "$@" --destroy
```

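
The delegation noted in the comment above can be sketched as a thin wrapper. `SETUP_SCRIPT` is a hypothetical override added here so the wrapper can be exercised without the real script; it is not a documented variable.

```bash
# Sketch of the delegation: forward all arguments and append --destroy.
# SETUP_SCRIPT is a hypothetical override for demonstration purposes.
teardown_cluster() {
  "${SETUP_SCRIPT:-./setup-cluster.sh}" "$@" --destroy
}

# Dry demonstration: substitute echo for the real script.
SETUP_SCRIPT=echo teardown_cluster aks-dev --auto   # prints: aks-dev --auto --destroy
```

Quoting `"$@"` forwards each argument unchanged, so extra flags such as `--auto` pass through to the setup script alongside `--destroy`.
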

#### `get-kubeconfig.sh`

```bash
./get-kubeconfig.sh <platform>-<env>
# Checks cache: private/<cluster>/kubeconfig
# Falls back to platform CLI if no cache
```

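
The cache-then-fallback behaviour described in the comments above might look like the following sketch. `fetch_kubeconfig` is a hypothetical stand-in for the platform CLI call, `CACHE_ROOT` is a hypothetical override of the `private` directory, and how `<cluster>` is derived from `<platform>-<env>` is not specified, so the cluster name is taken as given. The file is written with a restrictive umask because a kubeconfig holds credentials. Calling `get_kubeconfig clst-dev-aks` twice would fetch once and then reuse `private/clst-dev-aks/kubeconfig`.

```bash
# fetch_kubeconfig is a hypothetical placeholder; a real script would
# invoke the provider's CLI here and write the kubeconfig to stdout.
fetch_kubeconfig() {
  echo "# kubeconfig for $1 (placeholder)"
}

# get_kubeconfig CLUSTER prints the path to a kubeconfig, reusing the
# cached copy at private/<cluster>/kubeconfig when it exists and is non-empty.
get_kubeconfig() {
  cluster="$1"
  cache="${CACHE_ROOT:-private}/${cluster}/kubeconfig"
  if [ ! -s "$cache" ]; then
    mkdir -p "$(dirname "$cache")"
    # Cache miss: fetch and store with owner-only permissions.
    ( umask 077; fetch_kubeconfig "$cluster" > "$cache" )
  fi
  echo "$cache"
}
```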

---

#### Key Files

**`bootstrap.sh`**