From f728f9dbd34033ecd1148d73c788860a54501233 Mon Sep 17 00:00:00 2001
From: Danijel Simeunovic
Date: Fri, 20 Mar 2026 14:22:14 +0100
Subject: [PATCH] Tempo doc

---
 README.md                   |  9 ++++--
 docs/GITOPS-ARCHITECTURE.md | 11 +++++--
 docs/OPERATIONS-RUNBOOK.md  | 27 ++++++++++++++++++
 docs/REFERENCE.md           | 57 +++++++++++++++++++++++++++++++++++++
 4 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 95a6a0e..0d48f54 100644
--- a/README.md
+++ b/README.md
@@ -57,10 +57,10 @@ This repository contains the complete GitOps configuration for our Kubernetes cl
 
 ### What's Inside
 
-- **Infrastructure Applications**: Traefik, Cert-Manager, Kyverno, Prometheus, Grafana, Loki, Sealed Secrets
+- **Infrastructure Applications**: Traefik, Cert-Manager, Kyverno, Prometheus, Grafana, Loki, Tempo, Sealed Secrets
 - **Business Applications**: MCP10X, MusicMan, Dot-AI Stack, ArgoCD MCP
 - **Policies**: Kyverno security policies for secret management, namespace controls, pod verification
-- **Monitoring**: Full observability stack with metrics, logs, and alerting
+- **Monitoring**: Full observability stack with metrics, logs, traces, and alerting
 - **Secrets**: Sealed Secrets for secure Git storage
 
 ### Key Features
@@ -72,7 +72,7 @@ This repository contains the complete GitOps configuration for our Kubernetes cl
 ✅ **Policy Enforcement**: Kyverno ensures security and compliance
 ✅ **Authentication**: Automatic sidecar injection (token & OIDC support)
 ✅ **TLS Everywhere**: Automatic Let's Encrypt certificates
-✅ **Full Observability**: Prometheus, Grafana, Loki integration
+✅ **Full Observability**: Prometheus, Grafana, Loki, Tempo integration
 
 ---
 
@@ -91,6 +91,7 @@ This repository contains the complete GitOps configuration for our Kubernetes cl
 │   ├── prometheus.yaml
 │   ├── grafana.yaml
 │   ├── loki.yaml
+│   ├── tempo.yaml
 │   ├── fluent-bit.yaml
 │   ├── trivy.yaml
 │   ├── sealedsecrets.yaml
@@ -331,6 +332,7 @@ kubectl patch application myapp -n argocd \
 | **Prometheus** | Metrics | `monitoring` | 1 |
 | **Grafana** | Dashboards | `monitoring` | 1 |
 | **Loki** | Logs | `monitoring` | 1 |
+| **Tempo** | Distributed tracing | `monitoring` | 1 |
 | **Fluent-Bit** | Log shipping | `monitoring` | DaemonSet |
 | **Trivy** | Vulnerability scanning | `trivy-system` | 1 |
 
@@ -470,6 +472,7 @@ Documentation lives in `docs/`. To update:
 - [Kyverno Documentation](https://kyverno.io/docs/)
 - [Traefik Documentation](https://doc.traefik.io/traefik/)
 - [Cert-Manager Documentation](https://cert-manager.io/docs/)
+- [Grafana Tempo Documentation](https://grafana.com/docs/tempo/)
 - [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets)
 
 ### Related Repositories
diff --git a/docs/GITOPS-ARCHITECTURE.md b/docs/GITOPS-ARCHITECTURE.md
index 1b81a88..2fa1794 100644
--- a/docs/GITOPS-ARCHITECTURE.md
+++ b/docs/GITOPS-ARCHITECTURE.md
@@ -21,7 +21,7 @@ This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where
 - **Deployment Pattern**: App-of-Apps
 - **Secret Management**: Sealed Secrets (kubeseal)
 - **Ingress**: Traefik with Let's Encrypt TLS
-- **Monitoring**: Prometheus + Grafana + Loki + Fluent-Bit
+- **Monitoring**: Prometheus + Grafana + Loki + Tempo + Fluent-Bit
 - **Policy Engine**: Kyverno
 - **Notifications**: Slack integration for sync status
 
@@ -83,6 +83,7 @@ This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where
  │ │ - Prometheus │ │
  │ │ - Grafana │ │
  │ │ - Loki │ │
+ │ │ - Tempo │ │
  │ │ - Fluent-Bit │ │
  │ └──────────────────────────┘ │
  │ │
@@ -127,6 +128,7 @@ sturdy-adventure/
 │   ├── prometheus.yaml
 │   ├── grafana.yaml
 │   ├── loki.yaml
+│   ├── tempo.yaml
 │   ├── fluent-bit.yaml
 │   ├── trivy.yaml
 │   ├── sealedsecrets.yaml
@@ -136,6 +138,7 @@ sturdy-adventure/
 │   ├── prometheus-values.yaml
 │   ├── grafana-values.yaml
 │   ├── loki-values.yaml
+│   ├── tempo-values.yaml
 │   └── fluent-bit-values.yaml
 │
 ├── apps/ # Business Application ArgoCD manifests
@@ -301,6 +304,7 @@ _app-of-apps.yaml (Root)
  │   ├── kyverno
  │   ├── prometheus
  │   ├── grafana
+ │   ├── tempo
  │   └── ... (other infra apps)
  │
  └── enterprise-apps (manages apps/)
@@ -526,8 +530,9 @@ annotations:
 1. **Prometheus**: Metrics collection and storage
 2. **Grafana**: Metrics visualization and dashboards
 3. **Loki**: Log aggregation
-4. **Fluent-Bit**: Log shipping from pods to Loki
-5. **Trivy**: Container vulnerability scanning
+4. **Tempo**: Distributed tracing (OTLP)
+5. **Fluent-Bit**: Log shipping from pods to Loki
+6. **Trivy**: Container vulnerability scanning
 
 ### Slack Notifications
 
diff --git a/docs/OPERATIONS-RUNBOOK.md b/docs/OPERATIONS-RUNBOOK.md
index 83eaaca..aa29eff 100644
--- a/docs/OPERATIONS-RUNBOOK.md
+++ b/docs/OPERATIONS-RUNBOOK.md
@@ -954,6 +954,33 @@ curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
   --data-urlencode 'start=1h' | jq
 ```
 
+### Tempo Traces
+
+```bash
+# Port forward to Tempo query API
+kubectl port-forward -n monitoring svc/tempo 3200:3200
+
+# Access: http://localhost:3200
+```
+
+**Query traces via Grafana:**
+1. Open Grafana → Explore
+2. Select Tempo datasource
+3. Use TraceQL or search by service name
+
+**Verify Traefik is sending traces:**
+```bash
+# Check Traefik logs for OTLP export errors
+kubectl logs -n traefik-system -l app.kubernetes.io/name=traefik | grep -i "traces export"
+
+# Check Tempo is receiving data
+kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep "receiver"
+```
+
+**Trace-to-log correlation:**
+- Click a trace span in Grafana → linked Loki logs appear (by namespace, pod, container)
+- Trace-to-metrics links to Prometheus by service name
+
 ### Fluent-Bit Log Shipping
 
 Verify Fluent-Bit is shipping logs:
diff --git a/docs/REFERENCE.md b/docs/REFERENCE.md
index 243dd09..14fdee8 100644
--- a/docs/REFERENCE.md
+++ b/docs/REFERENCE.md
@@ -29,6 +29,7 @@
 | **Secret Management** | Sealed Secrets (Bitnami) |
 | **Monitoring** | Prometheus + Grafana |
 | **Logging** | Loki + Fluent-Bit |
+| **Tracing** | Tempo (OTLP) |
 | **Container Scanning** | Trivy |
 
 ### Network Architecture
@@ -81,6 +82,7 @@ sturdy-adventure/
 │   ├── prometheus.yaml
 │   ├── grafana.yaml
 │   ├── loki.yaml
+│   ├── tempo.yaml
 │   ├── fluent-bit.yaml
 │   ├── trivy.yaml
 │   ├── sealedsecrets.yaml
@@ -90,6 +92,7 @@ sturdy-adventure/
 │   ├── prometheus-values.yaml
 │   ├── grafana-values.yaml
 │   ├── loki-values.yaml
+│   ├── tempo-values.yaml
 │   └── fluent-bit-values.yaml
 │
 ├── apps/ # Business applications
@@ -703,6 +706,7 @@ kubeStateMetrics:
 **Datasources**:
 - Prometheus
 - Loki
+- Tempo
 
 ### Loki
 
@@ -720,6 +724,45 @@ promtail:
   enabled: false # Using Fluent-Bit instead
 ```
 
+### Tempo
+
+**Chart**: `grafana/tempo`
+**Version**: 1.24.4
+**Namespace**: `monitoring`
+
+**Purpose**: Distributed tracing backend receiving OTLP traces from Traefik and other instrumented services.
+
+**Configuration**:
+```yaml
+tempo:
+  storage:
+    trace:
+      backend: local
+      local:
+        path: /var/tempo/traces
+  receivers:
+    otlp:
+      protocols:
+        grpc:
+          endpoint: "0.0.0.0:4317"
+        http:
+          endpoint: "0.0.0.0:4318"
+
+persistence:
+  enabled: true
+  size: 10Gi
+```
+
+**Endpoints**:
+- gRPC OTLP receiver: `:4317`
+- HTTP OTLP receiver: `:4318`
+- Query API: `:3200`
+
+**Grafana Integration**:
+- Trace-to-logs correlation with Loki (by namespace, pod, container)
+- Trace-to-metrics correlation with Prometheus (by service name)
+- Service graph and node graph visualization
+
 ### Fluent-Bit
 
 **Chart**: `fluent/fluent-bit`
@@ -1184,6 +1227,19 @@ GET /api/v1/query_range?query={promql}&start={time}&end={time}&step={duration}
 GET /api/v1/label/__name__/values
 ```
 
+### Tempo API
+
+```
+# Search traces
+GET /api/search?q={traceql}
+
+# Get trace by ID
+GET /api/traces/{traceID}
+
+# Service tag values
+GET /api/v2/search/tag/resource.service.name/values
+```
+
 ### Loki API
 
 ```
@@ -1315,6 +1371,7 @@ team: platform
 | **Prometheus** | 2.47.0+ | Latest |
 | **Grafana** | 10.0.0+ | Latest |
 | **Loki** | 2.9.0+ | Latest |
+| **Tempo** | 2.6.0+ | 1.24.4 |
 | **Fluent-Bit** | 2.1.0+ | Latest |
 | **PostgreSQL** | 16-alpine | N/A |
 | **Trivy** | Latest | Latest |