Tempo doc

2026-03-20 14:22:14 +01:00
parent 7522b88cfb
commit f728f9dbd3
4 changed files with 98 additions and 6 deletions


@@ -21,7 +21,7 @@ This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where
- **Deployment Pattern**: App-of-Apps
- **Secret Management**: Sealed Secrets (kubeseal)
- **Ingress**: Traefik with Let's Encrypt TLS
- **Monitoring**: Prometheus + Grafana + Loki + Fluent-Bit
- **Monitoring**: Prometheus + Grafana + Loki + Tempo + Fluent-Bit
- **Policy Engine**: Kyverno
- **Notifications**: Slack integration for sync status
@@ -83,6 +83,7 @@ This Kubernetes cluster uses a **GitOps approach** powered by **ArgoCD**, where
│ │ - Prometheus │ │
│ │ - Grafana │ │
│ │ - Loki │ │
│ │ - Tempo │ │
│ │ - Fluent-Bit │ │
│ └──────────────────────────┘ │
│ │
@@ -127,6 +128,7 @@ sturdy-adventure/
│ ├── prometheus.yaml
│ ├── grafana.yaml
│ ├── loki.yaml
│ ├── tempo.yaml
│ ├── fluent-bit.yaml
│ ├── trivy.yaml
│ ├── sealedsecrets.yaml
@@ -136,6 +138,7 @@ sturdy-adventure/
│ ├── prometheus-values.yaml
│ ├── grafana-values.yaml
│ ├── loki-values.yaml
│ ├── tempo-values.yaml
│ └── fluent-bit-values.yaml
├── apps/ # Business Application ArgoCD manifests
@@ -301,6 +304,7 @@ _app-of-apps.yaml (Root)
│ ├── kyverno
│ ├── prometheus
│ ├── grafana
│ ├── tempo
│ └── ... (other infra apps)
└── enterprise-apps (manages apps/)
@@ -526,8 +530,9 @@ annotations:
1. **Prometheus**: Metrics collection and storage
2. **Grafana**: Metrics visualization and dashboards
3. **Loki**: Log aggregation
4. **Fluent-Bit**: Log shipping from pods to Loki
5. **Trivy**: Container vulnerability scanning
4. **Tempo**: Distributed tracing (OTLP)
5. **Fluent-Bit**: Log shipping from pods to Loki
6. **Trivy**: Container vulnerability scanning
### Slack Notifications


@@ -954,6 +954,33 @@ curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
--data-urlencode 'start=1h' | jq
```
### Tempo Traces
```bash
# Port forward to Tempo query API
kubectl port-forward -n monitoring svc/tempo 3200:3200
# Tempo HTTP API (no web UI; use Grafana to browse traces): http://localhost:3200
```
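Tempo can take a few seconds to report ready after a restart, so it may help to wait on the readiness endpoint before querying. A minimal sketch, assuming the port-forward above is active and that Tempo's standard `/ready` endpoint is served on the same port; `TEMPO_URL` and `tempo_wait_ready` are illustrative names, not part of this repository:

```bash
# Base URL of the Tempo query API (assumes the local port-forward).
TEMPO_URL="${TEMPO_URL:-http://localhost:3200}"

# Poll /ready until it answers successfully, giving up after the given
# number of attempts (default 30, roughly one per second).
tempo_wait_ready() {
  local attempts="${1:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fs -o /dev/null "${TEMPO_URL}/ready"; then
      echo "Tempo ready after ${i} attempt(s)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "Tempo not ready after ${attempts} attempt(s)" >&2
  return 1
}
```

Calling `tempo_wait_ready 10` before any API request avoids confusing a starting Tempo with a missing one.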
**Query traces via Grafana:**
1. Open Grafana → Explore
2. Select Tempo datasource
3. Use TraceQL or search by service name
**Verify Traefik is sending traces:**
```bash
# Check Traefik logs for OTLP export errors
kubectl logs -n traefik-system -l app.kubernetes.io/name=traefik | grep -i "traces export"
# Check Tempo is receiving data
kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep "receiver"
```
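If the logs look clean, searching Tempo for recent Traefik traces gives a more direct confirmation. A sketch, assuming the port-forward to `:3200` is active and that Traefik reports the service name `traefik` (its default unless the tracing configuration overrides it); the function names are illustrative:

```bash
# Base URL of the Tempo query API (assumes the local port-forward).
TEMPO_URL="${TEMPO_URL:-http://localhost:3200}"

# Build a TraceQL query matching every span emitted by one service.
traceql_for_service() {
  printf '{ resource.service.name = "%s" }\n' "$1"
}

# Search Tempo for recent traces of a service (default limit: 5 results).
# curl -G with --data-urlencode takes care of URL-encoding the query.
tempo_search_service() {
  curl -G -s "${TEMPO_URL}/api/search" \
    --data-urlencode "q=$(traceql_for_service "$1")" \
    --data-urlencode "limit=${2:-5}"
}
```

With `jq` installed, `tempo_search_service traefik | jq '.traces | length'` prints how many traces matched; a non-zero count suggests spans are arriving.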
**Trace-to-log correlation:**
- Click a trace span in Grafana → linked Loki logs appear (by namespace, pod, container)
- Trace-to-metrics links to Prometheus by service name
### Fluent-Bit Log Shipping
Verify Fluent-Bit is shipping logs:


@@ -29,6 +29,7 @@
| **Secret Management** | Sealed Secrets (Bitnami) |
| **Monitoring** | Prometheus + Grafana |
| **Logging** | Loki + Fluent-Bit |
| **Tracing** | Tempo (OTLP) |
| **Container Scanning** | Trivy |
### Network Architecture
@@ -81,6 +82,7 @@ sturdy-adventure/
│ ├── prometheus.yaml
│ ├── grafana.yaml
│ ├── loki.yaml
│ ├── tempo.yaml
│ ├── fluent-bit.yaml
│ ├── trivy.yaml
│ ├── sealedsecrets.yaml
@@ -90,6 +92,7 @@ sturdy-adventure/
│ ├── prometheus-values.yaml
│ ├── grafana-values.yaml
│ ├── loki-values.yaml
│ ├── tempo-values.yaml
│ └── fluent-bit-values.yaml
├── apps/ # Business applications
@@ -703,6 +706,7 @@ kubeStateMetrics:
**Datasources**:
- Prometheus
- Loki
- Tempo
### Loki
@@ -720,6 +724,45 @@ promtail:
enabled: false # Using Fluent-Bit instead
```
### Tempo
**Chart**: `grafana/tempo`
**Version**: 1.24.4
**Namespace**: `monitoring`
**Purpose**: Distributed tracing backend receiving OTLP traces from Traefik and other instrumented services.
**Configuration**:
```yaml
tempo:
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"
persistence:
  enabled: true
  size: 10Gi
```
**Endpoints**:
- gRPC OTLP receiver: `:4317`
- HTTP OTLP receiver: `:4318`
- Query API: `:3200`
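The receiver and query endpoints above can be exercised end to end with a hand-built span. A minimal sketch, assuming port-forwards to both `:4318` and `:3200` (whether `svc/tempo` exposes 4318 depends on the chart values), the standard OTLP/HTTP path `/v1/traces`, and `openssl` for random IDs; `otlp_test_payload`, `tempo_smoke_test`, `OTLP_URL`, and the `smoke-test` names are made up for illustration:

```bash
# Print a minimal OTLP/JSON body holding one span that lasted one second.
# Args: trace ID (32 hex chars), span ID (16 hex chars), service name.
otlp_test_payload() {
  local trace_id="$1" span_id="$2" service="$3" now
  now="$(date +%s)"
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"${service}"}}]},"scopeSpans":[{"spans":[{"traceId":"${trace_id}","spanId":"${span_id}","name":"smoke-test","kind":1,"startTimeUnixNano":"$((now - 1))000000000","endTimeUnixNano":"${now}000000000"}]}]}]}
EOF
}

# Send one test span to the OTLP HTTP receiver, then print its trace ID
# so it can be looked up through the query API.
tempo_smoke_test() {
  local trace_id span_id
  trace_id="$(openssl rand -hex 16)"
  span_id="$(openssl rand -hex 8)"
  otlp_test_payload "$trace_id" "$span_id" smoke-test |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${OTLP_URL:-http://localhost:4318}/v1/traces" >/dev/null &&
    echo "$trace_id"
}
```

Running `tempo_smoke_test` prints a trace ID; requesting `/api/traces/<that ID>` on the query API should return the span once Tempo has ingested it, which may take a few seconds.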
**Grafana Integration**:
- Trace-to-logs correlation with Loki (by namespace, pod, container)
- Trace-to-metrics correlation with Prometheus (by service name)
- Service graph and node graph visualization
### Fluent-Bit
**Chart**: `fluent/fluent-bit`
@@ -1184,6 +1227,19 @@ GET /api/v1/query_range?query={promql}&start={time}&end={time}&step={duration}
GET /api/v1/label/__name__/values
```
### Tempo API
```
# Search traces
GET /api/search?q={traceql}
# Get trace by ID
GET /api/traces/{traceID}
# Service tag values
GET /api/v2/search/tag/resource.service.name/values
```
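For scripting against these endpoints, a small URL helper keeps the paths in one place. A sketch only: `tempo_api_url` and `TEMPO_URL` are hypothetical names, and the base URL assumes a local port-forward to the query API on `:3200`:

```bash
# Base URL of the Tempo query API (assumes the local port-forward).
TEMPO_URL="${TEMPO_URL:-http://localhost:3200}"

# Print the full URL for one of the endpoints listed above.
#   tempo_api_url search
#   tempo_api_url trace <traceID>
#   tempo_api_url tag-values <tag>
tempo_api_url() {
  case "$1" in
    search)     printf '%s/api/search\n' "$TEMPO_URL" ;;
    trace)      printf '%s/api/traces/%s\n' "$TEMPO_URL" "$2" ;;
    tag-values) printf '%s/api/v2/search/tag/%s/values\n' "$TEMPO_URL" "$2" ;;
    *)          echo "unknown endpoint: $1" >&2; return 1 ;;
  esac
}

# Example calls (these need the port-forward to be running):
#   curl -s "$(tempo_api_url tag-values resource.service.name)"
#   curl -G -s "$(tempo_api_url search)" --data-urlencode 'q={ status = error }'
```

Passing the TraceQL expression through `--data-urlencode` rather than pasting it into the URL avoids problems with the braces and spaces in the query.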
### Loki API
```
@@ -1315,6 +1371,7 @@ team: platform
| **Prometheus** | 2.47.0+ | Latest |
| **Grafana** | 10.0.0+ | Latest |
| **Loki** | 2.9.0+ | Latest |
| **Tempo** | 2.6.0+ | 1.24.4 |
| **Fluent-Bit** | 2.1.0+ | Latest |
| **PostgreSQL** | 16-alpine | N/A |
| **Trivy** | Latest | Latest |