# ArgoCD Applications Comprehensive Analysis Report

## Overview
This report analyzes the 11 ArgoCD Application manifests in `/argocd/apps/`. It covers current configurations, risks, best-practice violations, security concerns, and operational improvements.

---

## Critical Issues Summary

### 1. Hardcoded Secrets (CRITICAL)
**Files:** application.yaml, grafana.yaml
- **application.yaml:** Database password "change-me-in-production"
- **grafana.yaml:** Admin password "forte" in plaintext
- **Impact:** Credentials remain readable in Git history even after they are removed from the manifests
- **Fix:** Migrate to Sealed Secrets immediately and rotate both passwords

### 2. Floating Versions (CRITICAL)
**Files:** application.yaml, cluster-resources-application.yaml
- Both track `HEAD` instead of a tagged version
- No audit trail of which revision was deployed
- Unpredictable application behavior, since any new commit can change what is deployed
- **Fix:** Pin to specific git tags or commit SHAs

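
As a minimal sketch of the pinning fix, an Application source can name a tag or commit SHA in `targetRevision` instead of `HEAD`. The repository URL, path, tag, and namespaces below are placeholders chosen for illustration, not values taken from the analyzed manifests.

```yaml
# Hypothetical Application excerpt: every URL, path, tag, and namespace is an assumption.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: music-man
  namespace: argocd              # assumes ArgoCD runs in the argocd namespace
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git   # placeholder URL
    path: deploy                 # placeholder path
    targetRevision: v1.2.3       # a git tag or full commit SHA instead of HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: music-man         # placeholder namespace
```

With a pinned revision, each deployment corresponds to a reviewed change of `targetRevision` in Git, which also supplies the missing audit trail.
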
### 3. Placeholder URLs (HIGH)
**Files:** fluent-bit.yaml, grafana.yaml
- The second source still points to `https://github.com/YOUR_ORG/YOUR_GITOPS_REPO.git`
- Both applications fail to deploy until the URL is replaced
- **Fix:** Update to the actual repository URL

### 4. Undersized Resources (HIGH)
**Files:** cert-manager, loki, fluent-bit, prometheus, trivy
- cert-manager: 100m CPU limit (too tight for a control-plane component)
- loki: 200m CPU, 512Mi memory (drops logs under load)
- fluent-bit: 100m CPU for log collection on every node
- **Impact:** Performance degradation, OOM kills, dropped logs
- **Fix:** Increase resource limits across the monitoring stack

### 5. No Data Persistence (HIGH)
**Files:** loki.yaml (filesystem storage), prometheus.yaml
- Loki uses filesystem storage with no persistent volume, so log data is lost on restart
- Prometheus is likely ephemeral (no PVC visible)
- No backup strategy
- **Fix:** Configure persistent volumes, and move Loki to object storage (S3/GCS)

---

## Application-by-Application Summary

| Application | Issues | Priority | Key Recommendation |
|-------------|--------|----------|---------------------|
| **music-man** | Floating HEAD, hardcoded password, no resources | HIGH | Pin version, use Sealed Secrets, add resource limits |
| **cert-manager** | Undersized (100m), single replica, tight webhook timeout | HIGH | Increase CPU to 500m, add replicas (2-3), longer timeout |
| **cluster-resources** | Floating HEAD, RBAC missing | MEDIUM | Pin version, restrict with AppProject |
| **fluent-bit** | Placeholder URL, tight CPU (100m), HTTP server wide open | HIGH | Update repo URL, 200m CPU, restrict HTTP to localhost |
| **grafana** | Hardcoded password, placeholder URL, no persistence | CRITICAL | Sealed Secrets, update URL, add PVC |
| **kyverno** | No policies configured, no resources, no failure policy | MEDIUM | Add security policies, define resource limits |
| **loki** | Filesystem storage, no auth, single binary, tight resources | CRITICAL | S3/GCS storage, enable auth, distributed mode |
| **prometheus** | No alertmanager, service port 80, no persistence, no ingress | HIGH | Enable alertmanager, port 9090, add PVC, secure ingress |
| **sealed-secrets** | No backup procedure, single replica, no resources | MEDIUM | Document key backup, add PDB, increase replicas |
| **traefik** | TLS incomplete, LoadBalancer cloud-specific, no resources | MEDIUM | Complete TLS config, add cert-manager integration, resources |
| **trivy** | Alpha version (v0.0.7), ignoreUnfixed hides vulns, no resources | MEDIUM | Upgrade to stable (v0.3+), show all vulns, resources |

---

## Cross-Cutting Issues

### RBAC & Security (Critical)
- All apps use the `default` project, which sets no boundaries
- No explicit AppProject configuration
- Cluster-scoped resources are not restricted
- **Fix:** Create an AppProject with granular permissions

### No Network Policies (All Namespaces)
- Unrestricted pod-to-pod communication
- Monitoring stack reachable from all pods
- **Fix:** Implement a NetworkPolicy for each namespace

### No Pod Disruption Budgets
- No availability guarantees during cluster operations such as node drains and upgrades
- Critical services can be evicted/disrupted
- **Fix:** Add a PDB with `minAvailable: 1` for critical apps

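
A PodDisruptionBudget for this fix might look like the sketch below. Traefik is used because the report lists it with two replicas; the resource name, namespace, and label selector are assumptions and must be adapted to the labels the pods actually carry.

```yaml
# Hypothetical PDB: name, namespace, and selector labels are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: traefik-pdb
  namespace: traefik
spec:
  minAvailable: 1                         # keep at least one pod during voluntary evictions
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik     # must match the pods' real labels
```

Note that `minAvailable: 1` on a single-replica workload blocks node drains entirely, so it is worth raising replica counts before adding PDBs to the single-replica services.
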
### Incomplete TLS Configuration
- Prometheus is served over HTTP on port 80
- Traefik TLS relies on defaults (behavior unclear)
- Fluent Bit to Loki traffic is unencrypted
- **Fix:** Implement TLS end-to-end with cert-manager

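
One way cert-manager could supply certificates for these endpoints is a `Certificate` resource, sketched here for Prometheus. The hostname, secret name, and issuer are hypothetical; the report does not say which issuers exist in the cluster.

```yaml
# Hypothetical Certificate: hostname, secret name, and issuer are assumptions.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prometheus-tls
  namespace: monitoring
spec:
  secretName: prometheus-tls        # Secret that cert-manager creates and renews
  dnsNames:
    - prometheus.example.com        # placeholder hostname
  issuerRef:
    name: internal-ca               # assumed ClusterIssuer; must already exist
    kind: ClusterIssuer
```

The resulting Secret can then be referenced from the ingress or the workload that terminates TLS.
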
### Missing Resource Requests
- Prometheus, Traefik, and Kyverno have none defined
- Without requests, the scheduler can overcommit nodes
- **Fix:** Add requests/limits to all remaining apps

---

## Priority Remediation Roadmap

### Phase 1: CRITICAL (Immediate)
- [ ] Migrate Grafana admin password to Sealed Secrets
- [ ] Migrate music-man database password to Sealed Secrets
- [ ] Update placeholder repository URLs
- [ ] Pin floating versions (HEAD → git tags)

### Phase 2: URGENT (Week 1-2)
- [ ] Configure persistent storage for Loki
- [ ] Configure persistent storage for Prometheus
- [ ] Enable Prometheus Alertmanager
- [ ] Increase resource limits for all apps

### Phase 3: IMPORTANT (Week 2-3)
- [ ] Implement NetworkPolicies
- [ ] Create AppProject with RBAC
- [ ] Add PodDisruptionBudgets
- [ ] Configure Kyverno security policies

### Phase 4: ENHANCEMENT (Week 3-4)
- [ ] Complete TLS configuration
- [ ] Implement cert-manager integration
- [ ] Set up backup strategies
- [ ] Add comprehensive monitoring

---

## Detailed Issues by Category

### Resource Configuration
- **cert-manager:** 50m req, 100m limit (INCREASE to 250m/500m)
- **prometheus:** 250m req, 500m limit (ADEQUATE, but set explicitly in values)
- **grafana:** 100m req, 200m limit (INCREASE to 200m/400m)
- **loki:** 100m req, 200m limit (INCREASE to 200m/500m for distributed mode)
- **fluent-bit:** 50m req, 100m limit (INCREASE to 100m/200m)
- **traefik:** Not specified (ADD 250m/500m, 256Mi/512Mi)
- **kyverno:** Not specified (ADD 100m/200m, 128Mi/256Mi)
- **trivy:** Not specified (ADD 250m/500m, 256Mi/512Mi)
- **sealed-secrets:** Not specified (ADD 100m/200m, 128Mi/256Mi)

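
Read as request/limit pairs, the recommendations above translate into a standard Kubernetes `resources` block. The sketch below uses the Traefik figures and assumes the chart exposes a top-level `resources` key in its values; other charts may nest it under a component name.

```yaml
# Helm values excerpt (assumed key path; check the chart's values.yaml)
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
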
### Storage & Persistence
- **loki:** Filesystem (CRITICAL - switch to S3/GCS)
- **prometheus:** Implicitly ephemeral (ADD a 20-30 GB PVC)
- **grafana:** No persistence specified (QUESTIONABLE - acceptable only if dashboards are imported rather than edited in place)
- **sealed-secrets:** Key backup not documented (ADD backup procedure)

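
For the Prometheus item, a chart-agnostic way to picture the requirement is a plain PersistentVolumeClaim at the low end of the suggested range. The claim name and storage class are assumptions; in practice most Prometheus charts create the claim from their own persistence values, so this shows the target state rather than the exact mechanism.

```yaml
# Hypothetical PVC: name and storage class are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast      # assumed class; must exist in the cluster
  resources:
    requests:
      storage: 20Gi
```
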
### High Availability
- **cert-manager:** replicaCount: 1 (INCREASE to 2-3)
- **sealed-secrets:** Implicit single replica (INCREASE to 2-3)
- **traefik:** Replicas: 2 (ADEQUATE, but add PDB)
- **monitoring stack:** Single instances (CONSIDER distributed deployments)

### Security Gaps
- **Secrets in Git:** Grafana, music-man (MIGRATE to Sealed Secrets)
- **No Authentication:** Loki (auth_enabled: false), Prometheus (open HTTP)
- **Wide Permissions:** kubectl RBAC not restricted (ADD ClusterRole)
- **No Network Policies:** All apps (ADD NetworkPolicy)
- **TLS Incomplete:** Prometheus on HTTP port 80, Traefik `tls: {}`, Fluent Bit → Loki over HTTP

---

## Key Statistics

| Metric | Count |
|--------|-------|
| Total Applications Analyzed | 11 |
| Critical Issues | 5 |
| High Priority Issues | 12 |
| Medium Priority Issues | 20+ |
| Best Practice Violations | 30+ |
| Security Concerns | 25+ |
| Apps Missing Resource Requests | 4 |
| Apps Missing Resource Limits | 3 |
| Apps Using Floating Versions | 2 |
| Apps with Hardcoded Secrets | 2 |
| Apps Requiring Persistence | 3 |
| Apps with Single Replica Critical Services | 4 |

---

## Implementation Guidance

### Sealed Secrets Setup
```bash
# Install the sealed-secrets controller by applying its ArgoCD Application
kubectl apply -f ./argocd/apps/sealedsecrets.yaml

# Seal the Grafana admin password (replace the placeholder value first).
# kubeseal emits JSON by default, so request YAML to match the output file name.
# Add --namespace to the kubectl command if Grafana runs outside the current namespace.
echo -n "new-secure-password" | kubectl create secret generic grafana-admin \
  --dry-run=client --from-file=password=/dev/stdin -o yaml | \
  kubeseal --format yaml > grafana-sealed-secret.yaml

# Update application manifests to reference sealed secrets
```

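
The last step, referencing the sealed secret from the application, could look like the following if Grafana is deployed with the Grafana Helm chart, which accepts an existing Secret for the admin credentials. This is a sketch under that assumption; the report does not state which chart `grafana.yaml` uses.

```yaml
# Grafana Helm values excerpt (assumes the Grafana Helm chart)
admin:
  existingSecret: grafana-admin   # Secret created when the SealedSecret is unsealed
  userKey: admin-user             # this key must also exist in the Secret; the command above does not add it
  passwordKey: password           # matches --from-file=password=... in the command above
```

Because the chart reads both keys from the same Secret, the sealing command would need a second `--from-literal` or `--from-file` entry for the admin user name.
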
### Persistent Volume for Loki
```yaml
# Add to loki values (the exact key path depends on the chart and deployment mode)
persistence:
  enabled: true
  storageClassName: "fast"
  size: 50Gi
  accessModes:
    - ReadWriteOnce
```

### AppProject for RBAC
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd   # assumes ArgoCD is installed in the argocd namespace
spec:
  destinations:
    - namespace: '*'
      server: 'https://kubernetes.default.svc'
  sourceRepos:
    - 'https://github.com/snothub/*'
  roles:
    - name: admin
      policies:
        # Project role policies must target this project's applications (platform/*)
        - p, proj:platform:admin, applications, *, platform/*, allow
```

### NetworkPolicy for Monitoring
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: monitoring-access
  namespace: monitoring
spec:
  # Labels must match those actually set on the Prometheus and Grafana pods
  podSelector:
    matchLabels:
      app: prometheus
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: grafana
      ports:
        - protocol: TCP
          port: 9090
```

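
The policy above restricts only pods labeled `app: prometheus`; every other pod in the namespace still accepts traffic from anywhere. A common complement, sketched here for the same `monitoring` namespace, is a default-deny ingress policy that selects all pods, so that each allowed path has to be declared explicitly. The policy name is an assumption.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
  namespace: monitoring
spec:
  podSelector: {}              # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed, so all inbound traffic is denied
```

Applying this before the allow rules are in place will cut off scraping and dashboards, so the allow policies should be rolled out first.
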
---

## Next Steps

1. **Review this analysis** with your team
2. **Create tickets** for each critical/high issue
3. **Schedule remediation** according to roadmap
4. **Document changes** as they're made
5. **Test thoroughly** in dev/staging first
6. **Monitor impact** after production changes