ArgoCD Applications Comprehensive Analysis Report
Overview
This report analyzes the 11 ArgoCD Application manifests in /argocd/apps/. It details current configurations, risks, best practice violations, security concerns, and operational improvements.
Critical Issues Summary
1. Hardcoded Secrets (CRITICAL)
Files: application.yaml, grafana.yaml
- application.yaml: Database password "change-me-in-production"
- grafana.yaml: Admin password "forte" in plaintext
- Impact: Credentials exposed in Git history forever
- Fix: Rotate both credentials (the old values remain in Git history) and migrate to Sealed Secrets immediately
2. Floating Versions (CRITICAL)
Files: application.yaml, cluster-resources-application.yaml
- Using HEAD instead of tagged versions
- No audit trail of deployments
- Unpredictable application behavior
- Fix: Pin to specific git tags or commit SHAs
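Pinning comes down to the targetRevision field of the Application source. A minimal sketch of what a pinned manifest could look like; the repository URL, path, tag, and destination namespace are hypothetical placeholders, not values taken from the analyzed files:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: music-man
  namespace: argocd                 # assumes Argo CD is installed in "argocd"
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-gitops.git  # hypothetical
    path: apps/music-man            # hypothetical path
    targetRevision: v1.4.2          # hypothetical tag; a full commit SHA also works
  destination:
    server: https://kubernetes.default.svc
    namespace: music-man            # hypothetical
```

A tag keeps the manifest readable, while a commit SHA cannot be moved after the fact; either gives each deployment a traceable revision.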
3. Placeholder URLs (HIGH)
Files: fluent-bit.yaml, grafana.yaml
- Second source still has https://github.com/YOUR_ORG/YOUR_GITOPS_REPO.git
- Applications fail to deploy
- Fix: Update to actual repository URL
4. Undersized Resources (HIGH)
Files: cert-manager, loki, fluent-bit, prometheus, trivy
- cert-manager: 100m CPU limit (too tight for control plane)
- loki: 200m CPU, 512Mi memory (drops logs under load)
- fluent-bit: 100m CPU for all-node log collection
- Impact: Performance degradation, OOM kills, dropped logs
- Fix: Raise resource requests and limits for cert-manager and the entire monitoring stack
5. No Data Persistence (HIGH)
Files: loki.yaml (filesystem storage), prometheus.yaml
- Loki using filesystem storage (ephemeral, lost on restart)
- Prometheus likely ephemeral (no PVC visible)
- No backup strategy
- Fix: Back Prometheus with a PersistentVolumeClaim and move Loki to object storage (S3/GCS)
Application-by-Application Summary
| Application | Issues | Priority | Key Recommendation |
|---|---|---|---|
| music-man | Floating HEAD, hardcoded password, no resources | HIGH | Pin version, use Sealed Secrets, add resource limits |
| cert-manager | Undersized (100m), single replica, tight webhook timeout | HIGH | Increase CPU to 500m, add replicas (2-3), longer timeout |
| cluster-resources | Floating HEAD, RBAC missing | MEDIUM | Pin version, restrict with AppProject |
| fluent-bit | Placeholder URL, tight CPU (100m), HTTP server wide open | HIGH | Update repo URL, 200m CPU, restrict HTTP to localhost |
| grafana | Hardcoded password, placeholder URL, no persistence | CRITICAL | Sealed Secrets, update URL, add PVC |
| kyverno | No policies configured, no resources, no failure policies | MEDIUM | Add security policies, define resource limits |
| loki | Filesystem storage, no auth, single binary, tight resources | CRITICAL | S3/GCS storage, enable auth, distributed mode |
| prometheus | No alertmanager, service port 80, no persistence, no ingress | HIGH | Enable alertmanager, port 9090, add PVC, secure ingress |
| sealed-secrets | No backup procedure, single replica, no resources | MEDIUM | Document key backup, add PDB, increase replicas |
| traefik | TLS incomplete, LoadBalancer cloud-specific, no resources | MEDIUM | Complete TLS config, add cert-manager integration, resources |
| trivy | Alpha version (v0.0.7), ignoreUnfixed hides vulns, no resources | MEDIUM | Upgrade to stable (v0.3+), show all vulns, resources |
Cross-Cutting Issues
RBAC & Security (Critical)
- All apps use default project (no boundaries)
- No explicit AppProject configuration
- Cluster resources not restricted
- Fix: Create AppProject with granular permissions
No Network Policies (All Namespaces)
- Unlimited pod-to-pod communication
- Monitoring stack accessible from all pods
- Fix: Implement NetworkPolicy for each namespace
No Pod Disruption Budgets
- No HA guarantees during cluster operations
- Critical services can be evicted/disrupted
- Fix: Add PDB minAvailable: 1 for critical apps
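As an illustration of the fix above, here is a minimal PodDisruptionBudget sketch for Traefik, which already runs two replicas. The namespace and the selector label are assumptions and would need to match the labels the chart actually applies:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: traefik
  namespace: traefik                     # hypothetical namespace
spec:
  minAvailable: 1                        # keep at least one pod during voluntary evictions
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik    # assumed label; verify against the deployed pods
```

Note that minAvailable: 1 on a single-replica workload blocks node drains entirely, so the replica increases recommended in this report should land before or together with the PDBs.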
Incomplete TLS Configuration
- Prometheus on HTTP port 80
- Traefik TLS uses defaults (unclear)
- Fluent-bit to Loki unencrypted
- Fix: Implement TLS end-to-end with cert-manager
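With cert-manager already deployed, serving certificates can be requested declaratively. A minimal sketch of a Certificate for Prometheus; the hostname, secret name, and ClusterIssuer name are hypothetical, and the issuer is assumed to exist already:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prometheus-tls
  namespace: monitoring
spec:
  secretName: prometheus-tls           # Secret that cert-manager creates and renews
  dnsNames:
    - prometheus.example.com           # hypothetical hostname
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod             # hypothetical issuer; must be created separately
```

The resulting Secret can then be referenced from the ingress or Traefik route that fronts Prometheus.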
Missing Resource Requests
- Prometheus, Traefik, Kyverno undefined
- Scheduler can overallocate resources
- Fix: Add requests/limits to all remaining apps
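Since Kyverno is deployed without any policies, one option is to let it report workloads that ship without requests or limits. A minimal sketch in audit mode, assuming the kyverno.io/v1 ClusterPolicy API; the policy name and message are illustrative, and newer Kyverno releases may prefer a per-rule failure action field over the spec-level one used here:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Audit      # report only; switch to Enforce once workloads comply
  background: true                    # also scan existing resources
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"         # "?*" matches any non-empty value
                    memory: "?*"
                  limits:
                    memory: "?*"
```

Starting in audit mode avoids blocking the apps that currently have no resources defined while they are being fixed.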
Priority Remediation Roadmap
Phase 1: CRITICAL (Immediate)
- Migrate Grafana admin password to Sealed Secrets
- Migrate music-man database password to Sealed Secrets
- Update placeholder repository URLs
- Pin floating versions (HEAD → git tags)
Phase 2: URGENT (Week 1-2)
- Configure persistent storage for Loki
- Configure persistent storage for Prometheus
- Enable Prometheus Alertmanager
- Increase resource limits for all apps
Phase 3: IMPORTANT (Week 2-3)
- Implement NetworkPolicies
- Create AppProject with RBAC
- Add PodDisruptionBudgets
- Configure Kyverno security policies
Phase 4: ENHANCEMENT (Week 3-4)
- Complete TLS configuration
- Implement cert-manager integration
- Setup backup strategies
- Add comprehensive monitoring
Detailed Issues by Category
Resource Configuration
- cert-manager: 50m req, 100m limit (INCREASE to 250m/500m)
- prometheus: 250m req, 500m limit (ADEQUATE, but add to values)
- grafana: 100m req, 200m limit (INCREASE to 200m/400m)
- loki: 100m req, 200m limit (INCREASE to 200m/500m for distributed)
- fluent-bit: 50m req, 100m limit (INCREASE to 100m/200m)
- traefik: Not specified (ADD 250m/500m, 256Mi/512Mi)
- kyverno: Not specified (ADD 100m/200m, 128Mi/256Mi)
- trivy: Not specified (ADD 250m/500m, 256Mi/512Mi)
- sealedsecrets: Not specified (ADD 100m/200m, 128Mi/256Mi)
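For the apps with nothing specified, the recommendation translates into a standard Kubernetes resources block. A sketch using the kyverno figures from the list above, assuming each pair reads as request/limit like the rows before it; where this block nests inside each chart's values is chart-specific and not shown:

```yaml
resources:
  requests:
    cpu: 100m        # scheduler reserves this much CPU
    memory: 128Mi
  limits:
    cpu: 200m        # CPU is throttled above this
    memory: 256Mi    # container is OOM-killed above this
```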
Storage & Persistence
- loki: Filesystem (CRITICAL - switch to S3/GCS)
- prometheus: Implicit ephemeral (ADD PVC 20-30GB)
- grafana: No persistence specified (QUESTIONABLE - OK for dashboards if imported)
- sealed-secrets: Key backup not documented (ADD backup procedure)
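For the Prometheus PVC, a minimal values sketch. It assumes the prometheus-community/prometheus Helm chart, where server persistence lives under server.persistentVolume; other charts such as kube-prometheus-stack use a different layout. The storage class name is borrowed from the Loki example in this report and may not exist in the cluster:

```yaml
server:
  persistentVolume:
    enabled: true
    size: 20Gi               # lower end of the 20-30GB recommendation
    storageClass: fast       # assumed storage class name
    accessModes:
      - ReadWriteOnce
```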
High Availability
- cert-manager: replicaCount: 1 (INCREASE to 2-3)
- sealed-secrets: Implicit single replica (INCREASE to 2-3)
- traefik: Replicas: 2 (ADEQUATE, but add PDB)
- monitoring stack: Single instances (CONSIDER distributed)
Security Gaps
- Secrets in Git: Grafana, music-man (MIGRATE to Sealed Secrets)
- No Authentication: Loki (auth_enabled: false), Prometheus (open HTTP)
- Wide Permissions: kubectl RBAC not restricted (ADD ClusterRole)
- No Network Policies: All apps (ADD NetworkPolicy)
- TLS Incomplete: Prometheus HTTP 80, Traefik TLS {}, Fluent→Loki HTTP
Key Statistics
| Metric | Count |
|---|---|
| Total Applications Analyzed | 11 |
| Critical Issues | 5 |
| High Priority Issues | 12 |
| Medium Priority Issues | 20+ |
| Best Practice Violations | 30+ |
| Security Concerns | 25+ |
| Apps Missing Resource Requests | 4 |
| Apps Missing Resource Limits | 3 |
| Apps Using Floating Versions | 2 |
| Apps with Hardcoded Secrets | 2 |
| Apps Requiring Persistence | 3 |
| Apps with Single Replica Critical Services | 4 |
Implementation Guidance
Sealed Secrets Setup
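# Install the sealed-secrets controller
kubectl apply -f ./argocd/apps/sealedsecrets.yaml
# Seal the Grafana admin password. kubeseal emits JSON by default, so request
# YAML explicitly to match the output file name. A sealed secret is bound to
# its namespace; add --namespace if Grafana does not run in the current one.
echo -n "new-secure-password" | kubectl create secret generic grafana-admin \
  --dry-run=client --from-file=password=/dev/stdin -o yaml | \
  kubeseal --format yaml -f - > grafana-sealed-secret.yaml
# Update application manifests to reference sealed secrets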
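The last step above could look like this in the Grafana Helm values. This is a sketch that assumes the upstream grafana chart, which can read admin credentials from an existing Secret. The key names are assumptions: the command above only creates a password key, so the secret would also need a username key for this to work:

```yaml
admin:
  existingSecret: grafana-admin   # Secret produced by unsealing grafana-sealed-secret.yaml
  userKey: admin-user             # assumed key; must be added to the secret
  passwordKey: password           # key created by --from-file=password=/dev/stdin
```

With this in place the plaintext adminPassword entry can be removed from grafana.yaml.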
Persistent Volume for Loki
# Add to loki values
persistence:
  enabled: true
  storageClassName: "fast"
  size: 50Gi
  accessModes:
    - ReadWriteOnce
AppProject for RBAC
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
spec:
  destinations:
    - namespace: '*'
      server: 'https://kubernetes.default.svc'
  sourceRepos:
    - 'https://github.com/snothub/*'
  roles:
    - name: admin
      policies:
        # Project role policies must target applications inside this project
        - p, proj:platform:admin, applications, *, platform/*, allow
NetworkPolicy for Monitoring
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: monitoring-access
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: prometheus
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: grafana
      ports:
        - protocol: TCP
          port: 9090
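The policy above only restricts pods labeled app: prometheus; every other pod in the namespace stays reachable. A common complement is a namespace-wide default-deny for ingress, sketched here for the same monitoring namespace (the policy name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: monitoring
spec:
  podSelector: {}        # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so all inbound traffic is denied
```

Once this is applied, each legitimate flow (Grafana to Prometheus, Fluent-bit to Loki, and so on) needs its own allow policy, and enforcement depends on the cluster's CNI plugin supporting NetworkPolicy.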
Next Steps
- Review this analysis with your team
- Create tickets for each critical/high issue
- Schedule remediation according to roadmap
- Document changes as they're made
- Test thoroughly in dev/staging first
- Monitor impact after production changes