launchpad/ARGOCD_COMPREHENSIVE_ANALYSIS.md
Danijel Simeunovic bbc863995d argo report
2026-02-07 21:40:13 +01:00

ArgoCD Applications Comprehensive Analysis Report

Overview

This report analyzes the 11 ArgoCD Application manifests in /argocd/apps/. It covers their current configuration, risks, best-practice violations, security concerns, and operational improvements.


Critical Issues Summary

1. Hardcoded Secrets (CRITICAL)

Files: application.yaml, grafana.yaml

  • application.yaml: Database password "change-me-in-production"
  • grafana.yaml: Admin password "forte" in plaintext
  • Impact: Credentials stay readable in Git history even after the files are changed
  • Fix: Rotate both credentials, then migrate them to Sealed Secrets immediately

2. Floating Versions (CRITICAL)

Files: application.yaml, cluster-resources-application.yaml

  • Using HEAD instead of tagged versions
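  • The manifest does not record which revision is deployed, so Git holds no audit trail of deployments
  • Any push to the tracked branch can change what is running, making application behavior unpredictable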
  • Fix: Pin to specific git tags or commit SHAs
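
As a rough illustration of pinning, the sketch below shows an Application whose `targetRevision` is a tag rather than `HEAD`. The repository URL, path, tag, and destination namespace are hypothetical placeholders, not values taken from the analyzed manifests; `platform` is the AppProject name used later in this report.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: music-man
  namespace: argocd            # assumes Argo CD is installed in the argocd namespace
spec:
  project: platform            # AppProject name from the guidance section of this report
  source:
    repoURL: https://github.com/example-org/music-man.git   # hypothetical URL
    path: deploy                                             # hypothetical path
    targetRevision: v1.4.2     # hypothetical tag; a full commit SHA works as well
  destination:
    server: https://kubernetes.default.svc
    namespace: music-man       # hypothetical namespace
```

With a tag or SHA in place, every deployment change shows up as a reviewed commit that edits `targetRevision`.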

3. Placeholder URLs (HIGH)

Files: fluent-bit.yaml, grafana.yaml

  • Second source still has https://github.com/YOUR_ORG/YOUR_GITOPS_REPO.git
  • Applications fail to deploy
  • Fix: Update to actual repository URL
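
For orientation, a multi-source Application typically pairs a chart source with a Git source that supplies the values file through `ref`. The fragment below is a sketch under assumptions: the Git URL, the values path, and both revisions are hypothetical and need to be replaced with the real ones.

```yaml
# Fragment of an Application spec, not a complete manifest
spec:
  sources:
  - repoURL: https://grafana.github.io/helm-charts
    chart: grafana
    targetRevision: 8.5.1      # hypothetical chart version; pin the one actually deployed
    helm:
      valueFiles:
      - $values/argocd/values/grafana.yaml   # hypothetical path inside the Git source
  - repoURL: https://github.com/example-org/gitops.git   # hypothetical; replaces the YOUR_ORG placeholder
    targetRevision: v1.0.0     # hypothetical tag
    ref: values                # exposes this source as $values to the chart source
```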

4. Undersized Resources (HIGH)

Files: cert-manager, loki, fluent-bit

  • cert-manager: 100m CPU limit (too tight for control plane)
  • loki: 200m CPU, 512Mi memory (drops logs under load)
  • fluent-bit: 100m CPU for all-node log collection
  • Impact: CPU throttling, OOM kills where memory limits are too low, dropped logs
  • Fix: Increase resource limits across the monitoring stack

5. No Data Persistence (HIGH)

Files: loki.yaml (filesystem storage), prometheus.yaml

  • Loki using filesystem storage (ephemeral, lost on restart)
  • Prometheus likely ephemeral (no PVC visible)
  • No backup strategy
  • Fix: Back Loki with object storage (S3/GCS) or a persistent volume, and add a PVC for Prometheus
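
A minimal sketch of Prometheus persistence, assuming the deployment uses the prometheus-community/prometheus chart (the default service port of 80 noted in this report suggests it, but that is not confirmed). Other charts, such as kube-prometheus-stack, use different keys. The storage class name is a placeholder.

```yaml
# Helm values sketch; key names are those of the prometheus-community/prometheus chart
server:
  persistentVolume:
    enabled: true
    size: 20Gi                 # lower end of the 20-30GB range suggested in this report
    storageClass: fast         # placeholder; use a StorageClass that exists in the cluster
```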

Application-by-Application Summary

| Application | Issues | Priority | Key Recommendation |
|---|---|---|---|
| music-man | Floating HEAD, hardcoded password, no resources | HIGH | Pin version, use Sealed Secrets, add resource limits |
| cert-manager | Undersized (100m), single replica, tight webhook timeout | HIGH | Increase CPU to 500m, add replicas (2-3), longer timeout |
| cluster-resources | Floating HEAD, RBAC missing | MEDIUM | Pin version, restrict with AppProject |
| fluent-bit | Placeholder URL, tight CPU (100m), HTTP server wide open | HIGH | Update repo URL, 200m CPU, restrict HTTP to localhost |
| grafana | Hardcoded password, placeholder URL, no persistence | CRITICAL | Sealed Secrets, update URL, add PVC |
| kyverno | No policies configured, no resources, no failure policies | MEDIUM | Add security policies, define resource limits |
| loki | Filesystem storage, no auth, single binary, tight resources | CRITICAL | S3/GCS storage, enable auth, distributed mode |
| prometheus | No alertmanager, service port 80, no persistence, no ingress | HIGH | Enable alertmanager, port 9090, add PVC, secure ingress |
| sealed-secrets | No backup procedure, single replica, no resources | MEDIUM | Document key backup, add PDB, increase replicas |
| traefik | TLS incomplete, LoadBalancer cloud-specific, no resources | MEDIUM | Complete TLS config, add cert-manager integration, resources |
| trivy | Alpha version (v0.0.7), ignoreUnfixed hides vulns, no resources | MEDIUM | Upgrade to stable (v0.3+), show all vulns, resources |

Cross-Cutting Issues

RBAC & Security (Critical)

  • All apps use default project (no boundaries)
  • No explicit AppProject configuration
  • Cluster resources not restricted
  • Fix: Create AppProject with granular permissions

No Network Policies (All Namespaces)

  • Unlimited pod-to-pod communication
  • Monitoring stack accessible from all pods
  • Fix: Implement NetworkPolicy for each namespace

No Pod Disruption Budgets

  • No HA guarantees during cluster operations
  • Critical services can be evicted/disrupted
  • Fix: Add PDB minAvailable: 1 for critical apps (only once they run at least 2 replicas; on a single-replica app this blocks node drains)
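
A PodDisruptionBudget for Traefik, which already runs two replicas according to this report, might look like the sketch below. The namespace and the pod label are assumptions and must match what the chart actually renders.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: traefik
  namespace: traefik           # assumed namespace
spec:
  minAvailable: 1              # keep at least one pod running during voluntary disruptions
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik   # assumed label; check the pod labels before applying
```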

Incomplete TLS Configuration

  • Prometheus on HTTP port 80
  • Traefik TLS block is empty ({}), so it is unclear which defaults apply
  • Fluent-bit to Loki unencrypted
  • Fix: Implement TLS end-to-end with cert-manager
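
As one possible starting point, cert-manager can issue a certificate into a Secret that an ingress or Traefik route then references. Everything specific below is hypothetical: the hostname, the Secret name, and the ClusterIssuer `letsencrypt-prod`, which would have to exist already.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prometheus-tls
  namespace: monitoring
spec:
  secretName: prometheus-tls   # Secret that cert-manager creates and renews
  dnsNames:
  - prometheus.example.com     # hypothetical hostname
  issuerRef:
    name: letsencrypt-prod     # hypothetical ClusterIssuer
    kind: ClusterIssuer
```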

Missing Resource Requests

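  • Traefik, Kyverno, Trivy, and sealed-secrets specify no requests or limits; Prometheus values are not set explicitly
  • Without requests, the scheduler can overcommit node resources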
  • Fix: Add requests/limits to all remaining apps
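
Since Kyverno is already deployed, a policy could flag workloads that omit requests or limits. The sketch below follows the common "require requests and limits" pattern and runs in audit mode. It assumes a Kyverno version that still accepts `validationFailureAction` at the spec level (newer releases move this setting into the rule), and the policy name is illustrative.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Audit   # report violations only; switch to Enforce once workloads comply
  background: true
  rules:
  - name: validate-resources
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory requests and a memory limit are required."
      pattern:
        spec:
          containers:
          - resources:
              requests:
                cpu: "?*"          # "?*" matches any non-empty value
                memory: "?*"
              limits:
                memory: "?*"
```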

Priority Remediation Roadmap

Phase 1: CRITICAL (Immediate)

  • Migrate Grafana admin password to Sealed Secrets
  • Migrate music-man database password to Sealed Secrets
  • Update placeholder repository URLs
  • Pin floating versions (HEAD → git tags)

Phase 2: URGENT (Week 1-2)

  • Configure persistent storage for Loki
  • Configure persistent storage for Prometheus
  • Enable Prometheus Alertmanager
  • Increase resource limits for all apps

Phase 3: IMPORTANT (Week 2-3)

  • Implement NetworkPolicies
  • Create AppProject with RBAC
  • Add PodDisruptionBudgets
  • Configure Kyverno security policies

Phase 4: ENHANCEMENT (Week 3-4)

  • Complete TLS configuration
  • Implement cert-manager integration
  • Setup backup strategies
  • Add comprehensive monitoring

Detailed Issues by Category

Resource Configuration

  • cert-manager: 50m req, 100m limit (INCREASE to 250m/500m)
  • prometheus: 250m req, 500m limit (ADEQUATE, but set them explicitly in values)
  • grafana: 100m req, 200m limit (INCREASE to 200m/400m)
  • loki: 100m req, 200m limit (INCREASE to 200m/500m for distributed)
  • fluent-bit: 50m req, 100m limit (INCREASE to 100m/200m)
  • traefik: Not specified (ADD 250m/500m CPU, 256Mi/512Mi memory)
  • kyverno: Not specified (ADD 100m/200m CPU, 128Mi/256Mi memory)
  • trivy: Not specified (ADD 250m/500m CPU, 256Mi/512Mi memory)
  • sealed-secrets: Not specified (ADD 100m/200m CPU, 128Mi/256Mi memory)
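
In Helm values, the request/limit pairs above translate into a `resources` block. The sketch uses the Traefik figures and assumes the chart accepts a top-level `resources` key; some charts, Kyverno among them, nest it per component instead.

```yaml
# Helm values sketch for traefik, using the figures recommended in this report
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```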

Storage & Persistence

  • loki: Filesystem (CRITICAL - switch to S3/GCS)
  • prometheus: Implicit ephemeral (ADD PVC 20-30Gi)
  • grafana: No persistence specified (QUESTIONABLE - acceptable only if dashboards are provisioned or imported on startup; otherwise add a PVC)
  • sealed-secrets: Key backup not documented (ADD backup procedure)

High Availability

  • cert-manager: replicaCount: 1 (INCREASE to 2-3)
  • sealed-secrets: Implicit single replica (INCREASE to 2-3)
  • traefik: Replicas: 2 (ADEQUATE, but add PDB)
  • monitoring stack: Single instances (CONSIDER distributed)

Security Gaps

  • Secrets in Git: Grafana, music-man (MIGRATE to Sealed Secrets)
  • No Authentication: Loki (auth_enabled: false), Prometheus (open HTTP)
  • Wide Permissions: cluster RBAC not restricted (ADD a least-privilege ClusterRole and AppProject)
  • No Network Policies: All apps (ADD NetworkPolicy)
  • TLS Incomplete: Prometheus HTTP 80, Traefik TLS {}, Fluent-bit→Loki HTTP

Key Statistics

| Metric | Count |
|---|---|
| Total Applications Analyzed | 11 |
| Critical Issues | 5 |
| High Priority Issues | 12 |
| Medium Priority Issues | 20+ |
| Best Practice Violations | 30+ |
| Security Concerns | 25+ |
| Apps Missing Resource Requests | 4 |
| Apps Missing Resource Limits | 3 |
| Apps Using Floating Versions | 2 |
| Apps with Hardcoded Secrets | 2 |
| Apps Requiring Persistence | 3 |
| Apps with Single Replica Critical Services | 4 |

Implementation Guidance

Sealed Secrets Setup

# Install sealed-secrets controller
kubectl apply -f ./argocd/apps/sealedsecrets.yaml

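# Seal grafana password. kubeseal reads the Secret from stdin and emits JSON
# unless --format yaml is given. Sealed secrets are namespace-scoped by default,
# so the namespace must match where Grafana runs (assumed here: monitoring).
echo -n "new-secure-password" | kubectl create secret generic grafana-admin \
  --namespace monitoring --dry-run=client --from-file=password=/dev/stdin -o yaml | \
  kubeseal --format yaml > grafana-sealed-secret.yaml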

# Update application manifests to reference sealed secrets
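
For Grafana, referencing the secret could look like the values sketch below, assuming the grafana/grafana Helm chart is in use. That chart reads both the username and the password from the referenced Secret, so the `grafana-admin` secret would also need a username key (`admin-user` here) in addition to the `password` key created above.

```yaml
# Helm values sketch for the grafana chart (assumed)
admin:
  existingSecret: grafana-admin   # Secret unsealed by the sealed-secrets controller
  userKey: admin-user             # key holding the admin username; must exist in the Secret
  passwordKey: password           # key created by the kubectl command above
```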

Persistent Volume for Loki

# Add to loki values ("fast" is a placeholder; use a StorageClass that exists in the cluster)
persistence:
  enabled: true
  storageClassName: "fast"
  size: 50Gi
  accessModes:
    - ReadWriteOnce

AppProject for RBAC

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd  # AppProjects live in the Argo CD namespace (assumed to be argocd)
spec:
  destinations:
  - namespace: '*'
    server: 'https://kubernetes.default.svc'
  sourceRepos:
  - 'https://github.com/snothub/*'
  roles:
  - name: admin
    policies:
    - p, proj:platform:admin, applications, *, platform/*, allow

NetworkPolicy for Monitoring

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: monitoring-access
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: prometheus
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: grafana
    ports:
    - protocol: TCP
      port: 9090
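
The policy above only restricts pods labeled `app: prometheus`; every other pod in the namespace stays open. A common complement is a default-deny ingress policy for the whole namespace, sketched here for `monitoring`. Once it is applied, each required flow (for example scraping or log shipping) needs its own explicit allow policy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: monitoring
spec:
  podSelector: {}        # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress              # no ingress rules are listed, so all inbound traffic is denied
```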

Next Steps

  1. Review this analysis with your team
  2. Create tickets for each critical/high issue
  3. Schedule remediation according to roadmap
  4. Document changes as they're made
  5. Test thoroughly in dev/staging first
  6. Monitor impact after production changes