Kubernetes Configuration
Our Kubernetes infrastructure operates on Amazon EKS (Elastic Kubernetes Service), a choice we made for its reliability and seamless integration with AWS services. We've embraced GitOps principles through Flux CD, ensuring our Git repository serves as the single source of truth for our cluster's state.
Cluster Architecture
GitOps Structure
At the heart of our infrastructure management philosophy lies the commitment to managing everything through code. We maintain all our Kubernetes configurations in our numinia-k8s repository, which not only allows us to track changes and review modifications but also maintains a complete history of how our infrastructure has evolved over time.
The repository follows a thoughtfully organized structure:
clusters/
└── prod-cluster/
├── flux-system/ # Flux core components
├── flux-kustomization/ # Custom configurations
├── infrastructure/ # Cluster-wide resources
└── organizations/ # Organization resources
This structure reflects our commitment to clear separation of concerns. Within the flux-system
directory, you'll find everything needed for our GitOps automation. The infrastructure
directory houses our shared cluster resources, while organization-specific configurations reside in their own directory, maintaining clear boundaries between different teams' resources.
One of our key architectural decisions was to implement Kustomize for configuration management. This powerful tool allows us to maintain a single source of truth while adapting configurations for different environments or requirements. For instance, we can define a base configuration for our monitoring stack and then customize it for different organizations without duplicating code.
Flux Configuration
Flux CD serves as the backbone of our GitOps implementation. It continuously monitors our Git repository and automatically applies changes to our cluster. Here's our core configuration:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: flux-system
namespace: flux-system
spec:
interval: 1m0s
ref:
branch: main
url: https://github.com/numengames/numinia-k8s.git
We've carefully chosen a one-minute sync interval to ensure rapid updates when changes are pushed to our repository. This means that when a developer merges a change to the main branch, it will be reflected in our cluster within a minute, maintaining a tight feedback loop between our code and running infrastructure.
Infrastructure Components
Base Infrastructure
Our cluster's foundation is built upon several critical components that provide essential functionality for all our applications. In the infrastructure/base/
directory, we've implemented a comprehensive set of core services that form the backbone of our system.
Our storage system combines EFS and EBS storage classes, each serving distinct purposes. EFS provides shared storage accessible by multiple pods simultaneously, which is essential for applications that need to share data. EBS, on the other hand, delivers high-performance block storage, perfect for databases and other stateful applications requiring dedicated storage.
The networking layer incorporates a sophisticated service mesh configuration that handles advanced traffic management and security features. This setup gives us fine-grained control over service-to-service communication, implements intelligent retry logic, and provides detailed metrics about inter-service communication patterns.
Security stands at the forefront of our design, implemented through comprehensive pod security and network policies. These policies define precise boundaries for pod capabilities and communications, adhering strictly to the principle of least privilege.
Our monitoring infrastructure leverages the power of Prometheus and Grafana. Prometheus continuously collects metrics from our services, while Grafana transforms this data into actionable insights through visualization and alerting capabilities. This combination provides us with real-time visibility into our cluster's health and performance metrics.
Application Platform
The application platform layer orchestrates the services that our applications depend on. This critical layer ensures a consistent and reliable environment for our workloads.
Traffic management flows through our ingress controllers, which intelligently route external traffic to the appropriate services within the cluster. We've automated certificate management to ensure all services are served over HTTPS, with certificates automatically renewed before expiration.
Service discovery utilizes CoreDNS, configured to handle both internal DNS resolution and external DNS integration. This setup enables our services to communicate using simple, human-readable names, making the system more maintainable and easier to understand.
Our logging infrastructure builds on the EFK (Elasticsearch, Fluentd, Kibana) stack, providing comprehensive log collection and analysis capabilities. This system collects logs from all containers and system components, processes and enriches them with metadata, and makes them searchable through Kibana's intuitive interface.
Organization Management
Our cluster architecture supports multiple organizations while maintaining strict isolation and resource management. This multi-tenant approach requires careful consideration of security and resource allocation.
Multi-tenant Architecture
We achieve secure multi-tenancy through a combination of namespace isolation, resource quotas, and network policies. Each organization operates within its own namespace, with carefully defined resource limits and network boundaries. This isolation ensures that organizations can't interfere with each other's workloads while sharing the same cluster infrastructure.
Organizations follow a consistent structure:
organizations/<org-name>/
├── base/ # Base configurations
├── applications/ # Organization apps
└── infrastructure/ # Org-specific infrastructure
Security Implementation
Security permeates every aspect of our infrastructure through multiple complementary layers. Our RBAC configuration ensures proper access management through carefully defined service accounts, each with limited scope and automated token management. We integrate closely with AWS IAM to maintain consistent access controls across our entire infrastructure.
Network policies start from a default deny-all stance, with explicit allow rules carefully crafted to permit only necessary communication paths. This approach ensures that all network traffic is intentional and documented.
Secret Management
Sensitive information handling occurs through external secrets, integrating with AWS Secrets Manager for secure storage and automated rotation. Certificate management is fully automated, handling TLS provisioning and secure key storage while managing the complete certificate lifecycle.
Storage Configuration
Our storage implementation prioritizes reliability and security in data persistence. The EFS integration provides shared storage needs across the cluster, configured through a carefully designed storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: efs-sc
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: fs-xxxxx
directoryPerms: "700"
This configuration enables ReadWriteMany access mode for shared storage, incorporates automated backup systems, and ensures encryption at rest for security. Our volume management strategy includes dynamic provisioning, automated volume creation, and comprehensive backup management with regular snapshots and cross-region replication.
Best Practices
Our Kubernetes infrastructure follows carefully considered best practices in several key areas. Resource management focuses on logical namespace organization and fair resource allocation through quotas. Our monitoring strategy provides comprehensive visibility through metrics collection, structured logging, and regular performance analysis.
Maintenance procedures ensure continued cluster health through scheduled patching, version management, and regular testing of disaster recovery procedures. We maintain detailed documentation of all procedures and regularly review and update our practices to incorporate new learnings and improvements.
For detailed implementation specifics, refer to our numinia-k8s repository.