Skip to main content

010 - Use AWS EKS for monitoring infrastructure

Date: 2021-03-22

Status

✅ Accepted

Context

The infrastructure monitoring and alerting platform consists of several services deployed as docker containers. So far these containers have been running on ECS via Fargate, chosen because of the relative ease with which it allows us to get instances provisioned.

As the solution has grown, and the interactions between new services have become more complex, we have found that we are running up against Fargate’s limitations and require finer-grained control over our deployments.

Kubernetes is the industry standard platform for orchestrating and running container based workloads and provides considerably more flexibility in comparison to ECS and Fargate.

Decision

Starting with Prometheus and Thanos, we are migrating our services over to AWS’s managed Kubernetes offering - Amazon Elastic Kubernetes Service (EKS).

Consequences

While it has the potential to be more complicated due to its increased flexibility, we believe that in the long run, Kubernetes will simplify the operation, maintenance, and improvement of the IMA platform. It offers several advantages over Fargate:

  • Better networking support out of the box enabling:
    • high availability support
    • reduced need for networking infrastructure external to the cluster
  • Decoupling of services from infrastructure leading to:
    • faster development cycle
    • Simpler and clearer configuration
    • Less reliance on specific infrastructure (could conceivably run on any Kubernetes cluster, regardless of the provider)
    • Reduced overall costs as the team can share the same development Kubernetes cluster
  • More aligned with common DevOps approaches in wider industry
  • The infrastructure will be ready to migrate to another hosting platform like Cloud Platform in the future. (see issue here)
This page was last reviewed on 15 April 2024. It needs to be reviewed again on 15 October 2024 by the page owner #nvvs-devops .
This page was set to be reviewed before 15 October 2024 by the page owner #nvvs-devops. This might mean the content is out of date.