AWS Elastic Container Service (ECS) is often used as a platform for microservices on AWS. In most cases, container images are stored in AWS Elastic Container Registry (ECR). Over time, however, unused images accumulate, especially with continuous integration and continuous delivery, which wastes storage space and drives up costs. ECR vulnerability scanning also produces many irrelevant findings, because they relate to images that are no longer in use.
This article shows how you can automatically clean up your central ECR repositories to remove old images that are no longer used by AWS ECS services. The presented approach is designed for a multi-account setup in which the production environment runs in a different AWS account than UAT/Dev, and it can be used with "tag immutability".
Why remove old container images?
Every image stored in an ECR repository incurs costs: for storage, and for rescanning by AWS Inspector if Enhanced Scanning is enabled. Removing unused images avoids these unnecessary expenses.
The accumulation of unused images also makes it difficult to identify security vulnerabilities in production services, because vulnerability scan reports cover all images, not only the ones that are deployed.
ECR lifecycle policies are often used to clean up ECR repositories automatically. You can configure rules based on image tag patterns and image push dates to limit the number of images in a repository. However, lifecycle policy rules cannot check when an image was last used, e.g. actively used in an ECS task.
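To illustrate what such rules look like, here is a minimal sketch in Python that builds a lifecycle policy document and attaches it with boto3's `put_lifecycle_policy`. The tag prefix, the retention numbers, and the helper names are assumptions chosen for the example, not values from a real setup. Note that both rules only look at push date and image count, which is exactly the limitation described above.

```python
import json


def build_lifecycle_policy(tag_prefix: str, keep_last: int, untagged_days: int) -> str:
    """Return an ECR lifecycle policy document as a JSON string.

    tag_prefix, keep_last and untagged_days are illustrative parameters.
    """
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images after {untagged_days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": untagged_days,
                },
                "action": {"type": "expire"},
            },
            {
                "rulePriority": 2,
                "description": f"Keep only the newest {keep_last} '{tag_prefix}*' images",
                "selection": {
                    "tagStatus": "tagged",
                    "tagPrefixList": [tag_prefix],
                    "countType": "imageCountMoreThan",
                    "countNumber": keep_last,
                },
                "action": {"type": "expire"},
            },
        ]
    }
    return json.dumps(policy)


def apply_lifecycle_policy(ecr_client, repository_name: str, policy_text: str) -> None:
    """Attach the policy. ecr_client is expected to be boto3.client("ecr")."""
    ecr_client.put_lifecycle_policy(
        repositoryName=repository_name,
        lifecyclePolicyText=policy_text,
    )
```

A call such as `apply_lifecycle_policy(boto3.client("ecr"), "my-service", build_lifecycle_policy("release", 20, 14))` would apply it, where `my-service` is a placeholder repository name.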
The usage of tags such as "production", which are set during deployment, can be a way of deleting unused images via lifecycle policies. However, this is not possible if "tag immutability" is enabled for a repository, and it complicates the deployment process.
Automation scripts such as awslabs/ecr-cleanup-lambda go one step further and take the actual image usage into account by analyzing the running ECS tasks.
An automated Multi-Account ECR Cleanup Approach
Before we outline the solution approach, let's summarize the requirements and restrictions:
- ECR Container Images that have not been used in the last 24 hours should be deleted automatically. ("Used": referenced in an active ECS task definition.) The time constraint allows rolling back to previously deployed versions within 24 hours.
- ECR Container Images can run in multiple ECS services in different AWS accounts (e.g. Production, UAT/Dev)
- ECR Repositories are located in a shared AWS account
- ECR Repositories might use "tag immutability" (a tag can only be used once)
- ECR Repositories should be explicitly enabled for "automated removal" to prevent accidental loss of images, for example if a repository is also used in Kubernetes or Lambda deployments.
The fact that an ECS task can run in different AWS accounts makes it difficult to query active ECS task definitions. We can use AWS Config Aggregator to build a central, searchable resource inventory that includes the ECS task definitions of multiple connected AWS accounts (1).
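As a rough sketch of how such an inventory could be queried, the following Python code builds an advanced query for task definitions and pages through the results with boto3's `select_aggregate_resource_config`. The property path `configuration.ContainerDefinitions.Image` is an assumption about how AWS Config represents task definitions and should be verified against your own aggregator; the function names are made up for this example.

```python
import json
from typing import Any, Dict, Iterator, Optional

# Assumed property path of the container image inside the Config item.
IMAGE_PROPERTY = "configuration.ContainerDefinitions.Image"


def build_task_definition_query(image_uri: Optional[str] = None) -> str:
    """Build an AWS Config advanced query for ECS task definitions.

    If image_uri is given, only task definitions referencing it are selected.
    """
    query = (
        "SELECT resourceId, accountId, awsRegion "
        "WHERE resourceType = 'AWS::ECS::TaskDefinition'"
    )
    if image_uri is not None:
        if "'" in image_uri:
            # Refuse values that would break out of the quoted literal.
            raise ValueError("image URI must not contain single quotes")
        query += f" AND {IMAGE_PROPERTY} = '{image_uri}'"
    return query


def query_aggregator(
    config_client, aggregator_name: str, expression: str
) -> Iterator[Dict[str, Any]]:
    """Yield every result row, following NextToken pagination.

    config_client is expected to be boto3.client("config").
    """
    token = None
    while True:
        kwargs = {
            "Expression": expression,
            "ConfigurationAggregatorName": aggregator_name,
        }
        if token:
            kwargs["NextToken"] = token
        page = config_client.select_aggregate_resource_config(**kwargs)
        for raw in page.get("Results", []):
            # Each result is returned as a JSON-encoded string.
            yield json.loads(raw)
        token = page.get("NextToken")
        if not token:
            break
```

Because the aggregator collects data from all connected accounts, a single call from the shared account is enough; no cross-account role assumption is needed for the lookup itself.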
Since only container images that have not been part of an active ECS task definition in the last 24 hours should be deleted, we cannot simply query active task definitions and delete all other images. Unfortunately, AWS Config does not allow us to run queries "back in time"; queries only see the current state. To respect the time constraint, whenever a task definition becomes inactive (2), we (re)schedule an "image deletion check" in 24 hours for each container image used in the inactivated task definition (3). If a check is already pending for an image, rescheduling pushes it back so that the 24-hour window starts again from the most recent inactivation. We can centralize the event-based scheduling by forwarding ECS task definition events from each ECS workload account to our shared AWS account via AWS EventBridge Rules and use EventBridge Scheduler to start the "image deletion checks".
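The (re)scheduling step could be sketched as follows with the EventBridge Scheduler API in boto3. This is a minimal sketch under assumptions: the schedule naming scheme, the payload shape `{"image": ...}`, and the target and role ARNs are hypothetical. The idea is to derive a stable schedule name from the image URI, so that a second inactivation event for the same image updates the existing one-time schedule instead of creating a duplicate.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

ROLLBACK_WINDOW = timedelta(hours=24)


def schedule_name_for(image_uri: str) -> str:
    """Derive a stable schedule name (letters, digits, hyphens) from the image URI."""
    digest = hashlib.sha256(image_uri.encode("utf-8")).hexdigest()[:40]
    return f"ecr-image-check-{digest}"  # 56 characters, within the 64-character limit


def build_schedule_request(
    image_uri: str, target_arn: str, role_arn: str, now: datetime
) -> Dict[str, Any]:
    """Build the arguments for a one-time schedule that fires 24 hours after `now`.

    `now` must be timezone-aware. target_arn (e.g. a Lambda function) and
    role_arn are placeholders supplied by the caller.
    """
    run_at = (now.astimezone(timezone.utc) + ROLLBACK_WINDOW).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "Name": schedule_name_for(image_uri),
        "ScheduleExpression": f"at({run_at})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "ActionAfterCompletion": "DELETE",  # remove the schedule once it has fired
        "Target": {
            "Arn": target_arn,
            "RoleArn": role_arn,
            "Input": json.dumps({"image": image_uri}),
        },
    }


def schedule_image_check(scheduler_client, request: Dict[str, Any]) -> str:
    """Create the schedule, or push back an existing one for the same image.

    scheduler_client is expected to be boto3.client("scheduler").
    """
    try:
        scheduler_client.create_schedule(**request)
        return "created"
    except scheduler_client.exceptions.ConflictException:
        # A check is already pending for this image: restart the 24-hour window.
        scheduler_client.update_schedule(**request)
        return "rescheduled"
```

Hashing the image URI keeps the schedule name valid regardless of slashes, colons, or digests in the URI, at the price of names that are not human-readable; the original URI is still available in the target input.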
When an "image deletion check" is triggered 24 hours later, a query to the AWS Config Aggregator is executed that looks for active task definitions containing the specific image (4). If no active task definitions are found, the image can be removed from the repository (5).
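The check itself might look like the sketch below. It assumes the number of active task definitions referencing the image has already been obtained from the AWS Config Aggregator, parses the image URI into registry, repository, and tag or digest, and calls boto3's `batch_delete_image` only when nothing references the image any more. The URI pattern covers private ECR URIs only, and the function names are illustrative.

```python
import re
from typing import NamedTuple, Optional

_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[^:@]+)(?::(?P<tag>[^@]+))?(?:@(?P<digest>sha256:[0-9a-f]{64}))?$"
)


class ImageRef(NamedTuple):
    registry_id: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_uri(uri: str) -> ImageRef:
    """Split a private ECR image URI into its parts."""
    match = _ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a private ECR image URI: {uri}")
    tag, digest = match.group("tag"), match.group("digest")
    if tag is None and digest is None:
        tag = "latest"  # container runtimes default to this tag
    return ImageRef(match.group("account"), match.group("repo"), tag, digest)


def run_deletion_check(ecr_client, image_uri: str, active_task_definitions: int) -> bool:
    """Delete the image if no active task definition references it.

    Returns True if the image (or its tag) was deleted. ecr_client is
    expected to be boto3.client("ecr") in the shared account.
    """
    if active_task_definitions > 0:
        return False  # still in use somewhere, keep it
    ref = parse_image_uri(image_uri)
    # Prefer the digest when present; deleting by tag only removes that tag,
    # and the image itself disappears once its last tag is gone.
    image_id = {"imageDigest": ref.digest} if ref.digest else {"imageTag": ref.tag}
    response = ecr_client.batch_delete_image(
        registryId=ref.registry_id,
        repositoryName=ref.repository,
        imageIds=[image_id],
    )
    return not response.get("failures")
```

Running the query at check time, rather than trusting the event that scheduled the check, is what protects images that were redeployed during the 24-hour window.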
To ensure that only repositories that have opted in are cleaned automatically, we can check for the existence of a dedicated resource tag on the repository when scheduling or executing "image deletion checks".
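A minimal sketch of such an opt-in check, using boto3's `list_tags_for_resource` for ECR, could look like this. The tag key `ecr-cleanup` and value `enabled` are hypothetical names picked for the example; any tag convention works as long as it is applied consistently.

```python
from typing import Dict, Iterable

# Hypothetical opt-in tag; choose whatever convention fits your organization.
CLEANUP_TAG_KEY = "ecr-cleanup"
CLEANUP_TAG_VALUE = "enabled"


def is_cleanup_enabled(
    tags: Iterable[Dict[str, str]],
    key: str = CLEANUP_TAG_KEY,
    value: str = CLEANUP_TAG_VALUE,
) -> bool:
    """Return True if the repository tags contain the opt-in tag."""
    return any(
        tag.get("Key") == key and tag.get("Value", "").lower() == value
        for tag in tags
    )


def repository_opted_in(ecr_client, repository_arn: str) -> bool:
    """Look up the repository's tags and evaluate the opt-in tag.

    ecr_client is expected to be boto3.client("ecr").
    """
    response = ecr_client.list_tags_for_resource(resourceArn=repository_arn)
    return is_cleanup_enabled(response.get("tags", []))
```

Checking the tag again when the check executes, and not only when it is scheduled, covers the case where a repository is opted out during the 24-hour window.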
Conclusion
Managing container images in AWS Elastic Container Registry (ECR) for multi-account AWS Elastic Container Service (ECS) environments can be challenging, particularly when ensuring cost-efficiency and maintaining security through vulnerability scanning. This article has outlined an automated approach to clean up unused ECR container images by leveraging AWS Config Aggregator, EventBridge, and EventBridge Scheduler.
By implementing this solution, you can ensure that only container images actively used in ECS services remain in your repositories. The approach respects the complexities of multi-account setups and considers essential constraints, such as tag immutability and rollback windows. This helps reduce storage costs, improves vulnerability scanning accuracy, and simplifies repository management.
Adopting this automated cleanup process is a proactive step towards optimizing your AWS infrastructure, enhancing security posture, and fostering better resource management across accounts.
Author: chgerkens