AWS Fargate Networking and Deployment Optimization with Terraform
Fargate Networking Docker Terraform ECS
AWS Fargate Terraform Module
On This Page
- Introduction
- Setup Demo App
- Create Fargate Infrastructure with Terraform
- Monitoring
- Change Networking Details
- Optimization Settings
Introduction
Use Terraform to create a VPC and Fargate resources, then use the demo apps to experiment with different networking options. These modules provide a custom VPC network where developers can conveniently test and optimize their containerized applications. Experiment with options like VPC Endpoints and a NAT Gateway for Fargate tasks running in private or public subnets, add your own images, and easily modify container names and ports, health check settings, Target Group deregistration delay, and more. The VPC network is created automatically with no configuration required, but the defaults can be customized.
I cover some of these networking modes in more detail in this blog.
My VPC and LoadBalancer modules are required for the networking. They are imported from the Terraform Registry in network.tf in the root of this module. The VPC module can be used for other things and is not dependent on this module.
For tasks you want to keep in private subnets, you can use either VPC Endpoints or a NAT Gateway. They can be enabled in variables.tf.
To demonstrate this I've put together two images that can be used in Fargate Task Definitions. If you want to follow along you'll need Docker, Terraform, Node 18+, Python 3.10 and an AWS account with administrator access.
Setup Demo App
You can use your own images; these are just samples for demonstration.
React and NGINX Image - (link)
NGINX serves the static React files and is configured to handle the client-side routing. nginx.conf also contains a proxy_pass for /api and points at the Amazon-provided VPC DNS resolver (169.254.169.253) so Cloud Map service discovery names like http://frontend:3000 and http://backend:5000 resolve between services.
FastAPI Backend Image - (link)
FastAPI with only a root / and an /api route set up for responses to frontend API calls.
Test Demo Locally
Clone the frontend project files
git clone git@github.com:ryanef/frontend-ecs-project.git
cd frontend-ecs-project
# if you want to test locally before building images
npm install
npm run dev
When the dev server starts you should see a link in the CLI like http://localhost:5142 to open in your browser. If the React app loads, build the image and push it to ECR.
Build Docker Images and Push to Elastic Container Registry
Make one ECR Repository for the frontend image and one for the backend image. Creating a repository in the AWS Console only takes a few seconds, and the console gives you the exact push commands to copy and paste.
In the example below my ECR Repository is named "reactdevtest" for the frontend and "backenddevtest" for the backend image.
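If you'd rather create the repositories with Terraform instead of the console, a minimal sketch could look like this. The repository names match the examples above; everything else uses provider defaults.
# Optional: create the two private repositories with Terraform instead of the console.
resource "aws_ecr_repository" "frontend" {
  name                 = "reactdevtest"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

resource "aws_ecr_repository" "backend" {
  name                 = "backenddevtest"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}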
Docker or Docker Desktop must be running and AWS CLI must be installed. Use push commands from ECR console or modify these with your REGION and AWS Account Number:
aws ecr get-login-password --region REGION | docker login --username AWS --password-stdin ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com
docker build -t reactdevtest -f Dockerfile.prod .
docker tag reactdevtest:latest ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com/reactdevtest:latest
docker push ACCOUNTNUMBER.dkr.ecr.REGION.amazonaws.com/reactdevtest:latest
Clone the backend project files
Put these in an entirely different directory from the frontend files, otherwise the Docker build context will pull in the frontend project and your image sizes will be huge.
git clone git@github.com:ryanef/backend-ecs-project.git
cd backend-ecs-project
# if you want to test locally
# i recommend making a python virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --port 5000
If it works, follow the same ECR push instructions as the frontend. If you're using the same names as the guide, be sure to replace reactdevtest with backenddevtest in the docker build, tag and push commands.
Create Fargate Infrastructure with Terraform
Clone the Terraform project
git clone https://github.com/ryanef/terraform-aws-fargate
cd terraform-aws-fargate
terraform init
terraform plan
Before applying, open variables.tf and look for use_nat_gateway or use_endpoints, then change the value to true for whichever one you want to use.
Settings to Change before Terraform Apply
Go to variables.tf and update frontend_image and backend_image with your ECR Repository URIs. If using VPC Endpoints, the ECR repository must be private; AWS does not support public repositories with ECR's interface endpoints.
Enable a NAT Gateway or VPC Endpoints by changing the default values of use_nat_gateway or use_endpoints to true. The vpc_name and environment variables are used to name and tag resources.
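Pulling those values together, a terraform.tfvars along these lines would cover everything mentioned above. The account number, region and names are placeholders; the variable names are the ones from variables.tf.
# Example terraform.tfvars -- account number, region and names are placeholders.
frontend_image  = "123456789012.dkr.ecr.us-east-1.amazonaws.com/reactdevtest:latest"
backend_image   = "123456789012.dkr.ecr.us-east-1.amazonaws.com/backenddevtest:latest"

# Enable one of the two for tasks in private subnets.
use_endpoints   = true
use_nat_gateway = false

# Used to name and tag resources.
vpc_name    = "fargate-demo"
environment = "dev"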
You will have to make sure Terraform can authenticate to AWS. If you're in an environment where you've already set up AWS access, you shouldn't need any extra steps. The AWS Provider docs show the order in which Terraform looks for AWS credentials. Don't hard-code your AWS Access Keys into anything; instead you can go to providers.tf and add shared_credentials_file (or shared_credentials_files on newer provider versions) or a profile as shown in the AWS Provider docs.
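As a rough sketch, assuming you use a named profile from your AWS config (the profile name and region here are placeholders):
# providers.tf sketch -- "myprofile" is a placeholder for a named profile in ~/.aws/config.
provider "aws" {
  region  = "us-east-1"
  profile = "myprofile"
}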
Terraform Apply
After adjusting settings and adding your ECR image URIs, you're ready to apply. It'll take a few minutes since it creates around 50 resources.
terraform apply
When it is done, it'll output the DNS address of the load balancer and you can visit that address in your browser. You may get a 503 error for the first minute or so while the tasks finish launching. After the web application loads, try clicking "Profile" in the navigation menu and see if you get a message from the Python backend. If you see "error in profile" displayed on the page, the backend container may just need a few more seconds to start up.
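That DNS output is typically defined along these lines; the output name and module attribute below are illustrative assumptions, so check outputs.tf in the repo for the real ones.
# Illustrative only -- assumes the load balancer module exposes a dns_name output.
output "alb_dns_name" {
  description = "Public DNS name of the Application Load Balancer"
  value       = module.loadbalancer.dns_name
}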
Monitoring
VPC Flow Logs, the ECS Service and ECS Tasks all use CloudWatch Logs, and the log groups have names like vpcName-environment-*. The Application Load Balancer does not have access logging enabled; if you want to enable it, that requires creating an S3 bucket. X-Ray will be optional in a future version.
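If you do want ALB access logs, the usual pattern is an S3 bucket plus the access_logs block on the load balancer. A minimal sketch is below; the ALB in this project is created by the loadbalancer module, so in practice you would wire the bucket into that module, and the bucket name and subnet variable here are placeholders.
# Sketch of ALB access logging -- shows the underlying arguments, not this repo's module wiring.
resource "aws_s3_bucket" "alb_logs" {
  bucket = "fargate-demo-alb-logs" # placeholder; bucket names must be globally unique
}

# The bucket also needs a policy allowing the regional ELB log-delivery account/service to write to it.
resource "aws_lb" "example" {
  name               = "fargate-demo-alb"
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids # placeholder variable

  access_logs {
    bucket  = aws_s3_bucket.alb_logs.id
    enabled = true
  }
}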
Change Networking Details
If you want to change the default VPC CIDR or add more subnets, go to variables.tf and you'll see a networking section at the bottom. Refer to the README.MD for a full list of subnets that can be used with the default 10.10.0.0/20 CIDR.
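As a purely hypothetical example of what an override might look like (the variable names here are illustrative, not necessarily the module's actual ones -- use the names from that networking section):
# Hypothetical example only -- check the networking section of variables.tf for the real variable names.
vpc_cidr        = "10.10.0.0/20"
public_subnets  = ["10.10.0.0/24", "10.10.1.0/24"]
private_subnets = ["10.10.2.0/24", "10.10.3.0/24"]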
Add your own ECS Services / Task Definitions
Go to locals.tf and you'll see a locals block for the services with the default frontend and backend entries. To add more, simply add new blocks and change the values to your image URI, container name, container port, etc. There's also a locals block for target_groups, and you may need to change the port numbers there to match your container ports.
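The shape of those blocks is roughly as follows. This is an illustrative sketch only; the field names in the repo may differ, so mirror the existing frontend/backend entries in locals.tf rather than this snippet.
# Illustrative shape only -- mirror the existing frontend/backend entries in locals.tf.
locals {
  services = {
    frontend = {
      image          = "123456789012.dkr.ecr.us-east-1.amazonaws.com/reactdevtest:latest"
      container_name = "frontend"
      container_port = 3000
    }
    backend = {
      image          = "123456789012.dkr.ecr.us-east-1.amazonaws.com/backenddevtest:latest"
      container_name = "backend"
      container_port = 5000
    }
  }

  target_groups = {
    frontend = { port = 3000 }
    backend  = { port = 5000 }
  }
}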
Optimization Settings
Application Load Balancer
Most of these can be changed in the locals.tf file where the ECS Services and Target Group settings are; the file name is listed beside each setting below. You may want different settings for different services, so keeping them flexible in these locals blocks seems best for now.
healthy_threshold - locals.tf
HealthyThresholdCount: Number of consecutive passing health checks before a target is considered healthy. This is an ALB setting, but ECS also factors it into its view of container health.
Default: 5
Range: 2-10
unhealthy_threshold - locals.tf
UnhealthyThresholdCount: Number of consecutive failed health checks before a target is marked unhealthy.
Default: 2
Range: 2-10
interval - locals.tf
HealthCheckIntervalSeconds: The time in seconds between health check attempts.
Default: 30 seconds for ip and instance targets
Range: 5-300 seconds
timeout - locals.tf
HealthCheckTimeoutSeconds: Time in seconds with no response from a target before the health check counts as failed.
Default: 5 seconds for ip or instance targets
Range: 2-120 seconds
deregistration_delay - locals.tf
AWS Default: 300 seconds, Module Default: 60 seconds
Range: 0-3600 seconds
Running containers are "registered" with the Application Load Balancer's target groups, which track the IP address and health of targeted containers. When a target is deregistered, that usually means you stopped it on purpose or it errored and crashed. The ALB stops sending new traffic to that target, but because of HTTP keep-alive it leaves existing connections open for a period of time so users with in-flight requests aren't suddenly interrupted. ECS waits out this deregistration_delay before it forces the container process to be terminated. Most people can significantly lower it unless they have processes or users doing things like large file uploads or other long-lived streaming connections.
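In this repo those values come from the target_groups locals, which ultimately feed the standard aws_lb_target_group arguments. For reference, the equivalent raw resource looks roughly like this; the name, port and vpc_id are placeholders, not the module's actual values.
# Reference sketch of the underlying target group settings -- name, port and vpc_id are placeholders.
resource "aws_lb_target_group" "frontend" {
  name                 = "frontend-tg"
  port                 = 3000
  protocol             = "HTTP"
  target_type          = "ip" # Fargate tasks register by IP address
  vpc_id               = var.vpc_id
  deregistration_delay = 60 # AWS default is 300 seconds

  health_check {
    path                = "/"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 5
    unhealthy_threshold = 2
    matcher             = "200"
  }
}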