Secure Websockets on a Container with a Load Balancer and SSL Termination
A lot of AWS products use acronyms - AWS is even an acronym itself! I added references at the end of this article to help you clarify what they refer to.
If you need to set up real-time communications between your server and your clients, chances are you considered using WebSockets. Or, like in my case, you might not have a choice: the solution you have to deploy already relies on them, like Apollo GraphQL Subscriptions for instance.
I recently had to set up an Apollo Server running in a container in Amazon ECS, and I had a couple of issues when setting up the WebSocket connection. I hope this article will save you some time if you need to deploy something similar.
Once you containerize your application, publish it to Amazon ECR, and run it on Amazon ECS, your app is indeed running, but you can only access it through the container's public IP, if you set one up.
You could create a DNS record to use your domain, but Amazon ECS containers don't have static IPs, so you risk the IP changing and causing downtime for your service while you update your DNS zone.
Maybe you'll need to scale horizontally, but using multiple DNS entries pointing to the different containers won't help correctly balance the traffic between them.
Enter ALB. The Amazon Application Load Balancer coupled with Amazon ECS allows you to forward your traffic to your containers by leveraging multiple resources:
The Load Balancer itself
A Target Group
A Listener
The Target Group defines where to send the traffic, in our case here it's going to be our Amazon ECS Service. We can configure the port and protocol of the traffic that is forwarded, as well as a health check, so that the load balancer doesn't send traffic to a dead service.
The Listener defines how the load balancer gets its traffic from outside. That's where you define the port and the protocol to reach the load balancer, and the default behavior for the traffic hitting that listener. You can set up multiple listeners. For instance, you could create an HTTP listener on port 80 that simply redirects the client to HTTPS. Then your HTTPS listener on port 443 would forward the traffic to your service.
Those Listeners connect the load balancer and the target groups. In that way, you could have a load balancer sending the traffic from port 8080 to one service, and the traffic from port 4000 to another.
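The wiring above can be sketched as a CloudFormation fragment. This is a minimal sketch, not a complete template: the logical names (`AppTargetGroup`, `AppLoadBalancer`, `AppVpc`), the container port, and the health check path are all placeholders to adapt to your setup.

```yaml
# Sketch: a target group for the ECS service, plus an HTTP listener
# that redirects clients to HTTPS. All names and values are placeholders.
AppTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    Port: 4000                 # port the container listens on
    Protocol: HTTP
    TargetType: ip             # awsvpc-mode ECS tasks register by IP
    VpcId: !Ref AppVpc
    HealthCheckPath: /health   # assumed health endpoint on the service
HttpListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref AppLoadBalancer
    Port: 80
    Protocol: HTTP
    DefaultActions:
      - Type: redirect
        RedirectConfig:
          Protocol: HTTPS
          Port: "443"
          StatusCode: HTTP_301
```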
Now, the bonus part is that your Amazon ECS service can automatically register itself with the load balancer. That means that the routing to the correct container IP is completely abstracted, and we don't need to update it manually at all.
Amazon ALB Listeners only offer the HTTP or HTTPS protocols, but the good news is that a WebSocket connection initially contacts the server over HTTP if you use ws://, or HTTPS if you use wss://.
Your server will then reply with 101 Switching Protocols, telling the client to upgrade to a WebSocket connection.
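As part of that 101 Switching Protocols response, the server proves it understood the handshake by deriving a Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key. A minimal sketch of that derivation, per RFC 6455 (the GUID is fixed by the spec; `computeAccept` is an illustrative name):

```typescript
// Sketch: how a server derives the Sec-WebSocket-Accept header during the
// 101 Switching Protocols handshake (RFC 6455). The GUID is fixed by the spec.
import { createHash } from "node:crypto";

const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function computeAccept(secWebSocketKey: string): string {
  // SHA-1 of the client's key concatenated with the spec GUID, base64-encoded
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// Example key taken from RFC 6455 itself:
console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

WebSocket libraries handle this for you; it's shown here only to make clear that the handshake is plain HTTP, which is why an ALB HTTP/HTTPS listener can carry it.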
The difference with a regular HTTP connection is that the WebSocket connection is meant to stay open.
To support those long-lived connections, you have to increase the default Idle Connection Timeout setting from 60s to the value of your choice.
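On an ALB, the idle timeout is a load balancer attribute. A sketch of the relevant CloudFormation fragment, assuming the 3600-second value is just an example to adapt:

```yaml
# Sketch: raising the ALB idle timeout (default 60 seconds) so long-lived
# WebSocket connections aren't dropped. 3600 is an arbitrary example value.
AppLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    LoadBalancerAttributes:
      - Key: idle_timeout.timeout_seconds
        Value: "3600"
```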
This idle timeout only applies when no data is exchanged between your server and client: it's the delay after which the load balancer drops the connection, which in turn causes your client to emit a 1006 error.
I strongly suggest that you implement a reconnection logic in case that happens, but as long as data is exchanged, the load balancer won't drop the connection.
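A minimal sketch of such reconnection logic, assuming a browser-style WebSocket API (available natively in browsers and in recent Node versions); the names `connect` and `backoffDelay` are illustrative:

```typescript
// Sketch: client-side reconnection with exponential backoff after the
// load balancer drops an idle connection (close code 1006).
function backoffDelay(attempt: number, baseMs = 1000, capMs = 30000): number {
  // Double the delay on each attempt, capped so we never wait too long
  return Math.min(baseMs * 2 ** attempt, capMs);
}

function connect(url: string, attempt = 0): void {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; };          // reset backoff once connected
  ws.onclose = (event) => {
    // 1006 = abnormal closure, e.g. the ALB dropped an idle connection
    if (event.code === 1006) {
      setTimeout(() => connect(url, attempt + 1), backoffDelay(attempt));
    }
  };
}
```

If you use a client like graphql-ws or subscriptions-transport-ws, check its options first: most ship reconnection support, and configuring it beats reimplementing it.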
Any connection is better when secured, and the Amazon Application Load Balancer allows you to easily set up SSL Termination for your services.
If you use Amazon ACM to generate SSL Certificates for your domains (it's free), you can pass your certificate ARN to your HTTPS Listener on the load balancer. The only thing left to do is to create a DNS record to point to your load balancer, and you'll serve your service over HTTPS, with a valid certificate.
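A sketch of the HTTPS listener with the ACM certificate attached, again as a CloudFormation fragment; the certificate ARN and logical names are placeholders:

```yaml
# Sketch: an HTTPS listener terminating TLS with an ACM certificate and
# forwarding traffic to the target group. The ARN is a placeholder.
HttpsListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref AppLoadBalancer
    Port: 443
    Protocol: HTTPS
    Certificates:
      - CertificateArn: !Ref AppCertificateArn
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref AppTargetGroup
```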
WebSocket connections are initialized over HTTP or HTTPS.
You need to create a load balancer, a target group pointing to your Amazon ECS Service, and listener rules to accept traffic and forward it to your target group.
You need to increase the Idle Timeout setting on the load balancer in order not to drop connections if they don't exchange data for 60 seconds.
You can link an SSL certificate to your load balancer listener so that client applications don't complain about the certificate not being valid.
If you use SSL, you need to use wss:// instead of ws:// for your websocket endpoint.
AWS: Just for the sake of it, but it stands for Amazon Web Services
ALB: Amazon Application Load Balancer - it’s a flavor of Load Balancer operating at Layer 7
ECS: Amazon Elastic Container Service - it’s a solution to run containers, either on VMs or completely managed
ECR: Amazon Elastic Container Registry - it’s a private registry for your container images
ACM: AWS Certificate Manager - AWS solution to create SSL certificates for your domains
ARN: Amazon Resource Name - AWS Unique Identifier for their resources
This article refers to the work that Tech Holding has done for BuildOps to help them migrate from Amazon AppSync to an Apollo Server in order to prepare for the creation of GraphQL microservices using Apollo Federation.