You built a great Streamlit application. Everything was working well locally, until one day your boss asked:
"Your PoC GenAI app is great. Let's make it available to the entire company!"
You deployed it onto a virtual machine (perhaps at an address like http://12.34.56.78:8501), and your colleagues rushed in with enthusiasm. The server got overloaded and restarted repeatedly. Some of their precious usage data was lost forever. They lost their work. You tried to duct-tape a fix by doubling the size of the VM (repeatedly), but the same overload-and-restart pattern kept recurring. You feel helpless.
Introduction
Streamlit is undoubtedly one of the greatest frameworks for Python developers to build interactive web apps. With the surge of interest in Generative AI since late 2022, the package has seen a significant rise in popularity, as evidenced by its growing number of GitHub stargazers.
More importantly, using Streamlit means you don't need to worry about learning the frontend side of things. You can focus on building the core functionality of your app, while Streamlit takes care of the rest. However, when it comes to deploying Streamlit apps to the cloud, things can get a bit tricky.
Statefulness and Scalability with Streamlit
When you transition your Streamlit app from a local environment to the cloud, two critical challenges arise: ensuring statefulness and achieving scalability. By default, Streamlit maintains state in-memory, which means that any state is lost when the user refreshes the page or the server restarts. This is a significant hurdle when scaling the application across multiple instances, or when your boss asks, "why don't you just share your local PoC GenAI application with the other colleagues?".
The Streamlit community has discussed this multiple times (here, here, here and here), but so far I couldn't find a comprehensive guide on deploying a stateful Streamlit application to the cloud. This article aims to fill that gap with a detailed guide on deploying a scalable and stateful Streamlit chatbot on AWS.
Architecture
To overcome these challenges, this article introduces a scalable and stateful Streamlit chatbot deployed on AWS. The architecture leverages:
- Application Load Balancer (ALB): Distributes incoming application traffic across multiple targets, ensuring even load distribution (see the health-check note after this list).
- Elastic Container Service (ECS) on Fargate: Manages Docker containers, allowing for easy scaling without the need to manage servers. This setup uses `arm64` and `0.25 vCPU / 0.5 GB RAM` ECS Tasks for extra cost-performance efficiency.
- Elastic File System (EFS): Provides a scalable file system that can be mounted to multiple ECS nodes, ensuring data persistence and redundancy across Availability Zones (AZs).
- CloudFront (optional): Acts as a Content Delivery Network (CDN) to improve performance and reduce latency for users, and, more importantly, to provide HTTPS.
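As a quick sanity check of the ALB-to-ECS wiring: Streamlit exposes a lightweight health endpoint at `/_stcore/health` that the ALB target group can poll. A minimal sketch (the IP address below is a placeholder for your own deployment):

```python
# Minimal health-check probe. Point the ALB target group's health check
# path at /_stcore/health; the address below is just a placeholder.
import requests

resp = requests.get("http://12.34.56.78:8501/_stcore/health", timeout=5)
print(resp.status_code, resp.text)  # a healthy Streamlit server returns 200 "ok"
```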
Why Not Lambda?
I did consider using Lambda with either a Python or a Docker execution environment. However, Streamlit requires the WebSocket resource `/_stcore/stream`. While Amazon API Gateway does support WebSocket APIs, such an implementation requires you to define multiple Lambda handlers for the WebSocket connect/data/disconnect events, which differs from the usual practice of, say, using the Lambda Web Adapter when acting as a traditional HTTP endpoint.
More importantly, Streamlit's client frontend sends data over the WebSocket as binary frames, while Amazon API Gateway only supports text frames. This makes it impossible to use Lambda as a Streamlit backend.
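You can observe this yourself with a rough, illustrative probe (assuming a Streamlit server running on localhost:8501 and the `websockets` package installed):

```python
# Illustrative sketch only: connect to Streamlit's WebSocket endpoint and
# inspect the first frame the server pushes.
import asyncio
import websockets  # pip install websockets

async def probe():
    async with websockets.connect("ws://localhost:8501/_stcore/stream") as ws:
        frame = await asyncio.wait_for(ws.recv(), timeout=10)
        # Streamlit speaks protobuf over *binary* frames - exactly what
        # API Gateway's WebSocket APIs cannot carry.
        print(type(frame))  # expected: <class 'bytes'>

asyncio.run(probe())
```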
Why Pick EFS But Not Others?
While databases like RDS or DynamoDB, caching solutions like ElastiCache, and storage services like S3 are common choices for state management, they come with their own set of complexities and costs.
| Option | Pros | Cons |
|---|---|---|
| RDS | Reliable, robust | Complex setup, high cost |
| DynamoDB | Scalable, fast | Complex setup, high cost, 400 KB item size limit, manual binary serialization |
| ElastiCache | Efficient caching | Complex setup, state loss on restarts, requires a tunnel for local development |
| S3 | Cost-effective | Network latency on every get/set operation (each read/write is an API call) |
| EFS | Easy setup, scalable, persistent, cost-effective | Cost/latency starts to become an issue at enterprise scale |
EFS offers a simpler and more cost-effective solution. It provides a network file system that can be mounted to multiple ECS nodes, ensuring redundancy across AZs and scalability. It is easy to set up and relatively cheap, making it an ideal choice for persistent state storage. More importantly, as only file operations are involved, the difference between the local and cloud setups is minimal.
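That local/cloud parity can be captured in one line of configuration. As a minimal sketch (the environment variable name `SESSION_DATA_DIR` is my own placeholder, not from the repo), the app can write to a local folder during development and to the EFS mount point when running in ECS:

```python
# Placeholder pattern: resolve the session-data directory from the environment.
# Locally this falls back to ./session_data; in the ECS task definition, set
# SESSION_DATA_DIR to the container path where the EFS volume is mounted.
import os
from pathlib import Path

SESSION_DIR = Path(os.environ.get("SESSION_DATA_DIR", "./session_data"))
SESSION_DIR.mkdir(parents=True, exist_ok=True)
```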
You Mentioned Cost But Load Balancers Are Not Free
It's true that the Application Load Balancer (ALB) incurs a fixed cost. However, its benefits - automatic distribution of incoming application traffic, support for HTTP/2, and integration with other AWS services - outweigh that cost. The scalability and reliability it provides are crucial for a production-ready application, so the cost of ALB is justified by the enhanced performance and manageability it offers.
Why This Approach?
Deploying a Streamlit app to the cloud requires careful consideration of both scalability and statefulness. By default, Streamlit’s in-memory state management is insufficient for a production environment for multiple users. Simply scaling up a virtual machine isn’t enough; you need a solution that persists user sessions across refreshes and server restarts.
The solution uses the user's browser to store a session key in local storage via the `streamlit-local-storage` package, while each session itself is saved into a folder on the mounted EFS storage, whose path is constructed from the session key (local storage isn't meant to hold large amounts of binary data). This ensures that session data is persistent and synced across multiple ECS nodes.
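A minimal sketch of that round trip, using the `streamlit-local-storage` API (the key name `session_id` is arbitrary):

```python
# Retrieve the session key from the browser's local storage, or mint a new one
# and persist it so the same key survives page refreshes and new tabs.
import uuid
from streamlit_local_storage import LocalStorage

local_storage = LocalStorage()
session_id = local_storage.getItem("session_id")
if not session_id:
    session_id = str(uuid.uuid4())
    local_storage.setItem("session_id", session_id)
```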
Instead of opting for complex and costly database solutions like RDS or DynamoDB, or dealing with the intricacies of ElastiCache, EFS provides a straightforward and efficient alternative. It allows for easy setup, scalability, and cost-effectiveness, making it the ideal choice for this deployment.
The greatest beauty of this approach is that your code works the same in the cloud environment and in your local one - the data-persistence part is just simple file read/write operations. No hassle of setting up a local database.
A Project Template for Scalable Stateful Streamlit App
It's an LLM chatbot based on Amazon Bedrock that offers basic model switching and conversation reset. On the left panel, you can see the hostname of the ECS Task serving your latest Streamlit run, as well as your session ID.
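For context, the chatbot's core call boils down to something like the following sketch (not the repo's exact code; the region and model ID are placeholders you would swap for your own):

```python
# Minimal Bedrock chat call via boto3's Converse API. Region and model ID are
# placeholders; the repo's actual prompt handling and streaming are omitted.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```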
The screenshot below shows the app as you would see it after following the instructions in the last section:
I first started the conversation in the left window, then opened a new window (with the session key persisted in local storage) and managed to retrieve my conversation - even though that Streamlit run was served by a different ECS Task, as indicated by the hostname on the left panel.
Pseudo Code of the Streamlit Python Script
Here's simplified pseudocode of the Python script used in the Streamlit application to manage session data:
```python
import os
import uuid
import pickle
import streamlit as st
from streamlit_local_storage import LocalStorage

# ... other imports ...

local_storage = LocalStorage()
session_data = {}

# Reuse the session key from the browser's local storage, or mint a new one
session_id = local_storage.getItem('session_id') or str(uuid.uuid4())
local_storage.setItem('session_id', session_id)

# Load any previously saved session data from the EFS mount
session_file = f'/session_data/{session_id}.pkl'
if os.path.exists(session_file):
    with open(session_file, 'rb') as f:  # pickle requires binary mode
        session_data.update(pickle.load(f))

session_data['some-key'] = st.some_input(label='Enter some input here')

# ... main chatbot logic here ...

# Persist the (possibly updated) session data back to the EFS mount
with open(session_file, 'wb') as f:
    pickle.dump(session_data, f)
```
In this script:
- A `session_data` local singleton dictionary is used to store session data.
- A `session_id` is generated, or retrieved from local storage.
- The session data is loaded from the file system based on the `session_id` during script initialization, and saved back to the file system at the end of the script.
- As EFS is mounted to all ECS nodes, the session data is shared across all ECS Task instances, surviving scaling activities even when a different ECS Task serves your existing Streamlit session.
Streamlit's native `session_state` isn't used in my approach, as it is in-memory and not shared across multiple ECS nodes. In an auto-scaling environment, it is possible for every ECS Task to have served each user session at some point; keeping the session data in-memory could therefore lead to data inconsistency, as well as memory exhaustion. The current approach only requires the session key to be stored in-memory, which is a negligible amount of data.
Deploy It For Your Organization
To deploy this scalable and stateful Streamlit chatbot on AWS, follow these steps:
1. Clone my Repository: Start by cloning my repository https://github.com/gabrielkoo/scalable-stateful-streamlit-chatbot-on-aws/tree/main to your local machine.
2. Deploy the CloudFormation Stack: Use the `template.yml` file to deploy the necessary AWS infrastructure. It's recommended to use the AWS Management Console for an intuitive setup.
3. Build and Deploy the Docker Image: Run `./deployment.sh` to:
   - build the Docker image,
   - push it to Amazon Elastic Container Registry (ECR), and
   - scale up the ECS service with the new image.
4. Access the Chatbot: Once deployed, access the chatbot via the URL provided by the Application Load Balancer (ALB), or through the CloudFront URL for added performance and HTTPS support.
5. Enable Auto Scaling: To truly leverage the scalability of this setup, configure Auto Scaling for the ECS service. This step ensures that your application can handle varying loads efficiently.
Note: My repo does not cover step #5; a rough sketch of what it involves follows.
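Here is a hypothetical sketch of step #5 using boto3's Application Auto Scaling API. The cluster and service names are placeholders, and the CPU target of 60% is an arbitrary example - tune it for your workload:

```python
# Hypothetical sketch of step #5 - not covered by the repo. Replace
# "my-cluster" and "streamlit-service" with your actual ECS names.
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service's desired count as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/streamlit-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=10,
)

# Track average CPU at ~60%: scale out above it, scale in below it
autoscaling.put_scaling_policy(
    PolicyName="streamlit-cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/streamlit-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```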
By following these steps, you can deploy a robust, scalable, and stateful Streamlit application on AWS, ensuring a seamless user experience even under heavy load. This approach is also a cost-effective and efficient way to run Streamlit applications in production, compared with the brute-force "just double your virtual machine" method.
More importantly, you can focus on building great GenAI applications with Streamlit locally, and then scale them on AWS without a headache!