Scaling Surlesol

Thursday, August 05, 2021

The question we have here is: "How do I scale Surlesol?"

What is Surlesol?

Surlesol is a life-improvement web app that I have been working on for some time. It is designed to help people with recurring habits, tasks, and routines. It is written in Django and designed to be mobile friendly (although there is no mobile app at present). The setup is a pretty typical Django setup with an external PostgreSQL DB.

What does the System Design Look Like Right Now?

The design of Surlesol is pretty bare bones right now, although we do have the database on an external server:

Surlesol System Design Orig

What are the App Requirements?

Functional Requirements:

  • Users need to be able to add items.
  • Users need to be able to change the state of items.

Non-Functional Requirements:

  • Response time for user actions needs to be < 1s
  • Very database intensive (many writes, lots of reads)

How Are We Doing Right Now?

If we don't know how we are doing right now, we have no way to tell whether we are improving. My first thoughts here go to Django Debug Toolbar and the Development Tools in Firefox. Recording the performance of one event, marking an item as done via a POST request, gives:

Lots of Idle Time

The answer is, not great! A response time of over 2.7 seconds for a single user is bad. We should aim to get this down. But how?

Django-Debug-Toolbar (DDT)

In order to get DDT working with docker-compose I added the following to my settings.py (via SO):

if DEBUG:
    # `debug` is only True in templates if the visitor IP is in INTERNAL_IPS.
    # Trick: an object whose `__contains__` always returns True, so every IP
    # counts as internal.
    INTERNAL_IPS = type(str('c'), (), {'__contains__': lambda *a: True})()

I added this after discovering that the IP address docker-compose exposes changes on each run, which means hard-coding 127.0.0.1 (localhost) in INTERNAL_IPS does not work with DDT.
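A less permissive alternative I've seen (an assumption on my part, not what Surlesol currently uses) is to compute the Docker gateway address at startup instead of accepting every visitor:

```python
# settings.py (sketch): assumes a default docker-compose bridge network,
# where the gateway is the .1 address on the container's subnet.
import socket

DEBUG = True  # set earlier in a real settings.py

if DEBUG:
    # Add this container's gateway address(es) to INTERNAL_IPS at startup,
    # instead of treating every visitor as internal.
    hostname, _, ips = socket.gethostbyname_ex(socket.gethostname())
    INTERNAL_IPS = ["127.0.0.1"] + [ip.rsplit(".", 1)[0] + ".1" for ip in ips]
```

This keeps DDT restricted to the local bridge network rather than anyone who can reach the port.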

Upgrading PostgreSQL and PostgreSQL Server

Right now the PostgreSQL version is old (9.5) and the server running PostgreSQL is old (Ubuntu 16.04).

Because this upgrade is a big project that should be done before releasing Surlesol into production, and could have a big impact on performance, I am going to write a separate troubleshooting post on it and come back with an update: https://stevenramey.com/2021/08/07/

Django Recommended Optimizations:

Because these recommendations are via the checklist docs, we can reasonably assume that these are all relatively low-hanging fruit:

  1. Consider using cached sessions to improve performance.
  2. If using database-backed sessions, regularly clear old sessions to avoid storing unnecessary data.
  3. Enable persistent database connections (this helps a lot on virtualized hosts with limited network performance).
  4. Enable the cached template loader, which often improves performance drastically by avoiding recompiling each template every time it is rendered. See the template loaders docs for more information.
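Items 3 and 4 translate into settings roughly like the following (the database name and the 10-minute connection lifetime are my assumptions, not the production config):

```python
# settings.py (sketch) for the two "low-hanging fruit" items above.

# 3. Persistent connections: keep each database connection open for up to
#    600 seconds instead of opening a new one per request.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "surlesol",  # hypothetical database name
        "CONN_MAX_AGE": 600,
    }
}

# 4. Cached template loader: compile each template once per process and
#    reuse it, instead of re-reading and re-compiling on every render.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],
        "OPTIONS": {
            "loaders": [
                ("django.template.loaders.cached.Loader", [
                    "django.template.loaders.filesystem.Loader",
                    "django.template.loaders.app_directories.Loader",
                ]),
            ],
        },
    }
]
```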

Implementing Cached Sessions

So right now, to be very frank, I have no idea how to implement caching for sessions. The first thing that I have started to investigate is implementing caching on the entire site in development. Following the Django Caching Docs I implemented the recommended settings.
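Concretely, the session piece of those settings looks roughly like this (the memcached location is an assumption about my setup):

```python
# settings.py (sketch): back the cache with memcached and store sessions in
# the cache, with the database as a fallback so a cache eviction does not
# log users out.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```

There is also a pure `cache` session backend that skips the database entirely, at the cost of losing sessions on eviction.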

However, I am getting an error that the object(s) are too large for the cache. My hunch is that this is because I am running DDT in conjunction with memcached, and indeed, after disabling DDT the objects fit in the cache. But now that the whole site is cached, the app no longer works properly (it is a fairly dynamic web app).

Right then, so caching the entire site was obviously not working. So I've gone ahead and cached a bunch of stuff in surlesol/base.html via template tags. For example:

{% load cache static %}  {# needs to appear once near the top of base.html #}
{% cache 5000 base01 %}
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="">
    <meta name="author" content="">

    <!-- ADD FAVICON -->
    <link href="{% static "css/bootstrap.min.css" %}" rel="stylesheet">
    <link href="{% static "css/datatables.min.css" %}" rel="stylesheet">
    <link href="{% static "css/app.css" %}" rel="stylesheet">
    <script src="{% static "js/jquery-3.3.1.min.js" %}"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/mo-js/0.288.2/mo.min.js"></script>
    <script src="{% static "js/mojs-player.min.js" %}"></script>
    <script src="{% static "js/popper.min.js" %}"></script>
    <script src="{% static "js/bootstrap.min.js" %}"></script>
    <script src="{% static "js/datatables.min.js" %}"></script>
    <script src="{% static "js/party_popper.js" %}"></script>
    <link href="https://fonts.googleapis.com/css?family=Mukta" rel="stylesheet">
{% endcache %}

What is the Current Data Usage?

Right now the data usage is extremely small. But how can we measure this? First thing I'll do is just log into DigitalOcean and take a look at some of the usage.

Daphne:

Daphne Data Usage

PostgreSQL:

PostgreSQL Data Usage

The first thing I see is that there seems to be a high correlation between the data usage on the Django/Daphne server and the PostgreSQL server. The second thing is that there seems to be an inverse relationship between the Bandwidth usage for our Daphne/Nginx server and the PostgreSQL server.

Caching

Checking memory on the server with free -m:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:            981         320          87           1         572         493
Swap:             0           0           0

Most of the memory not used by processes is already serving as the OS page cache (the buff/cache column, 572 MB), with about 493 MB still available for applications.

How Quick / Easy is it to Scale Up on DigitalOcean?

How Can I Improve Monitoring?

Right now my monitoring is limited to visiting the DigitalOcean dashboard and/or looking at data in Cloudflare. Every now and then I'll log into the servers directly to check resource usage.

Do we need to shard the database?

How does the application fare under heavy load? How can we simulate this?
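A minimal way to simulate it (a sketch, not a real load-testing tool like Locust; the `action` callable is a stand-in for the HTTP POST that marks an item done) is to fire concurrent requests and look at latency percentiles:

```python
# Minimal load-simulation sketch: run `action` from many threads at once and
# report median and 95th-percentile latency in seconds.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_load(action, users=50, requests_per_user=4):
    """Return (median, p95) latency across all simulated requests."""
    def one_user():
        times = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            action()  # e.g. an HTTP POST marking an item as done
            times.append(time.perf_counter() - start)
        return times

    with ThreadPoolExecutor(max_workers=users) as pool:
        all_times = [t for ts in pool.map(lambda _: one_user(), range(users))
                     for t in ts]
    all_times.sort()
    return statistics.median(all_times), all_times[int(len(all_times) * 0.95)]
```

Comparing the p95 figure against the < 1s response-time requirement would tell us how far we are from the target under load.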

How does one shard a PostgreSQL DB with Django?

I have never sharded a database before, but I think sharding is the sensible thing to do here, since Surlesol writes a lot of data (read replicas would not help with a write-heavy workload).
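To make the idea concrete, here is a hypothetical Django database router that picks a shard by user id (the shard aliases and the `user_id` attribute are my assumptions, not anything that exists in Surlesol yet):

```python
# Hypothetical Django database router: send each user's rows to one of two
# shards. "shard0"/"shard1" are assumed aliases in DATABASES, and models are
# assumed to carry a user_id field.
class UserShardRouter:
    SHARDS = ["shard0", "shard1"]

    def _shard_for(self, user_id):
        # Stable mapping: the same user always lands on the same shard.
        return self.SHARDS[user_id % len(self.SHARDS)]

    def db_for_read(self, model, **hints):
        instance = hints.get("instance")
        if instance is not None and hasattr(instance, "user_id"):
            return self._shard_for(instance.user_id)
        return None  # fall back to the "default" alias

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)
```

The router would be registered via the DATABASE_ROUTERS setting; the key property is that reads and writes for a given user deterministically hit the same shard.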

How do Django Migrations work with a Sharded DB?

The main thing that I am concerned about here is the migrations with multiple databases. Let's search this:

Links:

  • https://www.digitalocean.com/community/tutorials/how-to-scale-django-beyond-the-basics
  • https://pgdash.io/blog/postgres-11-sharding.html
  • https://stackoverflow.com/questions/24699935/setting-up-memcached-for-django-session-caching-on-app-engine
  • https://docs.djangoproject.com/en/3.2/topics/cache/
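From skimming those links, my working understanding (an assumption until I actually try it) is that `migrate` runs against one database alias at a time, and a router's `allow_migrate` decides which apps' tables get created where:

```python
# Hypothetical router method controlling where migrations run. "todo" is an
# assumed app label for the sharded models; everything else (auth, sessions,
# ...) stays on the default database.
class ShardMigrationRouter:
    SHARDS = {"shard0", "shard1"}

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if db in self.SHARDS:
            # Sharded tables must exist on every shard.
            return app_label == "todo"
        return db == "default"
```

Even with a router like this in DATABASE_ROUTERS, migrations would have to be applied once per alias (e.g. `python manage.py migrate --database=shard0`, then `--database=shard1`), since each run targets a single database.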