Thursday, August 05, 2021
The question we have here is: "How do I scale Surlesol?"
Surlesol is a life-improvement web app that I have been working on for some time. It is designed to help people with recurring habits, tasks, and routines. It is written in Django and designed to be mobile-friendly (although there is no mobile app at present). The setup is a pretty typical Django setup with an external PostgreSQL DB.
The design of Surlesol is pretty bare bones right now, although we do have the database on an external server:
Functional Requirements:
Non-Functional Requirements:
If we don't know how we are doing right now, we won't have any idea whether we are improving. My first thoughts here go to Django Debug Toolbar and the Developer Tools in Firefox. When recording the performance of one event, marking an item as done, we POST to the URL:
The answer is, not great! A response time of over 2.7 seconds for a single user is bad. We should aim to get this down. But how?
In order to get DDT working with docker-compose I added the following to my settings.py (via SO):
if DEBUG:
    # `debug` is only True in templates if the visitor IP is in INTERNAL_IPS.
    INTERNAL_IPS = type(str('c'), (), {'__contains__': lambda *a: True})()
I added this after discovering that the IP address that docker-compose exposes changes on each run. Because of this, hard-coding 127.0.0.1 (localhost) in INTERNAL_IPS does not work with DDT.
Right now the PostgreSQL version is old (9.5) and the server running PostgreSQL is old (Ubuntu 16.04).
Because this is such a big project that should be done before releasing Surlesol into production, and could have a big impact on performance, I am going to write a separate troubleshooting post on this and come back with an update: https://stevenramey.com/2021/08/07/
Because these recommendations are via the checklist docs, we can reasonably assume that these are all relatively low-hanging fruit:
So right now, to be very frank, I have no idea how to implement caching for sessions. The first thing I have started to investigate is caching the entire site in development. Following the Django Caching Docs, I implemented the recommended settings.
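For reference, the per-site cache setup from the docs, plus cached sessions (the part I was unsure about), looks roughly like the sketch below in settings.py. This assumes memcached running on localhost:11211; the exact backend class name depends on your Django version:

```python
# settings.py (sketch; assumes memcached is running on 127.0.0.1:11211)
CACHES = {
    "default": {
        # Django >= 3.2 ships PyMemcacheCache; older versions use
        # django.core.cache.backends.memcached.MemcachedCache instead.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# Per-site caching: UpdateCacheMiddleware must come first in the list,
# FetchFromCacheMiddleware must come last.
MIDDLEWARE = [
    "django.middleware.cache.UpdateCacheMiddleware",
    # ... the rest of the existing middleware ...
    "django.middleware.cache.FetchFromCacheMiddleware",
]
CACHE_MIDDLEWARE_ALIAS = "default"
CACHE_MIDDLEWARE_SECONDS = 600
CACHE_MIDDLEWARE_KEY_PREFIX = "surlesol"

# Cached sessions: keep session data in the cache instead of the DB.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```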
However, I am getting an error that the object(s) are too large for the cache. My suspicion is that this occurs because I am running DDT in conjunction with memcached. Indeed, after disabling DDT the objects fit in the cache. But now that the whole site is cached, the app no longer works properly (it is, after all, a fairly dynamic web app).
Right then, so caching the entire site was obviously not going to work. So I've gone ahead and cached a bunch of stuff in surlesol/base.html via template tags. For example:
{% load cache static %}
{% cache 5000 base01 %}
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="">
<!-- ADD FAVICON -->
<link href="{% static "css/bootstrap.min.css" %}" rel="stylesheet">
<link href="{% static "css/datatables.min.css" %}" rel="stylesheet">
<link href="{% static "css/app.css" %}" rel="stylesheet">
<script src="{% static "js/jquery-3.3.1.min.js" %}"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mo-js/0.288.2/mo.min.js"></script>
<script src="{% static "js/mojs-player.min.js" %}"></script>
<script src="{% static "js/popper.min.js" %}"></script>
<script src="{% static "js/bootstrap.min.js" %}"></script>
<script src="{% static "js/datatables.min.js" %}"></script>
<script src="{% static "js/party_popper.js" %}"></script>
<link href="https://fonts.googleapis.com/css?family=Mukta" rel="stylesheet">
{% endcache %}
Right now the data usage is extremely small. But how can we measure this? First thing I'll do is just log into DigitalOcean and take a look at some of the usage.
Daphne:
PostgreSQL:
The first thing I see is a strong correlation between the data usage on the Django/Daphne server and the PostgreSQL server. The second is what looks like an inverse relationship between the bandwidth usage on the Daphne/Nginx server and the PostgreSQL server.
$ free -m
              total        used        free      shared  buff/cache   available
Mem:            981         320          87           1         572         493
Swap:             0           0           0
Right now my monitoring is limited to visiting the DigitalOcean dashboard and/or looking at data in Cloudflare. Every now and then I'll log into the servers directly to check usage.
How does the application fare under heavy load? How can we simulate this?
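One way to get a first answer without reaching for a dedicated tool is a small standard-library script that fires concurrent requests and summarises latencies. This is a minimal sketch, not a real load test (tools like Locust or `ab` do this properly); the URL below is a placeholder, and the `timer` parameter is my own addition so the timing function can be swapped out:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import request


def hit(url: str) -> float:
    """Time a single GET against the app; returns seconds elapsed."""
    start = time.perf_counter()
    with request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start


def load_test(url: str, users: int = 20, requests_per_user: int = 5,
              timer=hit):
    """Fire `users` concurrent workers issuing `requests_per_user`
    requests each, and summarise the observed latencies."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        timings = sorted(pool.map(lambda _: timer(url), range(total)))
    return {
        "requests": len(timings),
        "median_s": timings[len(timings) // 2],
        "p95_s": timings[int(len(timings) * 0.95)],
    }
```

Pointing this at the mark-item-done endpoint (with authentication handled) would give a rough feel for how the 2.7-second response time degrades under concurrency.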
I have never sharded a database before, but I think it is the sensible thing to do here: Surlesol writes a lot of data, and read replicas only spread read load, so they would not help us. Sharding, by contrast, splits the write load across multiple databases.
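In Django, routing queries to shards hangs off a database router class registered in DATABASE_ROUTERS. The sketch below is entirely hypothetical: the `shard_0`/`shard_1` alias names, the two-shard count, and keying on a `user_id` attribute are all my assumptions, not anything Surlesol actually does yet. `allow_migrate` is also the hook where migrations meet multiple databases:

```python
class ShardRouter:
    """Hypothetical router for Django's DATABASE_ROUTERS setting.

    Assumes settings.DATABASES defines aliases 'shard_0'..'shard_N'
    and that sharded models carry a `user_id` attribute to key on.
    """

    SHARD_COUNT = 2  # assumption: start with two shards

    def _shard_for(self, user_id: int) -> str:
        # Simple modulo sharding on the owning user's primary key.
        return f"shard_{user_id % self.SHARD_COUNT}"

    def db_for_read(self, model, **hints):
        instance = hints.get("instance")
        if instance is not None and hasattr(instance, "user_id"):
            return self._shard_for(instance.user_id)
        return None  # fall through to the default database

    def db_for_write(self, model, **hints):
        # Reads and writes for a user should land on the same shard.
        return self.db_for_read(model, **hints)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run every migration on every shard so the schemas stay in sync.
        return True
```

Registering it would be `DATABASE_ROUTERS = ["surlesol.routers.ShardRouter"]` (path assumed), after which `manage.py migrate --database=shard_0` and so on would need to be run per shard.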
Links:
The main thing that I am concerned about here is the migrations with multiple databases. Let's search this: