Up until now, we talked about where to save the “session state”: on the server (part 1) or on the client (part 2). The session state contains the basic user profile information: first and last name, email, id, username, etc. In other words, the user’s identity. The session state usually also contains the user’s permissions, in the form of roles or scopes they have access to. The site can later use these to authorize the user’s requests.
When we keep the session state on the client, we can use JWTs to store those two things. This allows us to authorize the user’s request without looking up permissions in memory or in an external system. But the user’s identity and permissions are not all the data we may need to store.
In Part 1 we learned that sessions were created because HTTP is a stateless protocol, and as such has no built-in way to associate different requests with the same user. Sessions solved that issue by matching the session id cookie to the user data stored in the server’s memory.
However, over time developers began using the server session for other purposes as well. In many cases, besides the user identity and permissions, the server-side session object also contained:
- Cached data: Data cached from the database or other systems
- Temporary user submitted data: User submitted data that is temporarily stored in memory before it is committed to the DB (for example shopping cart or drafts)
- Temporary site state: Developers often used the session to store technical data required for generating and maintaining the user’s site state
Storing this kind of data inside a JWT is not recommended:
- First, since a JWT has to be sent inside the request, it is limited in size (typically between 4 KB and 8 KB, depending on whether it travels in a cookie or a header and on the limits of the browser and server). Even if the size limit is not reached, it is still not advised: the token is submitted with every request to the server, so a large one can hurt performance.
- Second, every time the data inside the token changes, a new access token has to be issued. This can cause security problems, and it is definitely not how tokens are supposed to be used.
- Lastly, since the data sits inside the browser, it cannot contain sensitive information (remember, by default JWTs are signed but not encrypted, so anyone who holds the token can inspect its contents).
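To make the last two points concrete, here is a minimal sketch using only the Python standard library. The helper names (`make_jwt`, `read_payload_without_key`), the secret and the cart contents are all made up for illustration; a real site would use a maintained JWT library rather than building tokens by hand.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(payload: dict, secret: bytes) -> str:
    """Build an HS256-signed token by hand (illustration only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(payload, separators=(",", ":")).encode())
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def read_payload_without_key(token: str) -> dict:
    """Decode the payload segment; no secret is needed to read it."""
    segment = token.split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# A token that only carries identity and permissions stays small.
small = make_jwt({"sub": "42", "roles": ["member"]}, b"demo-secret")

# Hypothetical shopping cart stuffed into the token to show how fast it grows.
cart = [{"sku": f"item-{i}", "qty": 1} for i in range(200)]
large = make_jwt({"sub": "42", "roles": ["member"], "cart": cart}, b"demo-secret")

print(read_payload_without_key(small))  # readable by anyone holding the token
print(len(small), len(large))           # compare the two token lengths
```

The signature only protects the token against tampering. It does nothing to hide the payload, and every extra field is paid for again on each request.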
So where can we store this kind of data?
Well, it depends. In general, we can use one (or both) of the following options:
- Browser Storage
- High-Availability NoSQL Databases
The best option depends on each site’s requirements, and figuring out the best approach takes a lot of thinking and experience. However, I can offer some points to consider when choosing our site architecture.
Caching is a technique which site developers often use in order to increase site performance. For example, when the user logs in to the site, we request from the database (in addition to their profile) extra information about different assets they have in their account (for example, the purchase history, banking account, etc). We then store that information inside the user session in order to improve performance for future requests, by saving us the need to query the database.
This essentially means we use the session as a caching mechanism. But the problem is that the user session is not a good caching mechanism:
- First, if we use the site’s process memory to store a replica of DB objects for all logged-in users, the process’s memory consumption can grow greatly, until it reaches its limit and crashes.
- Second, we can run into concurrency issues. For example, what happens if another process (another front end, a job, etc.) has modified the data inside the database? If this case is not handled (which is not easy to do), we are left with “stale” data, which can lead to data loss and corruption.
- Finally, if the site relies on database-based sessions, then it’s really not an efficient solution: we retrieve data from one database in order to cache it in memory, but then we cache it inside another database.
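The stale-data problem from the second point can be reproduced in a few lines. This is a toy sketch: the two dictionaries stand in for the database and for the per-user session object, and the `balance` field is a made-up example.

```python
# Stand-in for the real database.
database = {"user-42": {"balance": 100}}

# At login, a copy of the row is placed in the session "cache".
session_cache = {"user-42": dict(database["user-42"])}

# Later, another process (a background job, say) updates the database.
database["user-42"]["balance"] = 40

# The site keeps answering from the session copy, which is now stale.
shown_to_user = session_cache["user-42"]["balance"]
actual = database["user-42"]["balance"]
print(shown_to_user, actual)  # 100 40
```

Nothing in the session told it that the underlying row had changed, which is exactly the case that is hard to handle.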
The bottom line is that session state objects were not designed to be used as a database caching mechanism; they were designed for different scenarios, as explained here. If we must cache data or processing results, it is better to use dedicated products that were created for that purpose (like Memcached or Redis).
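With a dedicated cache, the usual pattern is to look in the cache first and go to the database only on a miss, with an expiry time so stale entries eventually disappear. The sketch below shows that pattern under some assumptions: `TtlCache` is a small in-memory stand-in for a product like Redis or Memcached (it is not their API), and `get_purchase_history` and `load_from_db` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """In-memory stand-in for a dedicated cache such as Redis or Memcached."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def delete(self, key: str) -> None:
        """Call this when the underlying row changes, to avoid stale reads."""
        self._items.pop(key, None)


def get_purchase_history(user_id: str, cache: TtlCache,
                         load_from_db: Callable[[str], Any]) -> Any:
    """Return the cached value, loading and caching it on a miss."""
    key = f"purchases:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # only reached on a cache miss
    cache.set(key, value)
    return value
```

Unlike the session object, the cache here is shared by all requests, has an explicit lifetime per entry, and can be invalidated by whichever process changes the data.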
But do we really need caching at all?
The primary reason we used caching is that working directly against the database caused major performance issues. But is that still the case?
In the past, in order to persist user data we usually used a relational database. But that type of database comes with a major disadvantage: it was not designed for object-oriented programming languages, a problem known as the Object-relational Impedance Mismatch. This means that retrieving large objects, like the entire user profile or account data, is an expensive operation whose result needed to be cached. You can read more about this problem (and others) in this series, explaining why relational databases are not suitable for modern sites and applications.
But in recent years there has been a huge rise in NoSQL databases. Many of these databases are built on a key/value (or document) store, which makes it possible to retrieve all of the user data in a single request. In addition, many of them have a built-in caching layer, which can make a dedicated caching solution unnecessary. And as a bonus, many of them are also distributed, meaning they can span many servers in order to deliver high-performance, high-availability operation.
In many cases, when our user data sits inside a high-availability NoSQL database like Couchbase, Cassandra, Azure Cosmos DB, or even Elasticsearch, there is no need for a dedicated caching layer: we can just work directly against the database.
- Should I use Memcache with a NoSQL database?
- Looking for a better, less costly caching solution?
- NoSQL in the Enterprise
- High‐Speed Data Caching with NoSQL
- Using Azure Cosmos DB as your persistent, geo-replicated, distributed cache for ASP.NET Core
High-availability NoSQL databases, for example Riak KV and Elasticsearch, can also be used to store server-side sessions, if we choose to go in that direction. For high-scalability sites, they are in many cases still a better alternative than the platform’s built-in session mechanisms.
Temporary user submitted data
Another use for sessions is to hold temporary user submitted data. The most common example is the shopping cart, which is only committed to the database when the user completes the transaction. Another example is saving multi-step form data as “drafts”.
This was acceptable in the past, but as a result of the last 20 Years of Websites Evolution (which you can read all about in this series), this is no longer the case: all user submitted data always needs to be committed to the database.
There are two main reasons for that: to support multiple sessions and to prevent user data loss.
Support multiple sessions (session synchronization)
These days users expect websites to remember all the information they submit, and to have it available even when using the site from a different browser, computer, or mobile device. For example, when using Netflix you can start watching a movie in Chrome on your desktop, switch to Firefox on your laptop, and finish the movie on your mobile device, all without losing your place. With Gmail, you can start writing an email on one computer and finish it on another. In other words, all of your information, including “drafts”, is always synchronized across all devices and sessions.
Achieving this level of real-time synchronization across multiple sessions is very difficult if we try to use server-side sessions. The reason is that sessions based on a session ID are not shared across different browsers or devices, even for the same logged-in user. Because these are different sessions, they do not share the same memory space, so when the user’s data changes in one session it does not affect their other sessions, leading to serious synchronization issues. (And this assumes the same technology handles both web and mobile traffic, which is often not the case and makes session synchronization impossible.) The only way we can allow users to simultaneously work from multiple browsers and devices, while always having the latest version of their data, is by committing it to a database.
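The difference comes down to what the data is keyed by. The sketch below contrasts the two approaches under obvious simplifications: `SessionMemory` and `DraftTable` are hypothetical classes, and the `DraftTable` dictionary stands in for a real database table.

```python
from typing import Dict, Optional


class SessionMemory:
    """Per-session memory: each browser or device gets its own private dict."""

    def __init__(self) -> None:
        self._by_session: Dict[str, Dict[str, str]] = {}

    def save_draft(self, session_id: str, text: str) -> None:
        self._by_session.setdefault(session_id, {})["draft"] = text

    def load_draft(self, session_id: str) -> Optional[str]:
        return self._by_session.get(session_id, {}).get("draft")


class DraftTable:
    """Stand-in for a database table keyed by user id, shared by all sessions."""

    def __init__(self) -> None:
        self._by_user: Dict[str, str] = {}

    def save_draft(self, user_id: str, text: str) -> None:
        self._by_user[user_id] = text

    def load_draft(self, user_id: str) -> Optional[str]:
        return self._by_user.get(user_id)


# Same user, two devices, therefore two different session ids.
sessions = SessionMemory()
sessions.save_draft("session-desktop", "Hello, this is my dra")
laptop_view_from_session = sessions.load_draft("session-laptop")  # nothing there

# Keyed by user id in the database, the draft is visible from any device.
db = DraftTable()
db.save_draft("user-42", "Hello, this is my dra")
laptop_view_from_db = db.load_draft("user-42")
```

The desktop draft never reaches the laptop session, while the database version is there for whichever device asks next.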
Prevent user data loss (stability)
The second reason is stability: what happens if the server is rebooted or the site crashes? Or if the session management mechanism fails and starts a new session? All of the user data that was stored in memory will be lost. Because of that, we should always commit user data to our database; this is the only way to make sure we don’t lose our customers’ data. This way we avoid stability issues resulting from session loss and guarantee consistency for all user interactions. If it’s important enough to be sent to the server, it’s important enough to be persisted.
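As a rough illustration of “persist it as soon as it arrives”, the sketch below writes each cart change straight to a database and then simulates a restart. It uses SQLite from the Python standard library purely as a stand-in for whatever database the site really uses; the table layout and the function names are assumptions made for this example.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import Dict


def _connect(db_path: str) -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) cart table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cart_items ("
        "user_id TEXT NOT NULL, sku TEXT NOT NULL, qty INTEGER NOT NULL, "
        "PRIMARY KEY (user_id, sku))"
    )
    return conn


def save_cart_item(db_path: str, user_id: str, sku: str, qty: int) -> None:
    """Commit the change immediately instead of parking it in the session."""
    with closing(_connect(db_path)) as conn, conn:  # inner 'conn' commits on exit
        conn.execute(
            "INSERT OR REPLACE INTO cart_items (user_id, sku, qty) VALUES (?, ?, ?)",
            (user_id, sku, qty),
        )


def load_cart(db_path: str, user_id: str) -> Dict[str, int]:
    with closing(_connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT sku, qty FROM cart_items WHERE user_id = ?", (user_id,)
        ).fetchall()
    return dict(rows)


db_path = os.path.join(tempfile.mkdtemp(), "site.db")

# The same cart item, stored both ways.
in_memory_session = {"cart": {"book-1": 2}}
save_cart_item(db_path, "user-42", "book-1", 2)

# Simulated crash or reboot: process memory is gone, the database file is not.
in_memory_session = {}
cart_after_restart = load_cart(db_path, "user-42")
```

After the simulated restart the in-memory session is empty, but the cart can still be read back from the database.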
Exception – anonymous user submitted data
There are some cases when we need to store user information which users do not expect to be saved on the server. The primary example is an anonymous site, where users are not logged in but still generate data (for example, a shopping cart before checkout, dismissed disclaimers, or even drafts in progress). For these cases, it is recommended to use the browser’s Web Storage, which all modern browsers support and which can hold a fairly large amount of data (usually starting at about 5 MB).
Temporary site state
There was another reason for using sessions, a technical one.
For example, keeping track of user activities on the site, passing temporary state information between requests, or holding technical information which the server used in order to generate the page. That was common in older web development platforms, which used to generate the site’s HTML on the server side. However, with the move to client-side HTML rendering technologies (for example Angular, React and Vue.js), maintaining server-side state for each user is no longer necessary; instead, it can be stored inside the browser memory or storage (this is especially true for SPAs).
One thing we need to remember when using browser storage is that, since it sits inside the client, it is less secure than data kept on the server. If we need to save sensitive information inside the browser, it is recommended to consider the following options:
- Use session storage instead of local storage. The browser clears it when the tab or window is closed; this does not guarantee that the data will be wiped when the login session expires, but it helps.
- Clear the stored data when the user presses the logout button or the session expires.
- When reading data from local storage, make sure it has not expired by comparing against a stored date or checking for the session cookie.
- Encrypt the data before storing it (only useful when we don’t need to read the data in the client, since we don’t have a secure way of storing the encryption key in the browser).
- Don’t use client storage at all; instead, just keep everything in the browser memory and request it again when the page is reloaded (suitable for an SPA backed by a high-availability database).
Stateful & Stateless Sessions
If we manage to skip the server-side session completely, we reach our goal, which is having a stateless session.
A stateless session is not actually stateless: it does maintain user state, but that state is not kept on the server, which makes each request “stateless” from the server’s point of view. Stateless sessions are the holy grail of scalable architecture, allowing our site to always allocate the right amount of resources to handle any traffic.
Read more about stateless sessions:
- 100% Stateless with JWT (JSON Web Token) by Hubert Sablonnière
- Stateful & Stateless Identity – Intro to Identity Series
Client-based sessions can help to increase site stability, performance, and scalability, but they do come at a cost. It is important to think about their pros and cons before deciding on the site architecture. No one solution is perfect for all and there are always trade-offs – all we can do is to pick what is the best option for us.
This series focused on the session management aspect of the membership system. To learn more about the other parts (authentication, authorization and SSO), including related protocols like OAuth, OpenID Connect and SAML, it is recommended to read my other article on the subject: Fifty Shades of Membership Sites: Mixing Authentication, Authorization, Session Management and Single Sign-On.