Open source is great. I remember that in one of my earliest encounters with programming, I was awed by what the language allowed me to achieve. It was PHP, and it was open source. I later got introduced to Linux by a friend of mine: goosebumps. Since then, I’ve been an open-source-loving programmer, and I’m very grateful for both the concept and its implementation. I could go on giving list after list of open-source tools that I’m thankful for this year, from programming languages to frameworks, libraries, and third-party applications. It would be an endless list. In this post, I’m going to dedicate the following few paragraphs to one open-source tool I have happily used this year and give examples of how I used it.
As you may have guessed, the tool I’m going to talk about is Redis. You can check out its GitHub repository here https://github.com/redis/redis. They have an excellent definition of what it does, but I sum it up as: think fast, think Redis. It is an in-memory database, which is part of why it’s fast. It is mostly used for caching and for implementing queues. When building an application that stores data, like a web API, you will primarily use a disk-based database, SQL or NoSQL, to persist the data. Of course, there is a considerable difference in speed between a disk-based DB and a memory-based DB like Redis. However, memory costs much more than disk, so unless you need that extra speed, you may not have had a reason to think of Redis yet.
I will elaborate on a few scenarios that required me to use Redis in my architecture designs this year. Before that, I should say that I like Redis because it’s open source and does exactly what it says it will do. When designing a system, you think of the properties you want; for example, you might need high-speed storage for both reads and writes, and you usually don’t go and ask your team to build these things. You look for existing solutions and use them, whether proprietary or open source. When you settle on a solution, it means you trust it to fulfill your requirements; even if it’s open source, there is this implicit agreement. Redis does this beautifully if you’re looking for fast and reliable open-source in-memory storage.
I will now scratch the surface of how I’ve used Redis in my system designs this year. One of my roles at Ejara is Principal Blockchain Engineer. Within our mobile app, we offer a non-custodial wallet that lets our users buy and sell crypto at will. When building a crypto application, the following features are important for a good user experience.
- Show the transaction history of the user’s account.
- Trigger notifications to users when an event happens on their account.
These two may seem like simple features from the user’s point of view, but unfortunately, blockchains weren’t designed to handle these features. In a typical blockchain architecture, there are nodes that interact with each other to broadcast transactions and create new blocks. So whenever you hear of a blockchain, you can be sure they have node software that willing operators run to secure the blockchain. Whenever a transaction is submitted, it goes to a node that also broadcasts it to other nodes. After a specified period, transactions are collected into a block, validated by special nodes called validators or miners, and then appended to the chain of blocks. In the case of Ethereum, one popular node software is Geth.
Since transactions are bundled up in blocks, getting the transaction history for a single account means looping through a bunch of blocks and, for each block, looping through its transactions to pick out the ones belonging to the account in question. Yup, that’s quite impractical; users would have to wait a while to see their transaction history. This is why indexers were invented. An indexer listens on the blockchain network for newly published blocks, extracts the transactions, and stores them in such a way as to make retrieval of transaction history trivial. For example, if the indexer uses a SQL database, it could store transactions against account identifiers, so getting the transaction history is reduced to a simple SELECT ... WHERE query. The NEAR blockchain, for instance, has a publicly available PostgreSQL DB here https://github.com/near/near-indexer-for-explorer which contains indexed transactions. You can now see that triggering a notification to users also becomes trivial: while processing the transactions in a block, the indexer can trigger notifications to the affected accounts.
At Ejara, we decided to build an indexer for a couple of reasons we don’t need to go over here. Here are some of the functional requirements:
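To make the indexer idea concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL. The table name, column names, and the shape of the block dictionaries are assumptions made up for illustration; they are not taken from any real indexer schema.

```python
import sqlite3

def build_index(blocks):
    """Flatten blocks into a table keyed by account (a toy stand-in for an indexer's store)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (account TEXT, block_height INTEGER, tx_hash TEXT)"
    )
    # The index on `account` is what makes the history lookup cheap.
    conn.execute("CREATE INDEX idx_account ON transactions (account)")
    for block in blocks:
        for tx in block["transactions"]:
            # One row per party, so both sender and receiver can find the transaction.
            # A set avoids a duplicate row when source and target are the same.
            for account in {tx["source"], tx["target"]}:
                conn.execute(
                    "INSERT INTO transactions VALUES (?, ?, ?)",
                    (account, block["height"], tx["hash"]),
                )
    conn.commit()
    return conn

def transaction_history(conn, account):
    """The 'simple SELECT ... WHERE' lookup: no looping over blocks at read time."""
    rows = conn.execute(
        "SELECT block_height, tx_hash FROM transactions "
        "WHERE account = ? ORDER BY block_height",
        (account,),
    )
    return rows.fetchall()

# Hypothetical sample data: two blocks, three transactions.
blocks = [
    {"height": 1, "transactions": [
        {"hash": "0xa1", "source": "alice", "target": "bob"},
    ]},
    {"height": 2, "transactions": [
        {"hash": "0xb2", "source": "carol", "target": "alice"},
        {"hash": "0xc3", "source": "bob", "target": "carol"},
    ]},
]
conn = build_index(blocks)
history = transaction_history(conn, "alice")
```

The expensive double loop over blocks and transactions still happens, but only once at indexing time; every later history request is a single indexed query.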
- It should support multiple blockchains.
- It should only index transactions for the accounts created on our platform.
- It should be able to trigger account events in our notification service.
These are just a few of the functional requirements, but they should be enough to illustrate the use of Redis. The indexer basically runs like this:
- Query the blockchain for the latest block.
- Check the database for the last synced block.
- Compute which blocks need to be synced by the indexer and query the blockchain for the block data.
- Process the block and store the transactions related to the tracked accounts in a Postgres database.
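The third step above, working out which blocks still need syncing, can be sketched roughly as follows. The function name and the `batch_size` cap are hypothetical; the real indexer may well compute this differently.

```python
def blocks_to_sync(latest_height, last_synced_height, batch_size=100):
    """Return the block heights still to fetch, oldest first, capped at batch_size."""
    start = last_synced_height + 1
    # Never go past the chain tip, and never take more than one batch at a time.
    end = min(latest_height, start + batch_size - 1)
    return list(range(start, end + 1))

# Chain tip is at 105 and we last synced block 100.
pending = blocks_to_sync(latest_height=105, last_synced_height=100)
```

Capping the batch keeps a long outage from turning into one giant request; the remaining heights are simply picked up on the next run.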
The first use of Redis here is as a queue, since a queue drives the whole process. Redis handles this beautifully, with no issues so far. There have been instances where the queues had millions of items to process and that wasn’t a problem; the thing is just reliable. This usually happens when the blockchain node the indexer is tracking falls behind or stops serving requests for some time before coming back. The bottleneck is usually the network speed between the indexer process and the blockchain nodes, but within a reasonable time things catch up and the queues clear.
The second use of Redis concerns how the indexer picks out the transactions related to accounts created on the app. The blockchain is like the multiverse: there are so many possible accounts in there, and those created on the Ejara platform make up a very tiny percentage, like humans on Earth. It wouldn’t be optimal to index transactions for all the accounts on the blockchain. Think of the cost of the database alone; some blockchains like Tron have already hit 10TB.
The process would be something like this: for each transaction, check whether it’s related to an account we want to follow, and if so, store it in the database. Since we have the addresses of our users in our database, the first thought would be to make the check against the database. However, one quickly finds this very slow, even with very high database specs. To put things into perspective, across all the blockchains that we support on the app, we see approximately 10k transactions per second, which translates to 20k+ database reads per second just to check whether an address is present in our database (each transaction has a source and a target address). The database wasn’t able to keep up, queue jobs were stuck waiting for the database, and simply tuning the database server to reach a high read speed is not an optimal design. Yup, you guessed the solution right: just cache the addresses in Redis and use that for the check. Redis handles this beautifully, as if nothing is happening; for me, this is when I fell in love with Redis. This approach was much more optimized and scalable in terms of cost and speed. Also, since each blockchain address is only a few bytes, it would take about a million addresses to use 1GB of memory.
If you haven’t tried it out yet, start incorporating it into your system designs: caching, queuing, and so on. Redis is actually known as a data structure server, so you can store many kinds of data structures in it, such as strings, lists, sets, hashes, and sorted sets. It’s honest software, honestly.
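As a rough illustration of a Redis-backed queue, here is a sketch built on the two list commands LPUSH and RPOP, which together give first-in, first-out behaviour. The post does not say which queue library or commands the indexer actually uses, so this is an assumption. The queue name and job format are hypothetical, and `InMemoryListClient` is a tiny stand-in so the example runs without a server; a redis-py client exposes `lpush` and `rpop` under the same names.

```python
import json
from collections import deque

class InMemoryListClient:
    """Stand-in mimicking the two Redis list commands used below (LPUSH, RPOP)."""
    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        queue = self._lists.setdefault(name, deque())
        for value in values:
            queue.appendleft(value)
        return len(queue)

    def rpop(self, name):
        queue = self._lists.get(name)
        return queue.pop() if queue else None

QUEUE = "indexer:blocks"  # hypothetical queue name

def enqueue_blocks(client, chain, heights):
    """Producer side: push one job per block height onto the head of the list."""
    for height in heights:
        client.lpush(QUEUE, json.dumps({"chain": chain, "height": height}))

def drain(client, handler):
    """Consumer side: pop from the tail until the queue is empty."""
    processed = 0
    while True:
        raw = client.rpop(QUEUE)
        if raw is None:
            break
        handler(json.loads(raw))
        processed += 1
    return processed

client = InMemoryListClient()
enqueue_blocks(client, "ethereum", [101, 102, 103])
seen = []
count = drain(client, lambda job: seen.append(job["height"]))
```

Pushing on one end and popping from the other means the oldest block is always processed first, which matters when the indexer is catching up after a node outage.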
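The address check could look something like the sketch below, which keeps tracked addresses in a Redis set and tests membership with SISMEMBER. Using a set is my assumption here; the post only says the addresses are cached in Redis. The key name and transaction shape are hypothetical, and `InMemorySetClient` is a stand-in so the example runs without a server; a redis-py client provides `sadd` and `sismember` under the same names.

```python
class InMemorySetClient:
    """Stand-in mimicking the two Redis set commands used below (SADD, SISMEMBER)."""
    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        members = self._sets.setdefault(name, set())
        before = len(members)
        members.update(values)
        return len(members) - before  # number of newly added members

    def sismember(self, name, value):
        return value in self._sets.get(name, set())

TRACKED = "tracked:addresses"  # hypothetical key name

def load_addresses(client, addresses):
    """Warm the cache with the platform's addresses (e.g. read once from the database)."""
    if addresses:
        client.sadd(TRACKED, *addresses)

def is_relevant(client, tx):
    """Two membership checks per transaction: one for the source, one for the target."""
    return bool(
        client.sismember(TRACKED, tx["source"])
        or client.sismember(TRACKED, tx["target"])
    )

def filter_block(client, transactions):
    """Keep only the transactions that touch a tracked address."""
    return [tx for tx in transactions if is_relevant(client, tx)]

cache = InMemorySetClient()
load_addresses(cache, ["addr_alice", "addr_bob"])
block_txs = [
    {"hash": "0x01", "source": "addr_alice", "target": "addr_zed"},
    {"hash": "0x02", "source": "addr_xyz", "target": "addr_qrs"},
    {"hash": "0x03", "source": "addr_xyz", "target": "addr_bob"},
]
kept = [tx["hash"] for tx in filter_block(cache, block_txs)]
```

Only the transactions that survive this filter ever reach the database, so the 20k+ reads per second land on the in-memory cache rather than on Postgres.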