Advertising vs Premium Features


The initial plan was to pay for our servers with the help of Supporters, and in return let Supporters explore openings in greater depth. We had no idea if this would work, so we set ourselves the target of being financially stable within 30 days, when our Azure free trial would run out.

Unfortunately, while we were able to draw a large crowd to our website with top posts on Reddit and Hacker News, getting people to pay for a new (and, honestly, not essential) service was another matter. It became pretty clear, as the 30-day deadline loomed, that we would be unable to meet our target.

We have decided to pivot our financial model to a donation-based, and possibly advertising-based, approach. We found that, whilst the JanusGraph server was an interesting part of the project, it was unsustainable purely on ad revenue. While it was unfortunate to shut it down, we found we could increase the size of the free dataset by serving it through Cloudflare's free CDN, keeping our costs low. We also added a free option to download a much larger version of each dataset.

With our costs much lower, the website should be sustainable in the long term, and we can look into adding new features to attract users. Next, we plan to collect datasets from specific GMs, like Magnus Carlsen; these datasets are much smaller, so we should be able to show them in their entirety, for free, to everyone. We are also updating some of our UI based on feedback from Reddit and Hacker News.

Social Media


Now that we had a functioning website, we needed to get the news out! We wanted a slow start so as not to overwhelm the server, but we found the internet to be very all-or-nothing. After posting to some smaller forums and getting very little attention, we posted to Hacker News and made the front page!

This resulted in a comparative explosion of traffic, with over 20,000 people visiting the site. We also found ourselves being linked to from around the internet; we were surprised by how many bots are set up to share the front page of Hacker News! We also managed to get a top post on a chess subreddit, resulting in another wave of traffic.

Currently we have no ads, so our website is only sustainable with the help of supporters. We had hoped to attract enough supporters to get the website off the ground within the 30-day trial; however, this proved unsuccessful, and a lot more difficult than we expected. The next post explains some of our thoughts about this.

Gremlin Development


We wanted to build a server that, when requested, could return a large amount of data from a particular state. This isn't something either of us had much experience with, and finding the right tools to store and share our database was quite challenging. Initially, we looked at using a SQL database, but found there were challenges with rapidly returning our heavily linked data, particularly surrounding transpositions. This turned into something of a running theme, with transpositions repeatedly causing unexpected problems. Eventually, we came across graph databases, which are perfect for storing the data behind our graph!
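
To make the transposition point concrete, here is a small illustration using python-chess (an assumption on our part for this sketch, not the full processing pipeline): two different move orders reach the same position and therefore the same Zobrist key, so in a graph model they map onto a single node instead of duplicating data per move order.

```python
import chess
import chess.polyglot

# Two move orders of the same setup that transpose into one position.
a = chess.Board()
for san in ["Nf3", "Nf6", "g3", "g6"]:
    a.push_san(san)

b = chess.Board()
for san in ["g3", "g6", "Nf3", "Nf6"]:
    b.push_san(san)

# Both boards hash to the same 64-bit key, so both games would be
# attached to the same vertex in the graph database.
assert chess.polyglot.zobrist_hash(a) == chess.polyglot.zobrist_hash(b)
```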

We wanted to use the Gremlin query language to return data, as it seemed well suited to our needs. We were first drawn to Azure Cosmos DB, Microsoft's cloud graph database, by its generous free trial. However, we found Cosmos ill-suited to our needs: in particular, it only supports a subset of the Gremlin language, seemingly intentionally cutting off much of its computational power. This made returning only the data above a threshold of games very challenging.
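
To give a feel for what we were trying to express, the traversal we needed looks roughly like the sketch below, written with gremlin_python. The vertex label ('position'), the edge label ('move'), and the property names ('zobrist', 'games', 'san') are illustrative assumptions, not our exact schema.

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P

# Connect to a Gremlin Server endpoint (placeholder URL).
g = traversal().withRemote(
    DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

def continuations_above_threshold(zkey, threshold=500):
    """From the position keyed by `zkey`, return only the moves whose
    resulting positions were reached in at least `threshold` games."""
    return (g.V().has('position', 'zobrist', zkey)
             .outE('move')
             .where(__.inV().has('games', P.gte(threshold)))
             .project('san', 'games')
             .by('san')
             .by(__.inV().values('games'))
             .toList())
```

Filtering on a property of the edge's target vertex is exactly the kind of step that a restricted Gremlin dialect makes awkward.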

Ultimately, we decided to set up a JanusGraph server on a VM. We primarily did this to unlock all of Gremlin's features, but we also found it was cheaper and faster than Cosmos. Setting up the JanusGraph server wasn't the easiest, but it gave us a lot more control to optimize for our use case. For instance, graph databases are often distributed across multiple machines; with JanusGraph, however, we could select a storage backend (Berkeley DB) optimized for use on a single VM.
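
For reference, pointing JanusGraph at its embedded Berkeley DB JE backend takes only a few lines of the graph properties file; a minimal sketch (the storage path is a placeholder):

```properties
# conf/janusgraph-berkeleyje.properties
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=/var/lib/janusgraph/data
```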

Now, with our server up and running, we had to find a way to pay for it! We launched JanusGraph on an Azure VM using a 30-day free trial to get the website off the ground; after that, we would need supporters to help pay for it. But first we needed an audience, which I will talk about in the next post.

PGN Processing


If we wanted to process the massive 800-million-game Lichess database, we would need an efficient algorithm.

To reduce IO load, we batched games into groups of around 100,000. Within each batch, every game is assigned a Zobrist hash based on its current state. Each game is advanced by one move and the new state calculated; we then group together games that are in an identical state, e.g. all games start together in the initial position and diverge as they progress. Here is a sketch of the initial plan for the algorithm:
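
As a rough stand-in for that sketch, here is a minimal Python version of the batching pass. It assumes python-chess for board state and its polyglot Zobrist hashing, and it only counts games per position; the real data layout and batch handling are simplified.

```python
import chess
import chess.pgn
import chess.polyglot
from collections import Counter, defaultdict

def process_batch(pgn_file, batch_size=100_000):
    """Load one batch of games, then advance every game one ply at a
    time, grouping games that currently share the same position."""
    batch = []
    for _ in range(batch_size):
        game = chess.pgn.read_game(pgn_file)
        if game is None:
            break
        batch.append(list(game.mainline_moves()))

    counts = Counter()                        # games seen per position hash
    boards = [(chess.Board(), moves) for moves in batch]
    ply = 0
    while boards:
        groups = defaultdict(int)             # this ply's identical-state groups
        still_running = []
        for board, moves in boards:
            if ply >= len(moves):
                continue                      # this game has finished
            board.push(moves[ply])            # advance one move
            groups[chess.polyglot.zobrist_hash(board)] += 1
            still_running.append((board, moves))
        counts.update(groups)                 # merge the groups into the totals
        boards = still_running
        ply += 1
    return counts
```

Here `pgn_file` would be an open text handle onto one of the Lichess PGN dumps, processed one 100,000-game slice at a time.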

As we advanced each state, we recorded the statistics we needed as state data. From the PGNs we could record win/loss rates and the Elo of the players involved for each state we processed. This produced a massive amount of data, so we only processed states that had more games than a given global threshold.
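
The per-state record is then just a small accumulator. Something along these lines, where the field names and the threshold value are illustrative and the result and Elo values come straight from the PGN headers:

```python
GAME_THRESHOLD = 500   # hypothetical global threshold

def record_game(stats, state_key, result, white_elo, black_elo):
    """Fold one game passing through the position `state_key` into the
    running win/loss/draw counts and Elo sum for that state."""
    s = stats.setdefault(state_key, {'games': 0, 'white_wins': 0,
                                     'black_wins': 0, 'draws': 0,
                                     'elo_sum': 0.0})
    s['games'] += 1
    s['elo_sum'] += (white_elo + black_elo) / 2
    if result == '1-0':
        s['white_wins'] += 1
    elif result == '0-1':
        s['black_wins'] += 1
    else:
        s['draws'] += 1

def prune(stats, threshold=GAME_THRESHOLD):
    """Drop states seen in fewer games than the global threshold."""
    return {k: v for k, v in stats.items() if v['games'] >= threshold}
```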

Ultimately, we ended up with 60GB of data, far too much to send someone when they load a page! To let people explore openings at that kind of depth, we needed some kind of server that would send only the data needed to explore further into the tree; I will talk about how we built this in the next post.

ChessRoots Launch!


Welcome to ChessRoots, a website for visualizing massive chess databases. You can use ChessRoots to explore the openings of several databases on an interactive graph. Use the tools at the top of the page to change the dataset or filter it by Elo or time control.

A few months ago, I was playing around with processing chess PGNs out of personal interest. After teaming up with a friend who is better at web development, we wanted to find a way to make an interactive opening graph on a website. We came across the massive 800-million-game Lichess database and set our sights on processing the entire database and using the data to make an interactive graph. The size of the dataset presented several technical challenges that I am going to talk about in this blog; I will start with the algorithm we used to process PGNs in the next post.