Database Improvements

For the last few months, you may have noticed a dip in our releases and new feature posts. While a lot of that may include some real life for both Drakkhis and I, one of the other major reasons was us troubleshooting new database issues that we were experiencing for SL Colonies.

This started happening when we made updates to all of our items to enable everything to track on our database to work with the new questing system. These were unknown and undesired consequences which we had to troubleshoot for quite some time in order to come up with a solution that actually works, and sticks.

We use state-of-the-art servers for SL Colonies and invest a decent amount of money into monthly costs for servers that give you good response times within Second Life and work efficiently without skimping on quality.

Our main website (https://slcolonies.com), our player account website/meter website (https://meter.slcolonies.com) and even our forums (https://forum.slcolonies.com) are on three separate servers, spreading out server loads to offer you the best product we can.

We then have a fourth server, which is specifically our database. The database brings everything together, connecting our websites and our forums, as well as all the items within Second Life together.

Databases are not only limited by resources such as CPU, hard drive space and RAM, but also by the amount of connections permitted per second sending queries.

How it works for SL Colonies is that, when you click on an item in Second Life, like one of our crafting stations, it will send a query to our player server which then connects to the database and tells the database what to change (be it add items to a player inventory, or change the status of something like a field from watering to growing).

This also means, however, that as we grow, and more and more items are rezzed out on sim owners’ RPG worlds, more and more items are asking for connections to our database at the same time. And this is particularly noticeable during Second Life’s rolling restarts of everyone’s simulators.

When we made changes to every item to track quest objectives, we added to that pool of incoming connections, which saw our database spiking and hitting more than 60 connections per second.

When we hit 60 connections in that second, any connection requests above 60 get rejected and errored out. This means your items get confused and have to try re-establishing a connection to our servers, increasing response times.

How did we fix it?

Well, as a small indie team, a lot of learnings come with trial and error. And this takes a lot of time.

Drakkhis and I have skills in different areas, and when they come together at the right time, we come up with awesome solutions and awesome products as is seen with our product offerings, unlike anything else in Second Life.

The first thing you would think is, upgrade the database to enable more connections at once? This is the wrong approach, as we are not using nearly the amount of resources available to us on the instance. Our storage, CPU, and RAM are more than we need for SL Colonies and its website suite. Upgrading would be the wrong approach in this instance and would not fix an underlying issue. We didn’t know how many connections were erroring out. If we upgraded to any instance with a hard limit of 100 connections, would that solve the issue? Likely not - many sim owners have a few hundred items from SL Colonies rezzed out on land, and multiple simulators rebooting at the same time, we would find ourselves upgrading to a ridiculously expensive database instance and not making any use of its resources available to us, and still running into issues.

The next thing we tried, and Drakkhis spent a lot of thought on this and coding behind the scenes. Was to build into our scripting a way for our items to stagger their connections to the server during reboots. By coding within the scripts that on reset, it would choose a random number between x and x and use that number prior to connecting to the server. Drakkhis also did some work on the backend of the new SLC Warehouse, which allegedly was also causing us major connection issues.

This solution did not work. We found that we were still having server spikes of over 60 connections per second. But, in updating our scripts in this way, we also created more future-proofing, and this still staggers connections to our server-side scripts and the database. So there is a great benefit in the work done behind the scenes here and we do see some improvements.

I started thinking about the PHP scripts on our player website and how they connect to the database. I thought that surely there would be a way to script these so that we could have kind of a workers script and a _queue\ script when incoming connections spiked.

Looking into this further, the solution was to install a proxy between our server and the database. Basically going from SLC Item —> Meter Server —> Proxy —> Database.

How the proxy server works is that it takes all incoming connections from the Meter Server and does all the calculations. We’ve built in a limit of 45 connections that the Proxy Server connections and then pushes onto the Database. This means that instead of the database erroring out the scripts, when the proxy server detects more than 45 incoming connections, it has a built-in queue that holds all other incoming connections, waits for the database to have more open connections and pushes the rest through.

Benefits to the user

You would actually think that this creates more work and latency between our scripts, database, and back to Second Life. But in practice, and in reality, this has actually shown some significant speed improvements for our users.

Why is this? As briefly mentioned above, the scripts no longer get an error back from the database. When they error out, they don’t really know what to do. They may require the player to click an item a second time in order to get the item to respond. Or they may wait a period of time before attempting another connection to the database. This is actually slower than the proxy server adding a slow delay (which in some instances could just be one extra second if queued) and sending a response straight back to the script.

It is also extremely unlikely that you would be met with a significant delay, unless you are trying to use our items while rolling restarts are occurring.

Conclusion

I’d like to thank the SL Colonies community, and our Patrons for their patience during this time. This was something that we had to prioritise over everything else. And this kind of work is not the most fun for any of us. You as players don’t see much activity, anything really new happening. Drakkhis has to sit in code that is actually boring and frustrating, and I am left twiddling my thumbs instead of bringing out my creative and critical thinking skills as I don’t want to frustrate anyone or push too many ideas and changes that we cannot keep up with.

Adding more features during this time would have significantly impacted our system performance and made you all as players disappointed in our product as something that appears to be cheap with no care for server impact or investing time on making something work properly.

We hope that you have witnessed firsthand that we are a small team that prioritises our players first, making sure our product runs smoothly always and are not afraid to take a step back to work on other things to ensure continued fun in the long term.

3 Likes