Fixing Lag: And I, for one, welcome our new automaton overlords | EVE Online

2010-08-18 - By CCP Atropos

Over the last few months I've been working on something very cool that some of you may have heard about; it's a project to rework the guts of the EVE game client to remove the audio and visual aspects of the game. In other words, to slim the client down as much as possible so that it can be considered 'lite' or 'thin'.

And thus the Thin Client(TM) was born!

What can this thin client do?

The basis for the thin client is the very EVE client you use yourselves; it takes that core and extends and overrides parts of it, so that you no longer need to have a sound card (insert generic EVE has sound meme) or a graphics card to run it.

Why should I care?

The thin client requires fewer system resources than a traditional 'full fat' client. As a result we can run more of them in parallel on one computer. Whereas a (normal?) EVE player might run 2 or maybe 3 accounts simultaneously with a traditional client, it's possible to run many times this number with the thin clients.

The obvious benefit of a client like this is one of scale; we can start up many hundreds of these clients and have them do something, anything.

It now becomes possible to set them up so that we can undertake a controlled, large scale test; you can submit a new change to the code base and retest with the same setup to examine the effects of the code change. The level of control and precision these tests now give us is unprecedented.

Such practices have been used to load test websites for a long time, by repeatedly making requests to websites in an effort to discover the bottlenecks of the system. However, for EVE the closest we have gotten is the mass tests on Singularity that CCP Tanis runs.

The mass tests provide us with valuable data, but they can be very hard to exercise control over, since you are dealing with anywhere from 200 to 500 living breathing EVE players. The thin clients on the other hand are mindless automatons; if we say jump off a cliff, they will, metaphorically, go straight for the edge.

Ok, but what does this actually mean for me?

The thin clients themselves aren't any smarter than a normal client. If you start up a normal EVE client it doesn't suddenly start trying to take over the world à la Skynet (hopefully), and the same is true of the thin client. To bridge this gap we've created two methods through which we can tell the clients what to do: internal projects called Orchestrator and the Automaton Project, both of which I'll touch on later.

By being able to tell a client what to do we have created for ourselves a massive new tool box, which can be used to great effect. Allow me to elaborate:

  1. It becomes possible to examine the behavior of massive amounts of mission running in the same system. To examine, as it were, the Rens Effect.
  2. We can systematically examine the behavior of clients when they're fighting large scale fleet fights, allowing us to recreate and diagnose the unique problems that large scale fleet fights create, in the lab.
  3. We can place Jita under the microscope so that the impact of many thousands of market transactions can be understood in detail.
  4. We can examine just why the black screen of impending death appears when one fleet jumps into another, and work out more intricate reproduction steps than "get a big fleet and jump into another one".
  5. We can determine not only the theoretical threshold but also the actual performance threshold for fully loaded systems, whether it's pilots idling in space, hunting NPCs, shopping, afk-ing, anything.
  6. And finally, we can evaluate the impact, at large scales, of new gameplay mechanics. Older players will recall several gameplay changes over the years that were made to improve server performance. Two obvious examples: limiting a ship to 5 drones instead of 10, and changing the rate of fire and damage modifiers on weapons to limit the impact that high rate of fire weaponry would have on the server.

This new tool box allows us to load and stress test some of the oldest and most intricate components of the game.

Enough of the hurf blurf, give me the juicy stuff...

This is where I get technical, so if you fear techy, geeky nerd talk, skip this.

The obvious question is how did we achieve this? The core of the solution was through the application of two simple things:

  • Mocking and mock objects
  • Python class inheritance

For those of you with no programming knowledge I'll clarify: mocking is the practice of replacing one object with another that is almost identical but allows a lot more control. It's a technique used in unit testing that allows the developer to test a piece of code in isolation. Mocking allows us to replace the pieces of the traditional codebase that rely upon a GUI with mock objects that do nothing. If you want to know more, Wikipedia has a nice page on the subject.
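To make the idea concrete, here is a minimal sketch using Python's standard `unittest.mock` module. It is not the actual EVE code: `TargetingLogic`, `show_target_icon` and `fake_ui` are made-up names for illustration, and the real client may well use a different mocking approach.

```python
from unittest.mock import MagicMock


class TargetingLogic:
    """Hypothetical game-logic class that also talks to a UI layer."""

    def __init__(self, ui):
        self.ui = ui
        self.locked = []

    def on_target_locked(self, target_id):
        # Pure game logic: remember the locked target.
        self.locked.append(target_id)
        # UI side effect: a full client would draw an icon here.
        self.ui.show_target_icon(target_id)


# The mock stands in for the real UI. It accepts any call and does
# nothing, but records the call so it can be inspected afterwards.
fake_ui = MagicMock()
logic = TargetingLogic(fake_ui)
logic.on_target_locked(42)
```

The game logic runs exactly as before, while the UI call lands on an object that quietly swallows it.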

The second step was the use of standard inheritance within Python to allow us to override particular pieces of the code; to explain I'll run through a simple example:

Consider the targeting system: when you initiate a target lock on a ship, asteroid, or whatever, you're telling the server to lock a target and to let you know whether that succeeded or failed (for example, because the target is out of your range).

This is represented on your client as a new target appearing at the top of your screen. With the thin clients, we don't have a GUI and so when the client gets the message from the server and attempts to load up the icon, it can go a bit haywire and raise errors about UI components missing and such.

By inheriting the class that handles the targeting, we can replace the single function causing the error with something that gracefully handles this new set of circumstances.
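In plain Python the pattern might look something like the sketch below. All names here (`TargetService`, `ThinTargetService`, `UIMissingError`, `show_target_icon`) are hypothetical stand-ins rather than the real client classes.

```python
class UIMissingError(Exception):
    """Hypothetical error raised when a UI component is absent."""


class TargetService:
    """Stand-in for the full client's targeting class."""

    def __init__(self):
        self.targets = []

    def on_lock_confirmed(self, target_id):
        # Game logic: record the lock the server just confirmed.
        self.targets.append(target_id)
        # UI: put the new target at the top of the screen.
        self.show_target_icon(target_id)

    def show_target_icon(self, target_id):
        # In this sketch there is no GUI, so the drawing code fails.
        raise UIMissingError("no target bar to draw on")


class ThinTargetService(TargetService):
    """Thin-client variant: inherit everything, override one method."""

    def show_target_icon(self, target_id):
        # Nothing to draw without a GUI, so do nothing gracefully.
        pass


thin = ThinTargetService()
thin.on_lock_confirmed(7)
```

Only the one offending method is replaced; the rest of the targeting behaviour is inherited untouched.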

Of course, this can be very beneficial for us. It allows us to highlight areas of the codebase that are ripe for refactoring, where the game logic and the UI are too closely tied together.

In a lot of these cases, we're reviewing and touching on older code and so we are getting ancillary benefits from reviewing these files from a more up-to-date viewpoint.

So what's the performance like?

The average thin client has a memory footprint of between 150 and 200 MB. Now this may not be listed under the definition of 'thin' in everyone's dictionary, but it's a very good start. As we progress there will be more ways that we can reduce this footprint even further. As for CPU, the client requires very little; almost all the CPU required is in the first 30 seconds as all the Python libraries and code are loaded into memory. Once that's complete the clients become relatively quiescent. Unfortunately when you run a few hundred of these at the same time, even minor CPU fluctuations, occurring across every client at the same time, can cause problems, so it's something we're keen to keep to a minimum.
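As a rough back-of-the-envelope illustration of what that footprint means for scale, the sketch below estimates how many clients fit on one machine by memory alone. The 16 GB host and the 2 GB reserved for the operating system are assumed figures picked for the example, not numbers from our test hardware, and CPU is ignored entirely.

```python
def clients_per_host(host_ram_mb, reserved_mb, footprint_mb):
    """Whole number of clients that fit in the RAM left after the reserve."""
    return (host_ram_mb - reserved_mb) // footprint_mb


HOST_RAM_MB = 16 * 1024  # assumed host size, illustration only
RESERVED_MB = 2 * 1024   # assumed headroom for the OS, illustration only

worst_case = clients_per_host(HOST_RAM_MB, RESERVED_MB, 200)
best_case = clients_per_host(HOST_RAM_MB, RESERVED_MB, 150)
print(worst_case, best_case)
```

Under those assumptions a single machine holds somewhere between 71 and 95 clients, so a run of many hundreds would need several such machines.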

But what about control? How do you tell them what to do?

Orchestrator is the framework that we've developed for running our system tests. Its primary function is to set up a server and client and to run a particular test on the client with traditional pass/fail mechanisms.

The only problem with this scenario is that Orchestrator is a very possessive system; it wants to have full control of everything, proxies, server and connecting clients, and for what we're doing it proves a little too greedy. Because of the architecture of Orchestrator it's not the ideal candidate for large scale control of clients, but it does allow us to run targeted tests making use of fewer slaved clients.

As for the Automaton Project, well, I have to point out, no one but me calls it that, it's just my pet name for it. The project is a way of bootstrapping the client and having it execute arbitrary code locally, rather than having its movements dictated by a controller elsewhere on the network.

The difference between the two methodologies we have for controlling the clients is that one is a master/slave paradigm, whereas the other is a group of fully autonomous actors; each has its pros and cons and we don't want to blindly follow one particular path only to find that it's actually the cause of our problems rather than the salvation.
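The contrast between the two styles can be sketched in a few lines of Python. This is a toy illustration only: `Client`, `Controller`, `run_autonomously` and `patrol` are invented names and say nothing about how Orchestrator or the Automaton Project are actually built.

```python
class Client:
    """Hypothetical stand-in for a thin client."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def do(self, action):
        # Record the action instead of talking to a real server.
        self.log.append(action)


class Controller:
    """Master/slave style: one controller dictates every step."""

    def __init__(self, clients):
        self.clients = clients

    def broadcast(self, action):
        for client in self.clients:
            client.do(action)


def run_autonomously(client, script):
    """Autonomous style: the client runs code handed to it at startup."""
    script(client)


def patrol(client):
    # The behaviour lives with the client, not with a remote controller.
    client.do("undock")
    client.do("warp")


slaved = [Client("a"), Client("b")]
Controller(slaved).broadcast("jump")

free_agent = Client("c")
run_autonomously(free_agent, patrol)
```

In the first style every client depends on the controller staying alive and keeping up; in the second, each client carries on by itself, at the cost of being harder to coordinate.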

So when can I get my hands on this?

Never, sorry :) The client is a developer tool only and whilst many people may want a less resource intensive client, this isn't the one you're looking for.

What now?

Now that we've got these tools, there's work to be done creating tests for them. CCP Veritas has been toying around with them recently and has uncovered some interesting pointers whilst hunting the infamous lag monster, but I'm sure he'll detail that in his own blog.

As for myself, there are lots more APIs that need coding to allow the client to do more. Our primary goal is solving the lag issue, but beyond that I want to create a market interface that will allow us to set up mass trading so we can emulate Jita. There's also turning a herd (flock? what do you call a group of these things? an army?) of asynchronous automatons into an organized fleet, then there's work to be done on slimming them down further, etc., etc., ad infinitum.

There is one thing I want to stress though: we still need your help. Once we've used these clients and other tools to track down problems and submit fixes, we're going to need each and every one of you to lend us your time and effort in checking them on Singularity. EVE players have massive amounts of ingenuity and resilience, and we need both to help us stress test these fixes.

And on that note, au revoir!