NIMBY Rails devblog 2024-10

Completely redesigned multiplayer internals

Even though multiplayer has been a feature since 1.1, released almost four years ago, it has never fully worked properly. Part of the initial problem was wanting to make the simulation properly networked. This proved to be impossible because the simulation code was never written to be deterministic (for example, it is very tricky to make train motion deterministic when the track geometry is vector based, with 64 bit float resolution, and procedurally generated; in particular things like the order in which tracks are generated can influence other tracks, due to the parent and branch relationships).

But even if the sim is a lost cause, the database part of the game should not be. The database is everything that is player editable but independent of the simulation. For example, the track nodes (not the actual track geometry). Or the orders in a schedule. Or the fact a train is composed of certain cars in a certain order. Basically everything that has a UI editor associated with it, and “does not move by itself, it only moves by manual player action” so to speak (trains are separated into the concept of train and the concept of “motion” to make this possible).

Multiplayer therefore mainly consists in trying to keep this database of user created objects synchronized between all the players. There’s a variety of strategies to accomplish this. Some are impossible to do in NIMBY Rails, since it’s a goal of the game to only be limited by user CPU and RAM, so the database size can go into the gigabytes for large saves. Therefore direct synchronization of this database is only possible at game start, as a save file, and not practical during gameplay.

Another handicap for the NIMBY Rails netcode is that there is no limit to player movement or visibility range. Other games supporting massive worlds with massive amounts of player created objects, like Minecraft, have the advantage of players being limited in movement and visibility. This means the server only needs to send a limited set of information to the client, a subset of the full database (or chunk map in this case). And the client assumes it only has access to that subset, like a cache.

But in my game a player is always a single click away from moving from a dense city center with 30,000+ objects to a different city center with another set of 30,000+ objects, and the player expectation is that it should load instantly. Or zooming out to see the entire world in a second. Even more taxing is the concept of the various listings in the editors and assets panel, which should list all objects in their current state immediately, without any “loading” delay from the server.

With these requirements there’s little alternative but to keep a full copy of the database in every client, not just in the server, and not just the subset that is visible or usable to the player in a given moment. And therefore, after the initial game load, multiplayer consists of finding a way of coordinating all the individual player edits to the database, and replicating them among all the players.

At a high level, I identified two approaches to do this: synchronize the data, or synchronize the edits themselves.

Synchronize the data: database changeset replication (v1.1 to v1.14)

This is what the multiplayer code has done from v1.1 to v1.14. With several important improvements over the years, but the basic concept has remained the same: the database was programmed to be capable of “recording” the changes done to it by player edits, in the form of a “changeset”, which stores copies of every created, edited or deleted object, before and after the edit. A very important fact of this design is that it is independent of the player edits! Anything can be done to the database, and it will dutifully record the before and after states, and produce a list of the changes. Implementing this system and not having it be super slow was very complex, but it only had to be done once, and from that point it was independent of every new feature implemented in the game.
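As a rough illustration of the recording idea (not the game's actual code), here is a minimal Python sketch. The class `RecordingDatabase` and its methods are hypothetical names, objects are plain dictionaries, and `None` stands for "the object does not exist". The point it shows is that the recording lives in the storage layer, so it captures before and after states without knowing which editor made the change.

```python
import copy
from typing import Any

class RecordingDatabase:
    """Hypothetical store that records before/after copies of every touched object."""

    def __init__(self) -> None:
        self.objects: dict[int, Any] = {}
        # object id -> (state before the first change, state after the last change)
        self.changeset: dict[int, tuple[Any, Any]] = {}

    def _record(self, obj_id: int, after: Any) -> None:
        if obj_id in self.changeset:
            # Already touched in this changeset: keep the earliest "before" state.
            before = self.changeset[obj_id][0]
        else:
            before = copy.deepcopy(self.objects.get(obj_id))
        self.changeset[obj_id] = (before, copy.deepcopy(after))

    def put(self, obj_id: int, value: Any) -> None:
        """Create or overwrite an object."""
        self._record(obj_id, value)
        self.objects[obj_id] = value

    def delete(self, obj_id: int) -> None:
        self._record(obj_id, None)
        self.objects.pop(obj_id, None)

    def take_changeset(self) -> dict[int, tuple[Any, Any]]:
        """Return the recorded changes and start a fresh recording."""
        changes, self.changeset = self.changeset, {}
        return changes

db = RecordingDatabase()
db.put(1, {"kind": "track", "x": 0})
db.take_changeset()                      # discard the creation record
db.put(1, {"kind": "track", "x": 5})
db.put(1, {"kind": "track", "x": 7})
moved = db.take_changeset()              # one entry: x=0 before, x=7 after
db.delete(1)
removed = db.take_changeset()            # one entry: x=7 before, None after
```

Whatever code calls `put` or `delete`, the changeset comes out the same way, which is the "independent of the player edits" property described above.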

In this design, a client edits a few tracks in an area of the map (it could also be some order in a schedule, or some stop in a line). This edit produces a changeset in the database of the client. To give the player immediate feedback, the database is kept in a “layered” state: the UI of the client sees the database as if the edit was final, but it is actually seeing it “filtered” thru the changeset, with the real (unmodified) database underneath. The client can even keep a stack of these “I believe I edited this object, but it’s not actually real” changesets.
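The layered view can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the original implementation: `LayeredView`, `push_edit` and `pop_oldest` are made-up names, and each layer only stores the "after" state of the objects it touches.

```python
from typing import Any, Optional

_DELETED = object()  # sentinel: the object was removed by a pending local edit

class LayeredView:
    """Hypothetical view of the real database seen through pending local edits."""

    def __init__(self, base: dict[int, Any]) -> None:
        self.base = base                          # the real, unmodified database
        self.layers: list[dict[int, Any]] = []    # pending edits, oldest first

    def push_edit(self, changes: dict[int, Any]) -> None:
        self.layers.append(changes)

    def pop_oldest(self) -> None:
        """Drop the oldest pending layer, e.g. once the server has handled it."""
        self.layers.pop(0)

    def get(self, obj_id: int) -> Optional[Any]:
        # The newest layer wins; fall through to the real database otherwise.
        for layer in reversed(self.layers):
            if obj_id in layer:
                value = layer[obj_id]
                return None if value is _DELETED else value
        return self.base.get(obj_id)

base = {1: "track A", 2: "track B"}
view = LayeredView(base)
view.push_edit({1: "track A (moved)", 3: "new station"})
view.push_edit({2: _DELETED})
```

Here `view.get(1)` returns the moved track and `view.get(2)` returns `None`, while `base` still holds both original tracks untouched.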

The server receives all of these changesets, and “merges” them into its own database. So when you create a station, the server does not create a station like the client did. Instead it receives a changeset, essentially a clip like in the clipboard feature, which includes some track and station objects. The “merge” operation can succeed or not (for example, some other client might have altered the map in a way that makes it impossible to merge). After it collects some of these changesets, it merges them into a single larger one, and broadcasts it to all the clients.
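To make the "merge can succeed or not" point concrete, here is a small Python sketch. The conflict rule it uses (reject the changeset if any object no longer matches its recorded "before" state) is an assumption chosen for illustration; the real merge rules are not described here and were certainly more involved. `try_merge` and the sample data are hypothetical.

```python
from typing import Any

# object id -> (state before, state after); None means "does not exist"
Changeset = dict[int, tuple[Any, Any]]

def try_merge(db: dict[int, Any], changes: Changeset) -> bool:
    """Apply a changeset atomically; return False and change nothing on conflict."""
    # Assumed rule: every touched object must still look like its "before" copy.
    for obj_id, (before, _after) in changes.items():
        if db.get(obj_id) != before:
            return False
    for obj_id, (_before, after) in changes.items():
        if after is None:
            db.pop(obj_id, None)
        else:
            db[obj_id] = after
    return True

server_db = {1: "track at (0, 0)"}
# Two clients edited the same track starting from the same state.
from_client_a = {1: ("track at (0, 0)", "track at (5, 0)")}
from_client_b = {1: ("track at (0, 0)", None)}

merged_a = try_merge(server_db, from_client_a)   # applies cleanly
merged_b = try_merge(server_db, from_client_b)   # conflicts with A's edit
```

The second changeset is rejected because the track it wanted to delete has already been moved by the first one.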

The clients receive these changesets and try to merge them, in the same order as the server did, while also deleting layers of their stack of local edit feedback as they correspond to the server changesets (this is tracked by the server). Remember how some edits were rejected by the server? The local client edits will be deleted even if the server rejected them, to keep the database identical between all the clients.

Sounds complicated? It is, very much. For the geeks reading, it is basically an eventually consistent distributed database, with row level replication, and MVCC on top for local editing feedback. It also has a fatal flaw: not all game state is in the database.

There’s some game state which should be very performant to query and update, which is not a goal of this ad-hoc database. For example the spatial index of all the tracks in the map is not in the database. It is a specialized structure which does not fit well in the database design, and it would be much slower if it was shoehorned into it.

Therefore the “merge” I was explaining earlier is doing a lot of extra work beyond “put this data in the database”. It has to look into the contents of the changesets, compare them to the current database, and deduce which changes are required to these not-in-the-database structures. It is like a second editing system just for merging changesets into the database and into all the other structures which are not in the changeset, by recreating or preserving their contents.

Synchronize the edits: edit command replication (v1.15)

For 1.15 I deleted everything I described in the previous section. Even the database itself has been simplified. Gone are all the concepts of changesets, layers, merging, etc. The database is now just a map of ID to objects, with a table per object type, and nothing else. It cannot store multiple versions, it cannot merge changesets. This simplification has resulted in a slight improvement in performance, noticeable even in single player. I’m planning deeper changes in this area, like switching to a pooled object system with object IDs as direct indexes into memory for an additional speedup, which would allow removing the hashtable map, but it will have to wait a bit.
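The simplified shape, "a map of ID to objects, with a table per object type", might look roughly like this in Python. The names `Database`, `Track` and `Station` and their fields are placeholders for illustration only, not the game's real types.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Track:
    length_m: float

@dataclass
class Station:
    name: str

class Database:
    """One plain ID -> object table per object type: no versions, no layers."""

    def __init__(self) -> None:
        self.tables: dict[type, dict[int, Any]] = {}

    def put(self, obj_id: int, obj: Any) -> None:
        self.tables.setdefault(type(obj), {})[obj_id] = obj

    def get(self, obj_type: type, obj_id: int) -> Optional[Any]:
        return self.tables.get(obj_type, {}).get(obj_id)

    def delete(self, obj_type: type, obj_id: int) -> None:
        self.tables.get(obj_type, {}).pop(obj_id, None)

db = Database()
db.put(1, Track(length_m=250.0))
db.put(1, Station(name="Central"))   # same ID, different table
db.delete(Track, 1)
```

Because each type has its own table, the track and the station can share ID 1 without clashing, and deleting one leaves the other in place.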

But the most radical change has been done to the design of multiplayer synchronization. In this new v1.15 design the synchronization is done by replicating the player edit actions themselves, not the results of these actions. And in fact this work started in v1.11. In that version the UI and the core of the game were made fully independent of each other, allowing the core to run asynchronously in its own thread(s). This was accomplished by making every player edit action correspond to a “command”. These commands are like a message, for example “move track id 1234 by (-3, 10) units”, or “create a new schedule in list 1234 with these contents”. In other words, every player action is translated into data. And crucially this data does not correspond to anything in the database, it really is just an annotation of what the player did in the UI.

In v1.11 to v1.14 multiplayer these commands were internal to the communication between the UI and core. The clients executed them locally, recorded the changeset, and sent it to the server, like it was always done. But in v1.15 nothing is executed in the client when the player invokes an action in the UI. Instead the command itself is transformed into a network message and sent to the server. The server receives these messages and executes them. This keeps its local copy of the database updated. But then, instead of sending a database changeset to the clients, it re-sends (replicates) the messages themselves, broadcasting them to all the clients. The clients receive these messages and run them in the same order as the server.

The crucial difference in this system compared to changeset replication is that the “merge” step does not exist. There is no parallel logic to merge changesets, independent of the actual game logic. The server and all the clients execute the exact same command over the exact same database in the same order, using the exact same code. This is a huge correctness and reliability improvement compared to the old system, and it also means I only have to maintain one set of game logic instead of two. The extra work of encapsulating every player action in a command message is tiny in comparison.
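A toy version of command replication, in Python, might look like the sketch below. Everything in it is an illustrative assumption: the command classes, the `execute` function, and the in-process `Server` and `Client` objects that stand in for real networking. The database is reduced to a map of track ID to position.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class MoveTrack:          # "move track id 1234 by (-3, 10) units"
    track_id: int
    dx: int
    dy: int

@dataclass(frozen=True)
class DeleteTrack:
    track_id: int

Command = Union[MoveTrack, DeleteTrack]
TrackDb = dict[int, tuple[int, int]]   # track id -> position

def execute(db: TrackDb, cmd: Command) -> None:
    """The single copy of the edit logic, shared by server and clients."""
    if isinstance(cmd, MoveTrack):
        if cmd.track_id in db:
            x, y = db[cmd.track_id]
            db[cmd.track_id] = (x + cmd.dx, y + cmd.dy)
    elif isinstance(cmd, DeleteTrack):
        db.pop(cmd.track_id, None)

class Client:
    def __init__(self, db: TrackDb) -> None:
        self.db = db

    def receive(self, cmd: Command) -> None:
        execute(self.db, cmd)

class Server:
    def __init__(self, db: TrackDb, clients: list[Client]) -> None:
        self.db = db
        self.clients = clients

    def submit(self, cmd: Command) -> None:
        # Execute locally, then replicate the command itself, not its result.
        execute(self.db, cmd)
        for client in self.clients:
            client.receive(cmd)

initial: TrackDb = {1234: (0, 0), 5: (7, 7)}
clients = [Client(dict(initial)), Client(dict(initial))]
server = Server(dict(initial), clients)
server.submit(MoveTrack(1234, -3, 10))
server.submit(DeleteTrack(5))
server.submit(MoveTrack(1234, 1, 1))
```

After the three commands, the server and both clients hold identical databases, and no merge step was involved. This only works because every peer starts from the same database and runs the same commands in the same order with the same code.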

So where’s the catch? There are three catches: latency, undo, and stations.

Latency: the super elaborate “layer” system from the previous version allowed immediate feedback of client edits, without having to wait for the server to merge the changes. At the moment this is not the case in 1.15. The client needs to wait for the server to display anything. Most of the time this is barely noticeable, but for any mouse-attached action, like dragging tracks, it is noticeable. I will have to come up with a solution for this, but the core idea of command replication won’t be changed.

Undo: undo was the other user of the database changeset/layer system. It also benefited from the fact that system was generic, so there was no need to create independent handling of user actions like “the undo of moving a track”, “the undo of deleting a station”, etc. My current plan is to resurrect the creation of changesets (I kept this in the new database), and implement a very restricted version of the merge, just for the undo.

Stations: Looking into undo reminded me of its number one problem, which also extends to multiplayer: the way stations are auto created and deleted from touching tracks. This is, by far, the largest source of undo problems, and of many multiplayer desynchronization errors. While the new multiplayer system is not incompatible with the legacy design of stations, they will keep being a source of errors in undo. Realizing that after all of these deep changes and major evolutions of the game, I was still having fundamental problems in a feature that should have been rock solid for years, I decided to pause the implementation of the new MP and database system and finally take action on stations.

Stations v3: persistent stations with explicit user editing

Why are stations edited the way they are? The reason I picked such a design is because I wanted station creation to be as simple as possible. In particular, in grid based games like OpenTTD it is completely trivial to expand a station with a new platform by just creating a new one next to it. I wanted that simplicity in my game too. The error was making this kind of rule a core game rule, rather than an editor rule. But back in the day (this was nearly 5 years ago) that distinction was not as clear as it is now, and the game was radically simpler too.

The problems of this station design have been accumulating for years. The main one is the fact stations do not persist, they can be unpredictably replaced with a fresh new object just by moving some tracks. They only exist as a side effect of some tracks being arranged in a particular way. In recognition of this fact, I’ve kept the number of station-related features as small as possible, unfortunately limiting the game features for stations. Additionally, this design does not allow separating stations by depth, and it requires the existence of special buildings or useless extra platforms to create more detached designs.

In v1.15 all of these flaws will be addressed with a new station design:

  • Stations can now exist independent of platforms. In fact, stations with 0 platforms are allowed to exist. And deleting tracks will never delete stations. Deleting a station will now require an explicit player action, respecting the player work.
  • Tracks can be manually assigned to stations to convert them into a platform. This is streamlined with a dropdown to directly pick nearby stations, so there’s no need to go hunting for icons or nodes on the map. In a similar fashion, you can also remove the association to a station (the old demote) or create a new station from a track (the old promote). There will be a bulk editor equivalent of these tools, not yet implemented.
  • Platform extension buildings have been removed. There’s no need anymore for any track to touch any other track, you can just directly edit which tracks belong to a station, and this link will persist until it is deleted.
  • Walk link buildings have been removed. The station editor will allow creating walk links by just selecting nearby stations in a list.
  • Since stations are now persistent game objects, they get their own top level editor, to allow for future expansion and new features, starting with tags in v1.15. The most basic station editing options, like naming, will also remain accessible in the track editor.
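The rules in the list above can be summed up in a small data model. The following Python sketch is only an illustration of those rules, with invented names (`World`, `assign_platform`, and so on); it is not how the game stores stations.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Station:
    name: str
    tags: set[str] = field(default_factory=set)

class World:
    """Hypothetical model: stations are persistent objects, tracks merely link to them."""

    def __init__(self) -> None:
        self.stations: dict[int, Station] = {}
        self.track_station: dict[int, Optional[int]] = {}  # track id -> station id
        self._next_station_id = 1

    def add_track(self, track_id: int) -> None:
        self.track_station[track_id] = None

    def create_station(self, name: str) -> int:
        station_id = self._next_station_id
        self._next_station_id += 1
        self.stations[station_id] = Station(name)
        return station_id

    def assign_platform(self, track_id: int, station_id: int) -> None:
        if track_id not in self.track_station or station_id not in self.stations:
            raise KeyError("unknown track or station")
        self.track_station[track_id] = station_id

    def unassign_platform(self, track_id: int) -> None:
        self.track_station[track_id] = None

    def delete_track(self, track_id: int) -> None:
        # Removing a track only removes its link; the station itself stays.
        self.track_station.pop(track_id, None)

    def delete_station(self, station_id: int) -> None:
        # Only this explicit action removes a station.
        del self.stations[station_id]
        for track_id, linked in self.track_station.items():
            if linked == station_id:
                self.track_station[track_id] = None

    def platforms(self, station_id: int) -> list[int]:
        return [t for t, s in self.track_station.items() if s == station_id]

world = World()
world.add_track(10)
world.add_track(11)
central = world.create_station("Central")
world.assign_platform(10, central)
world.assign_platform(11, central)
world.delete_track(10)
world.delete_track(11)
```

After both tracks are deleted, "Central" still exists with zero platforms, and it only goes away through an explicit `delete_station` call.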

But! What about the ease of appending new platforms to stations? I see two sides of this coin. Advanced players do not use the station tool anymore. The previously described manual and bulk assignment tools are for them. This is not new, it has been going on since the promote/demote buttons were added to the game. An advanced player in v1.15 can just promote a track, then (bulk) append platforms to it, using the tools described earlier, for a very similar workflow.

Therefore I am comfortable relegating the station creation tool to the role of a beginner tool, trying to keep it as similar as possible, including platform footprints. In 1.15 the only time you’ll see platform footprints is when you use this tool. These footprints now have zero gameplay relevance, and only exist as a guide to select a station for appending a platform to it. Click inside a footprint and it extends the station with a new platform, click outside and it creates a new station. The end result is that using this tool is very similar to the previous design, but the only time the footprints are relevant is during that first step in this specific tool, and nowhere else.
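The click rule of the station tool can be sketched as a simple hit test. In this Python illustration footprints are assumed to be axis-aligned rectangles, which is a simplification made up for the example, and `Footprint` and `pick_station` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Footprint:
    """Selection guide only: an assumed rectangular area around a station's platforms."""
    station_id: int
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

def pick_station(footprints: list[Footprint], x: float, y: float) -> Optional[int]:
    """Return the station to extend, or None if the click should create a new one."""
    for footprint in footprints:
        if footprint.contains(x, y):
            return footprint.station_id
    return None

footprints = [Footprint(station_id=1, min_x=0, min_y=0, max_x=100, max_y=40)]
inside = pick_station(footprints, 50, 20)     # extend station 1 with a new platform
outside = pick_station(footprints, 500, 20)   # no hit: create a new station
```

The footprint is consulted only at this one step of the tool; once the station is chosen, the platform is linked to it explicitly and the footprint plays no further role.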

v1.15 is still some way off. The new station system is a very new development and still needs a lot of work. The multiplayer and database changes are more mature in comparison, but the final bits (especially restoring undo and latency mitigation) are hard to do.