NIMBY Rails devblog 2023-12
Accounting code rewrite
Work continued in the core game code to make it faster and more parallel. One of the remaining big non-parallel offenders was accounting. Accounting just takes a ridiculous amount of CPU compared to other game systems, and to make it worse, it required multiple non-parallel steps to properly aggregate all the data streams produced in the parallel sections. It was feeling intolerable to spend a week fighting to make the parallel section of the logic 5% faster while accounting by itself was eating up 25% of all the CPU time, single threaded. So I threw away all of the accounting code and designed a new accounting system which was parallel from the start.
The new system benefits from the fact that C++20 mandates atomic floating point types, including some atomic arithmetic operations (addition and subtraction). This means that with some careful setup the parallel code can now directly aggregate its accounting data on top of the “tip” (the current day) of the accounting database, without any locking.
There should be zero user visible changes due to this rewrite. Adding new accounting variables is now slightly harder for me, and it consumes slightly more total memory (but it’s much less fragmented, which is often a win), but depending on the balance between pax demand vs amount of game objects, it’s a noticeable sim speedup.
A lock-free multimap for track occupation
Another cursed data structure (but less than accounting) is the one holding the information of “this track has this piece of train(s) on top of it”, used for many things (collision, signal checks, rendering, etc). This data structure is basically temporary, recreated from scratch on every sim frame, because trying to keep it updated with minimal editing would be slower than just nuking it at any sim speed which makes trains change track segments often (so, depending on trains and building habits, 10x to 100x). Unlike accounting this data already allowed parallel writes, but it did so by using sharding and a collection of mutexes. With some care this kind of scheme allows you to parallelize almost anything, with nontrivial cost when there is some contention. And since the contention happens at the shard and not at the track level, this was relatively frequent. The cost is hidden by the fact this is massively parallel, but if you can avoid it, you do.
So I did in 1.11. The occupation map of tracks is now based on a custom-developed lock-free multimap, inspired by this post on a simple lock-free hash table. Unlike the hash table in the post, I don’t need to modify atomic values, I just store all the values. This actually makes the implementation easier than in the post since I can store non-atomic data. Being an open addressed hash table it is also critical to find a proper size, since the performance will degrade a lot if it approaches the size limit. Plus, the fact it has a size limit at all. To sidestep both of these issues my multimap first checks if the hash table is near the capacity limit (with an atomic size counter), and if so, it takes a slow path based on a single mutex. On the next frame the newly allocated hash table has a size which could have accommodated the previous frame’s occupation map without hitting the slow mutex. This repeats for every frame, so the game is constantly and passively finding the “high water mark” as required by the player build, without any preprocessing. Indeed the allocated size is 0 on the first sim frame!
Partial rewrite of multiplayer
It was finally time to tackle the final task of this re-architecting of the game: doing something about multiplayer. All of the changes in 1.11 so far had been carefully planned to not make this too hard or impossible. For example I discarded major sim optimizations which would make multiplayer private-sim-only forever.
I first wanted to make the command-transactional system from November the basis of the multiplayer in 1.11, like any sane game with a complex editor would do. These kinds of systems almost scream MAKE ME NETWORK TRANSPARENT NOW, but I decided not to. The reason is that, although it would yield a rock solid multiplayer editor consistency experience, it would also eliminate the zero frames of latency players now enjoy in the editors. This is a quite unique feature of the game, akin to what other people accomplish with things like CRDT.
What was possible was removing a wart which has been in MP since 1.1: the client sim (in shared sim) now has a 100% guaranteed, untouched replica of the server database. This was only half true in the past. To avoid making clients keep two copies of the database, the game “layers” the locally modified data on top of its copy of the server. 1.11 still does this, but said layer is now properly isolating itself from the underlying copy. It also plays nicely with the new 1.11 asynchronous sim, only touching the local client layer for player edits.
The database part solved, it was now time to try to salvage the shared sim mode. In the past three years this mode has accumulated a lot of attempts to make it faster and more correct, and most of them were still enabled to some degree in 1.10. In 1.11 I deleted all of them, and replaced them with the simplest possible thing that would work if the game had infinite bandwidth available. On top of this I implemented a system to dynamically reduce the amount of synchronized data, tuned beforehand to the empirically discovered 128KB/s limit of Steam Relay. I don’t want to give numbers because multiplayer is extremely variable for players, but in my testing it feels a lot smoother and problem-free compared to 1.10.
Performance options
After all the internal changes so far in 1.11, I wanted to do more user-facing work, and try to tackle the main theme I originally scheduled for 1.11. But I was having a lot of doubts about it, so I instead opted to do a couple of QoL improvements and small features.
I know I am an old man, but at least in my day the expectation was that when you run a game you give it full control of your PC, to take as many resources as it needs to run. It’s not a browser or Excel, it’s not meant to multitask. And 1.11 will use more CPU than ever, since now it’s much more capable of exploiting multithreading than before.
But for players who want to put a limit to this, it will be possible to cap the number of sim threads and the UI framerate:
The first slider reduces the number of sim threads, which can dramatically cut down the CPU usage of the game, while of course also dramatically cutting down the maximum achievable sim speed. The other two options limit the framerate of the UI. In 1.11 the UI rendering is independent of the game logic, so both of them can run at different speeds.
Rectangle selection only for connected tracks
Selecting one particular track run in the middle of a bunch of tracks, parallel or not, can be a pain. The rectangle selection tool offers little help in this case, unless the tracks are aligned to parallels and meridians. To solve this a new mode has been added to the rectangle selection tool: if used while pressing the Alt key, it will only select tracks simply connected to one of the selected tracks before the rectangle selection started. A video is easier to understand:
Board / disembark only line stops
An often requested feature is now in the game:
It works exactly like you imagine. Both the pax AI and the pathfinder are informed by this flag, so at the station level pax will refuse to board or disembark as requested, and the pathfinder won’t propose paths that don’t respect these stop settings. It is also possible to disable both behaviors to create a waypoint with wait time inside a station, but with zero pax interaction.
So what about 1.11?
I originally intended for 1.11 to be a review of pax pathfinding, behavior and spawning, so I started by optimizing the game as much as possible in November and December to “make room” for all the extra required CPU for the features I wanted to implement. The main idea was pax archetypes, with spend limits for example, or time limits. The problem is that no level of optimization can accommodate this idea. Introducing pax archetypes means pax paths are not just cached based on the destination, they also must be cached on the archetype (moneybags pax and value-seeking pax need different pax pathfind caches, they could choose different paths for the same destination at the same time of the week).
The pax pathfind cache is what is holding up the entire pax simulation, with a hit rate of 99.9%+. It is not a nice-to-have speedup, it is the central enabling design aspect of the entire pax sim. It is also very large. Fully expanded, it would be equivalent to the square of the number of stations, so in the order of millions of entries for saves with 1000+ stations. Fortunately the timetable saves the day, providing automatic expiration of paths based on departure times. But even with this optimal expiration system it is still very large.
Each archetype would require a whole independent copy of the cache, which is untenable. Even if the RAM cost was paid, it would also slow down the simulation by linear proportion to the variety of archetypes. I do not want to degrade my game this way, so I decided to stop development of this idea. This does not mean custom cash or time limits for pax pathfinding won’t happen, but at the moment it looks like it will be more of a global game rule than something at the pax AI level.
With archetypes abandoned I’m not sure at the moment where to take the remainder of 1.11. It is possible I will release it as-is, since there’s a huge amount of internal changes which could use some real testing, even if the user facing changes are thin.