NIMBY Rails devblog 2022-04
Platform stop signals are back
It was pointed out that there are aesthetic reasons to manually set a stopping point for trains in platforms, especially now that the visuals of a platform can be completely player defined. So platform stops have been enabled again. The existing 1.4 caveats still apply: if you set the platform stop points in a way that makes the train not fit the platform, and/or too close to the middle point between the platform track segments, the train AI might ignore them and/or make the train fold over itself.
Dead ends in multithreading the code
April was dominated by two big tasks: moving, and preparing the code for more multithreading usage (and thus better CPU utilization). Moving went well but it was a bit drawn out (I still have no fiber internet), while the multithreading effort got off to a bad start.
The game has had some multithreading support since v1.1, in areas like map loading. The game logic and AI made much more limited use of it, with the exception of the pax AI pathfinder, which has a multithreaded cache prefetcher (basically, when a train is approaching a station, one or more asynchronous tasks are started to pathfind the train pax and some of the station pax, and to store the results in the pax pathfind cache, ready for the moment the train arrives). These multithreading efforts used the basic primitives offered by C++11 and later, like std::async, but those lack control over parallelism, or are implemented in ways which are not appropriate for interactive applications, like spawning a whole thread for a single task.
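As a rough illustration of that prefetcher pattern (all type and function names here are hypothetical, not the actual game code), this is the general shape of the std::async approach being replaced:

```cpp
// Minimal sketch: when a train approaches a station, fire-and-forget
// std::async tasks warm the pax pathfind cache ahead of the arrival.
#include <future>
#include <vector>

struct Train { int id = 0; };
struct Station { int id = 0; };
struct PaxPathfindCache { /* shared, internally synchronized path cache */ };

// Placeholder for the real work: pathfind the train pax and some station pax,
// storing the results in the cache so they are ready when the train arrives.
void prefetch_paths(const Train&, const Station&, PaxPathfindCache&) {}

std::vector<std::future<void>> pending_prefetches;

void on_train_approaching(const Train& train, const Station& station,
                          PaxPathfindCache& cache) {
    // The drawback mentioned above: std::launch::async typically spawns a
    // whole new thread per task and gives no control over total parallelism.
    pending_prefetches.push_back(std::async(std::launch::async,
        [train, station, &cache] { prefetch_paths(train, station, cache); }));
}
```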
I really only needed an std::async-like system with a proper thread pool and fast, low overhead tasks, but instead decided to try more advanced libraries like Taskflow. Taskflow and some of its peers belong to a category of libraries based on building a graph of your computation, and then letting the library figure out the execution order of the tasks. Even if the foundation of these libraries is exactly what I needed (a thread pool plus quick-to-start tasks), they carry a big API on top of it, for which I had little use. After trying some of them I finally settled on Async++, which keeps its API a notch down, with just some primitives for composition and chaining of tasks rather than a declarative graph framework. I ported the existing std::async and bare std::thread usage to Async++ and it worked perfectly. 1.4.16 is already using the new framework and so far there have been zero stability problems (some map corruption has cropped up, but it is caused by overly aggressive GPU uploading on my part, which will be disabled in 1.4.17).
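For reference, a minimal sketch of the kind of Async++ usage this maps to (the tasks themselves are made up for illustration): async::spawn runs work on the library’s shared thread pool and .then() chains a continuation, which is about all that was needed:

```cpp
#include <async++.h>

int main() {
    // Spawn a task on the library's shared thread pool.
    auto decode = async::spawn([] {
        // e.g. decode a map tile off the main thread
        return 42;
    });

    // Chain a continuation that runs once the first task has its result.
    auto scale = decode.then([](int tile_id) {
        return tile_id * 2;
    });

    // get() blocks until the whole chain has finished.
    return scale.get() == 84 ? 0 : 1;
}
```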
Multithreaded GPU upload for map data: oops
One of the major existing users of multithreading is the map system. Map tiles have always been loaded from disk asynchronously with a thread pool, but they were still being uploaded to the GPU in a serial way, which included some data copying. I decided to test multithreaded GPU upload, which is available with bgfx. It turned out this creates very occasional corruption. I suspect I am not properly waiting for the required number of frames for the transfer to go through. In any case the performance improvement was minimal, so I am disabling this feature in 1.4.17.
Multithreaded station validation
Another area to test the new framework was game loading. Game loading has zero player interaction, so it’s a good candidate for new threading code. One of the major slowdowns on game load is station validation, so I decided to make it multithreaded. “Station validation” is actually the calculation of the station footprint polygon and then the sum of the population covered by said polygon. The underlying code was almost ready, having been cleaned up in the past to remove extraneous writes outside its own station, making it very parallelizable. In this case the experiment worked properly and it gives a nice speedup.
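A sketch of what this looks like with Async++’s parallel_for (the Station fields and the validation body are placeholders, not the real code): since each station only writes to itself, the stations can simply be split across the thread pool.

```cpp
#include <async++.h>
#include <cstddef>
#include <vector>

struct Station {
    double covered_population = 0.0;  // placeholder field
};

// Placeholder: builds the footprint polygon and sums the covered population,
// writing only into this one station.
void validate_station(Station& s) {
    s.covered_population = 0.0;
}

void validate_all_stations(std::vector<Station>& stations) {
    // Each iteration touches exactly one station, so no locking is needed.
    async::parallel_for(async::irange(std::size_t(0), stations.size()),
        [&](std::size_t i) { validate_station(stations[i]); });
}
```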
Rework of station pax flow system
The next one to try was much more interesting and also much harder: station processing. But before I tackled that I needed to review station pax flow. Pax flow as it existed was not really a flow of pax; it was more like a vaguely defined “work allowance” system for stations and trains to receive CPU time during their AI processing. It filled up little by little on each frame, and once enough had accumulated it was consumed to process some equally vaguely defined amount of pax.
1.4.17 changes this system so the “work allowance” is now actually a “pax allowance”, and it tries as hard as possible to represent an actual amount of pax processed. The code now uses this flow value as a limit on how many pax to process. This gives station and train flow more predictable behavior and makes it easier to tune. As an indirect benefit, the “small counters” which appear next to the input/output pax counters on waiting trains are now much more reliable and can properly count down to 0.
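As a rough sketch of the idea (my own simplified illustration, not the actual implementation): the flow accumulates a fractional pax budget every frame, and processing then consumes whole pax against that budget, which is what lets the counters tick down predictably.

```cpp
// Illustrative "pax allowance" accumulator.
struct PaxFlow {
    double allowance = 0.0;   // accumulated budget, measured in pax
    double pax_per_second;    // tuned flow rate for this station or train

    // Called once per frame; returns how many pax may be processed now.
    int begin_frame(double dt_seconds) {
        allowance += pax_per_second * dt_seconds;
        return static_cast<int>(allowance);
    }

    // Called after processing to consume the pax that were actually moved.
    void consume(int pax_processed) {
        allowance -= pax_processed;
    }
};
```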
Multithreaded station and train pax processing
This is the first major piece of the game AI “on the main path” (as opposed to opportunistic secondary tasks like precalculating pax paths) which becomes multithreaded. It was not possible to make it fully so: some aspects still run on a single thread after the multithreaded processing is done. In particular, accounting is very hard to make multithreaded without exploding the memory usage of the game (which in turn would tank the CPU usage later on, when it’s time to unify the accounts). Indeed, at some point accounting was more than 50% of the CPU time required to run the station and train pax AI, but in the end I was able to find some patterns to bring it down.
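One common shape for this kind of problem, shown below as an illustrative sketch rather than the actual game code, is to give each worker its own small ledger and fold the ledgers into the global accounts on the main thread once the parallel phase ends, so memory scales with the number of workers rather than the number of pax:

```cpp
#include <async++.h>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Ledger {
    int64_t ticket_revenue = 0;
    int64_t pax_boarded = 0;

    void merge(const Ledger& other) {
        ticket_revenue += other.ticket_revenue;
        pax_boarded += other.pax_boarded;
    }
};

struct StationJob { /* placeholder per-station work item */ };

// Placeholder for the real pax boarding/alighting logic.
void process_station_pax(const StationJob&, Ledger& out) {
    out.pax_boarded += 1;
    out.ticket_revenue += 100;
}

Ledger process_all(const std::vector<StationJob>& jobs, int num_workers) {
    std::vector<Ledger> per_worker(num_workers);  // one small ledger per worker

    // Each worker writes only to its own ledger, so the parallel phase
    // needs no synchronization.
    async::parallel_for(async::irange(0, num_workers), [&](int w) {
        for (std::size_t i = static_cast<std::size_t>(w); i < jobs.size(); i += num_workers)
            process_station_pax(jobs[i], per_worker[w]);
    });

    // Serial merge on the main thread: cheap, since it is O(workers), not O(pax).
    Ledger total;
    for (const Ledger& l : per_worker)
        total.merge(l);
    return total;
}
```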
Starting in 1.4.17, station and train pax processing is multithreaded and faster. But it wasn’t that slow to begin with: a huge effort went into optimizing pax pathfinding at the end of 1.2 and the start of 1.3, which already made it quite fast, while at the same time train AI never stopped getting more complex. So the situation is now that, given 30ms of game logic and AI processing, 28ms are devoted to train AI and 2ms to station and train pax AI (in a save with 4500 trains, 2500 stations and 400K+ ridership at max speed). Clearly train AI is the next objective of this effort.
That being said, making stations multithreaded was a very good exercise in preparing the code for parallelism. Trains are more challenging due to the implicit sharing of tracks between them, unlike stations, which can be processed completely independently of each other (with the current game rules).