Multithreading for train AI
Pax and station processing became so optimized in v1.4 that for a relatively low ratio of trains vs pax, train AI became the major factor of CPU usage. Train AI was still single threaded so it was time to update it to the same standard as pax and stations and make it multithreaded.
Unlike pax and train processing at stations, which are fully independent from each other, it is impossible to make train AI 100% multithreaded, since trains interact with each other in unpredictable ways (collision, signals, reservations) and over a shared medium (tracks). Fortunately I’ve been aware of these limitations since the very start and tried to make sure train AI didn’t become too coupled to itself, even if it was still single threaded.
The key to transition it to multithreaded was the fact train AI is, most of the time, independent of other trains. It can be identified when a train might be dependent on a shared state with other train, and then remove said train from the pool of trains to be processed in parallel. This is the case for trains waiting at signals: after all other trains have been processed in parallel, signal waiters are processed in single threaded fashion.
Trains passing by a signal, and thus liable to stop and start waiting, are not initially single threaded, since without running the actual train AI it’s impossible to know if the train is going to pass a signal. But when it happens the train stops its AI processing immediately and queues itself up to be processed later in the single threaded phase. This is possible because train AI processing was made incremental and resumable months ago, in preparation for this effort.
Finally, for collisions, there was no good solution, so I just decided to make them “stale” for a single frame. This means trains will collide with the positions of other trains from the previous frame, rather than the (deterministically ordered) same-frame system used until now. This allows collision check to become multithreaded at the expense of being less precise, which I think is a good tradeoff.
The combination of multithreaded train AI, combined with the other MT optimizations from May (pax and stations MT plus partially asynchronous simulation) made the internal v1.5 build the fastest the game has ever been. Sadly nobody will ever play this version of the game, since all this performance budget will be consumed by timetables, and then some (for orders of magnitude some).
The design and implementation of timetables in NIMBY Rails v1.5 can be divided in 3 big blocks:
Data design and its rules. This is the actual timetable data, and how it relates to the existing game objects like trains, lines and stations. It involves decisions like one timetable per line plus shift times per train, or more elaborate ideas like independent timetables per train-line; and then the game data structures to represent those. I include the UI/UX effort in this item, since it’s very coupled to the data itself.
The effort for the UI/UX part will be big in order to make the feature usable, and it also needs to include a fair amount of automation, including full automation. I expect even experienced players to use some degree of automatic timetables to avoid tedium when setting up lines. But as big as this effort is, it cannot “break the game”, in the sense of making it run poorly (it might make using it feel poorly, but this is a different, and more actionable, kind of problem.)
Train AI for timetables. How are timetables going to alter the existing train AI, making the train adapt itself to the fixed in the day arrival and departure times required by timetables. Also how the train is supposed to act when out of schedule, both inside stations and while on the tracks.
This is the easiest task. Train AI is already quite flexible and has some concept of absolute timing in the (underused, buggy) train scheduler. This task is also not required to develop other parts of the feature, so I’m leaving it for last.
Pax AI for timetables. Making the pax pathfinder capable of finding the best path to their destination in the timetabled line network. And make it fast enough the game can keep a level of simulation speed comparable to v1.4.
The third point is the most critical hurdle for implementing timetables, so I decided to start the development effort there. For real world timetables this is an open research area and you can find state of the art papers which are less than 10 years old. Depending on the picked data design the game timetables won’t be much different than their real world counterparts.
Out of necessity the v1.5 pax pathfinder currently imposes these restrictions to try to keep the game performance from imploding:
- Pax won’t consider a given line + stop combination more than once when deciding what train to take from a station (so, only the earliest train for a given line + stop is ever considered). This is the most important optimization.
- Trains are picked by considering the stops they make. Pax will lookup up to the end of the current train line run, and then continue into the next run up until the boarding stop. This means pax will consider all line stops but only once, never considering staying in the train for longer than the equivalent of one full line run.
- Pax won’t consider waiting longer than 3h in a station
These restrictions only apply while calculating a path. In practice since pax won’t have memory in v1.5 (just like they never did in the past), it is possible some of these will happen anyway. This is not necessarily a good thing. A disconnect between the calculated path and the real path can result in bad behaviors like stuck paxs or ping-pong bewteen trains ands stations. As with earlier versions of the pathfinder it will take actual testing to see if this design is impacted by these bugs.
After arriving to these conclusions about limitations in the algorithm, I still didn’t have a data design, so I prepared one just for development purposes. It stores lines as a sequence of stops, exactly like v1.4, and then trains fill their entire week schedule with “runs”, which is running a single loop of the line, starting a certain time of the week and for a certain duration, and with space for declaring some stops in the run to be exceptions and have their own fixed timing. Combining this run data with the line data for stop timing it is possible to calculate the fixed arrival and departure time for every train in every station. Keep in mind this design is purely for developing the pax pathfinder. It will change once the other areas (train AI and game UI) are also implemented for v1.5
So now with some data to test, it was time to implement the pax pathfinder. Again I implemented an A* search with a crappy heuristic, which is more or less unavoidable. The state of the art for this kind of graph search requires massive amounts of preprocessing time which is just not possible to do in an interactive game where the user can change any aspect of the graph itself at any time. There is some preprocessing involved but it must be kept to at most a few seconds (this will be mostly done on game loading and then with asynchronous processing in-game).
Unlike the previous pathfinder, I decided to not materialize the graph edges, so edges are fully dynamic this time. This is slower but it is required since edges represent a possible link between stations, and this would explode the required data to preprocess and store. Part of the reason is that this pathfinder skips intermediate stops, tracing edges either to transfer stations or to the goal station.
And after ironing some bugs it worked out pretty well:
So here it is, v1.5 is running pax AI on (hypothetical, ignored by trains, uneditable) fixed timetables. It also runs the pax AI at around 100 times slower than v1.4. This is where all the gained performance in v1.4 has gone, and then some. There’s still some basic optimizations to be done in the pathfinder itself, but it will never be as fast as v1.4. And there is a very major impact: the pax pathfind cache does not make any sense starting in v1.5. Pax paths can become stale at any second of the day, and become only valid again for a few seconds after 7 days, so it does not make any sense to cache them. Not being able to cache pax paths will impact the game performance forever. This impact is not just about being able to sim faster or slower, it is actually lowering the performance ceiling so much that certain medium builds will become impossible to play.
As I explained earlier, before I tackle timetabled trains and timetable UI, I need to make sure timetable pax are viable. The first hurdle has been passed: they are viable in the correctness sense. The game can pax pathfind using the week time and produce an optimal-looking travel plan based on timetables.
But they also need to be viable in the performance sense. I will start by implementing data and algorithmic optimizations to the pathfinder itself, and afterwards I have some game system-level mitigation ideas to test, the main one being a change to how stations process pax, by forcing all new and transfer pax into a “hall queue” which removes pax processing from the main sim processing and runs their pathfiding asynchronously (and multithreaded of course, but this is already the case). Unfortunately this idea is not compatible with multiplayer, so it requires more investigation.