I noticed recently that Exon now runs like shit on my big game-dev laptop.
This is a problem for two reasons: one, the game looks like it’s from the late 90s so I feel like it should be able to run on at least a 5-year-old laptop; and two, half the reason I even bought the laptop was so that I could use it to demo the game out in the wild. If it can’t demo the game, it’s basically useless. (Okay, that’s a bit harsh; it’s still useful as a dev machine so I can work in Levels for a bit each week. Please come visit me on Friday and Saturday afternoons!)
It is, thus, time to descend into the Profiler to find out what the hell is going on…
Overlook, my Acer Predator laptop with its blazing red LED strips and aggressive spaceship-style chassis, has a 6th gen Intel i5 quad core that runs at 2.3GHz. I don’t think this is an unreasonable lower bound for a fundamentally basic action game that looks like potatoes (the GTX 970M naturally couldn’t give two shits and could render the game in its sleep). It’s a higher bound than my original dreams from the days of WC3 modding (I dreamt of making a game for literal potato computers), but then reality got in the way so I’m happy to negotiate.
For comparison, my main PC has a quad core i7 from the same generation (that’s the 2nd CPU it’s ever had) running at 4GHz and so is naturally about twice as forgiving of any power sinks. (The number of cores is basically irrelevant; apart from the A* Pathfinding Project’s worker threads and anything Unity might offload in the background, it’s all down to single-core power.)
So what on earth am I doing that’s burning so much processor power? The profiler immediately paints a very obvious picture: I am raycasting far too much. Hundreds of ’em.
This is almost entirely down to my unit movement system. Each unit does three raycasts each fixed frame to check whether it’s standing on the ground or not, and whether that ground is a slope and suchlike. Perfectly reasonable; units need to know if they’re falling or standing and whether they should be moving up-slope or down-slope and that’s how one does it. Everyone says “don’t raycast” in their performance tips articles but never says what you should do instead. You need to know about the physical reality an object finds itself in? Gotta do a raycast, mate. No two ways about it.
Because it’s a per-unit thing, though, the problem of performance has kinda crept up on me: the Exon Academy level started with a small number of units and I’ve added more piecemeal along the way. In the original level, which was just the Arena with only 8 combatants fighting it out, things were just dandy. Now that there’s a surrounding wilderness, with civilians in the train station and wild transgenics in the forest, there are 60-odd units standing around. According to a brief and very unscientific test, the laptop’s i5 can only pull Exon over 60fps when there are 20 units or fewer. That’s not good enough! I want loadsa units!
Which means that I need to make fewer raycasts. But how? Those units need to know whether or not they’re on the ground, since this defines whether or not they can move, jump, attack, do the things that units do. Their groundedness is of critical importance to the gameplay so it is worth spending cycles on this… just not so many that the game chugs.
The thing is, though, most units in the world spend most of their time standing around. Thus, unless the earth moves underneath them, their groundedness checks will return the same results every time: these results can be cached. That should cut out most of the pain for most of the time, as while I definitely want to have more than 20 units in any given level, being limited to only 20 units moving simultaneously is much easier to swallow.
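The caching idea can be sketched roughly like this. It is a minimal illustration in plain C#, not the actual Exon movement code: `GroundInfo`, `CachedGroundCheck` and the `probe` delegate are all hypothetical names, with `probe` standing in for whatever performs the three raycasts.

```csharp
using System;

// Hypothetical stand-in for the combined result of the ground raycasts.
public readonly struct GroundInfo
{
    public readonly bool Grounded;
    public readonly float SlopeDegrees;

    public GroundInfo(bool grounded, float slopeDegrees)
    {
        Grounded = grounded;
        SlopeDegrees = slopeDegrees;
    }
}

// Wraps an expensive ground probe and only re-runs it after Invalidate().
public sealed class CachedGroundCheck
{
    private readonly Func<GroundInfo> probe; // assumed: does the actual raycasts
    private GroundInfo cached;
    private bool valid;

    // How many times the expensive probe has really been executed.
    public int ProbeCount { get; private set; }

    public CachedGroundCheck(Func<GroundInfo> probe)
    {
        this.probe = probe;
    }

    // Call whenever the unit moves, gets pushed, or the ground under it changes.
    public void Invalidate()
    {
        valid = false;
    }

    // Returns the cached result, probing only if the cache is stale.
    public GroundInfo Query()
    {
        if (!valid)
        {
            cached = probe();
            ProbeCount++;
            valid = true;
        }
        return cached;
    }
}
```

A moving or falling unit would call `Invalidate()` every fixed step and so pay full price, while an idle unit keeps reusing its last answer. The awkward part, which this sketch leaves out, is knowing when the ground itself has changed under a stationary unit.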
I was going to go further than this, but this change alone took me from 30fps to 50fps on the laptop and into the 100s on my main PC. The movement system ceased to be the primary bottleneck and it was replaced with… unit faction checking?
Yes, apparently it takes 2ms for units to assess whether every unit in sight is an ally or an enemy. This was a huge warning flag because there’s nothing in there that “should” be expensive — it’s just a bit of dictionary and hashset indexing. There’s a list of units in range and it walks over them checking for them being hostile or not, and recategorises them for the convenience of other systems as appropriate. Why is this so expensive?
Turns out, when testing whether two Unity objects are the same object, the engine has to cross the native-managed boundary — that expensive crossing that links the wishy-washy stuff in C# with the underlying C++ foundations. Every HashSet “contains” operation was tangentially checking whether the unmanaged side had been destroyed or not, something that is known to be painful.
I quickly realised that this is not necessary for me. After all, I built an entire save/load system that assigns unique IDs to every live object and I can use this for comparison rather than relying on traditional reference equality that spirals into Unity quirks. It’s a simple enough matter to override Equals and GetHashCode in my two base classes (RDZComponent for MonoBehaviours and RDZDataObject for ScriptableObjects) so that equality checks become mere integer comparisons. That shaved off quite a few more milliseconds in the senses system, and luckily that’s a fundamental enough change that it should shave off a few milliseconds everywhere. It’s also fundamental enough that it might destroy the universe by interrupting deeper C# systems, but you can’t make an omelette without breaking a few eggs, right? That’s the thing with optimisation: all the articles in the world can’t know how your game works, and therefore what trade-offs you can make that wouldn’t be safe elsewhere.
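For illustration, ID-based equality might look something like the sketch below. `IdentifiedObject` and `Unit` are hypothetical stand-ins written in plain C#; the real RDZComponent derives from MonoBehaviour and takes its ID from the save/load system, neither of which is shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: equality and hashing are driven purely by an integer ID.
public abstract class IdentifiedObject : IEquatable<IdentifiedObject>
{
    // Assumed to be unique per live object and never to change once assigned.
    public int UniqueId { get; }

    protected IdentifiedObject(int uniqueId)
    {
        UniqueId = uniqueId;
    }

    public bool Equals(IdentifiedObject other)
    {
        // ReferenceEquals is a plain managed null test; it does not go through
        // any overloaded == operator.
        return !ReferenceEquals(other, null) && other.UniqueId == UniqueId;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as IdentifiedObject);
    }

    // HashSet and Dictionary lookups call this, so it becomes an integer read.
    public override int GetHashCode()
    {
        return UniqueId;
    }
}

// Hypothetical concrete type used as a HashSet element or Dictionary key.
public sealed class Unit : IdentifiedObject
{
    public Unit(int uniqueId) : base(uniqueId) { }
}
```

The trade-off is that two objects with the same ID now count as the same object, and a destroyed object still compares equal to itself, so this is only safe if the IDs really are unique and the code never relied on equality to detect destruction.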
To be fair, I also had some more obvious silly things to chop out. The chunks that make up each unit, for example, had their lightning overlay and temperature glow values updated every frame even if those values had not changed — which meant a small but significant punting of data into the shader and thus across to the graphics card every frame, another expensive operation. This has been almost completely annihilated with a cheap “has this value actually changed?” check.
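A “has this value actually changed?” guard can be as small as the following sketch. `ChangeCheckedValue` is a hypothetical helper in plain C#, and the `upload` delegate stands in for whatever call actually pushes the value to the shader.

```csharp
using System;

// Forwards a value to an expensive sink only when it differs from the last one sent.
public sealed class ChangeCheckedValue
{
    private readonly Action<float> upload; // assumed: the costly push to the shader
    private float last;
    private bool hasValue;

    // How many times the expensive upload has really been executed.
    public int UploadCount { get; private set; }

    public ChangeCheckedValue(Action<float> upload)
    {
        this.upload = upload;
    }

    public void Set(float value)
    {
        // Exact comparison on purpose: an unchanged value is bit-identical.
        if (hasValue && value == last)
        {
            return;
        }

        last = value;
        hasValue = true;
        upload(value);
        UploadCount++;
    }
}
```

Calling `Set` every frame with the same glow value then costs one float comparison instead of a trip to the graphics card. A value that drifts by tiny amounts each frame would still upload every time, so a tolerance might be worth adding in that case.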
But after that? There’s no longer a really obvious bottleneck; it’s down to tenths and hundredths of milliseconds smeared across all of the interlocking systems. Which makes every tiny change feel pointless even though, in aggregate, those tenths and hundredths of milliseconds all add up. Do a Vector3 Dot product instead of using traditional degree angles. Store a separate boolean that says “this unit has a turret” instead of using the implicit check on the actual GameObject. Cache those hash codes instead of recalculating them every time.
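As one example of the dot-product trick, an “is the target within this cone?” test can compare against a precomputed cosine rather than computing an angle in degrees. This sketch uses `System.Numerics.Vector3` so that it runs outside the engine; in Unity the equivalent call would be `Vector3.Dot`. The `ViewCone` class and its method names are hypothetical.

```csharp
using System;
using System.Numerics;

public static class ViewCone
{
    // Precompute once per cone: the cosine of its half-angle.
    public static float CosHalfAngle(float halfAngleDegrees)
    {
        return (float)Math.Cos(halfAngleDegrees * Math.PI / 180.0);
    }

    // True if target lies within the cone around forward.
    // forward is assumed to be unit length already.
    public static bool Contains(Vector3 origin, Vector3 forward, Vector3 target, float cosHalfAngle)
    {
        Vector3 toTarget = target - origin;
        if (toTarget.LengthSquared() < 1e-12f)
        {
            return true; // target is effectively at the origin
        }

        // The dot product of two unit vectors is the cosine of the angle
        // between them, so no inverse trig or degree conversion is needed.
        float cosAngle = Vector3.Dot(forward, Vector3.Normalize(toTarget));
        return cosAngle >= cosHalfAngle;
    }
}
```

The cosine gets larger as the angle gets smaller, which is why the comparison is `>=` rather than `<=`.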
All of which has taken me to about 55fps at rest on the laptop; still not quite good enough, though with this being in the Unity editor (which adds a good 5ms overhead), that might just do it for when the game is running standalone. Maybe doing an IL2CPP build will take it even further and I can just stop worrying.
However, I still haven’t gone anywhere near the heavier systems that kick in during combat — this is so far just stepping off the train at the intro cinematic and letting the world idle. With my plan to ship this demo in December I really need to make the best first impression I can, so I guess I’ve still got a lot more shaving to do…