GB TAS Verification as Test Automation for Gambatte Speedrun
Using console-verified TASes as a test framework for Game Boy emulation accuracy
Traditionally, emulators have been tested with suites of test ROMs, a kind of micro-benchmark for specific console behaviors, combined with regression tests that check whether games boot. Examples include daid's GB Emulator Shootout and SameBoy's automation, both of which inspired this post. This approach generally succeeds at tracking emulator accuracy, but it has pitfalls in the context of TASing. Tool-Assisted Speedruns need reasonably accurate emulators so that newer, faster TASes can be compared fairly with the ones they obsolete.
Console verification is stricter still: the TASBot team and others involved in it need emulators that replicate the exact state of the console, at least at the moment each input is polled. To learn more about console verification, you can visit the TASBot archive at runs.tas.bot. The TASBot team frequently runs into cases where a TAS that was known to sync between emulator and console stops syncing on a newer revision of the emulator. Many systems also aren't researched well enough to have a suite of test ROMs against which to evaluate accuracy.
This site aims to document an approach that supplements test ROMs for the emulator development scene: using known verified TASes as a test suite for emulation accuracy. Game Boy console verification is mainly performed using extrems' Game Boy Interface. Game Boy Interface reads an input log on a GameCube, with inputs timestamped relative to GBA audio samples, and sends those inputs to the Game Boy Player. You can find the available input logs at runs.tas.bot, with more about how the dumps were made under Console Guides - GBC and in this pastebin. The Game Boy Player attachment for the GameCube is also the main device used to record RTA speedruns of GB/GBC/GBA games. Therefore, to maintain console accuracy for the purposes of the TASing scene, an emulator needs to match the behavior of the GBA's Game Boy Color mode, as opposed to the behavior of the original "brick" DMG or the original Game Boy Color (the main difference being a BIOS change in the GBA-GBC mode). This mode was recently detailed in a video from Modern Vintage Gamer.
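To illustrate what "inputs timestamped relative to audio samples" means for replay, here is a minimal Python sketch. It assumes a hypothetical in-memory list of (timestamp, button mask) pairs; this is not the real GBI log format, and the timestamps and mask values are arbitrary. The idea is only that the buttons held at any point in time are those of the most recent event at or before that time.

```python
from bisect import bisect_right

# Hypothetical events: (timestamp in audio samples, button bitmask).
# NOT the actual GBI file format; the values are arbitrary, for illustration only.
EVENTS = [
    (1000, 0x01),
    (2500, 0x00),
    (4000, 0x08),
]


def buttons_at(events, sample_time):
    """Return the button mask in effect at sample_time.

    events must be sorted by timestamp. Before the first event,
    no buttons are considered held (mask 0).
    """
    timestamps = [t for t, _ in events]
    # Index of the last event whose timestamp is <= sample_time.
    idx = bisect_right(timestamps, sample_time) - 1
    if idx < 0:
        return 0
    return events[idx][1]
```

An emulator-side replayer could call something like buttons_at whenever the game polls the joypad, using its own running audio sample count as sample_time. Whether Gambatte Speedrun does it this way is not stated here.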
The images below are output from a continuous automation system that takes the latest updates to Gambatte Speedrun and feeds it the GBI input logs of known console-verified TASes. Gambatte Speedrun is currently "passing" 41 of the 46 verified TASes below. 36 of the 41 passing TASes are read from GBI input logs; the other 5 are known syncing TASes that use their original BizHawk bk2 instead of the GBI dump.
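The general shape of such a test run can be sketched in Python as follows. This is a sketch under stated assumptions, not the automation used for this site: the TasCase fields, the "gbi"/"bk2" source labels, the replay callback, and the pass criterion (comparing a hash of the final frame against a reference) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TasCase:
    name: str            # game title
    source: str          # hypothetical label: "gbi" (input log) or "bk2" (BizHawk movie)
    expected_hash: str   # hypothetical reference hash of the final frame


def run_suite(cases: List[TasCase],
              replay: Callable[[TasCase], str]) -> Dict[str, bool]:
    """Replay every case and record whether it matched its reference.

    replay is supplied by the caller; it stands in for "run the emulator
    with this TAS's inputs and return a hash of the resulting frame".
    """
    results = {}
    for case in cases:
        try:
            results[case.name] = replay(case) == case.expected_hash
        except Exception:
            # A crash during replay counts as a failure, not an aborted run.
            results[case.name] = False
    return results


def summarize(cases: List[TasCase],
              results: Dict[str, bool]) -> Tuple[int, int, Dict[str, int]]:
    """Return (passed, total, passes per input source)."""
    by_source: Dict[str, int] = {}
    for case in cases:
        if results[case.name]:
            by_source[case.source] = by_source.get(case.source, 0) + 1
    return sum(by_source.values()), len(cases), by_source
```

Applied to the numbers above, a summary of this kind would report 41 of 46 passing, split as 36 from GBI input logs and 5 from bk2 movies.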
Two of the failures come from the "Hammerin' Harry" games, whose TASes were originally made in GBHawk, BizHawk's other main accuracy-focused GB TASing core. Many GBHawk TASes do sync from their GBI dumps in this way, but for these games GBHawk may implement some behavior that Gambatte Speedrun does not. The three other failures come from Oddworld Adventures II, Operation C, and Winnie the Pooh: Adventures in the 100 Acre Wood. Winnie the Pooh has a known syncing movie from Gambatte, but something about the dump from the published TAS isn't lining up. Oracle of Ages and Oracle of Seasons are both passing, but sadly the timing of the image dump doesn't line up well for Seasons, so its passing screen is blank white. Of particular note among the passes are Pokemon Gold and Crystal, which require an emulator to have an accurate RTC implementation with tunable cartridge RTC clock offsets to match real cartridges (BizHawk's bk2 stores the cart offset for use here).