However, as I also mentioned, the public disassembly contains a special build flag, the Sonic3_Complete flag, which according to the disassembly's README file...
[...] will change the assembly process, incorporating all necessary Sonic 3 data split from the original rom to create a complete Sonic 3 and Knuckles rom without filler.Indeed, setting this flag causes all Sonic 3 data that isn't referenced by Sonic & Knuckles to be trimmed off, producing a lean, 3.23 MB ROM. Can we get it even smaller, though?
As an initial approach, I tried repacking all of the compressed assets in the game using flamewing's optimized suite of compressors. These are able to output smaller versions of the same files, but which the game's Kosinski and Nemesis decompressors can unpack all the same. Doing so brings the ROM size down to 3.16 MB.
We can do better, though. As you may recall, the Sonic 3 cartridge contains a small subset of Knuckles' sprites that are used during his cutscene appearances. In Sonic 3 & Knuckles, these sprites are still used for the cutscenes which take place in Sonic 3 stages, so the Sonic3_Complete flag includes the cutscene art block in the ROM.
Much like the player object, cutscene Knuckles uses DPLCs to ensure that only one of its sprites needs to be loaded to VRAM at a time. This requires that all of his sprites be stored uncompressed in the ROM, which takes up a significant amount of space. All told, Knuckles takes up about 19.7 KB of the Sonic 3 cartridge.
However, since player sprites are also uncompressed, this presents an opportunity. If we take a close look at the above sprites, we can see that most of them are identical to sprites which can be found in the main Knuckles art block. In fact, only nine sprites along the bottom row are unique to the cutscene art block.
Once isolated, these sprites only take up about 4.3 KB. If we add them to the main Knuckles art block, and then simply edit cutscene Knuckles' DPLC scripts to point at the identical sprites in the main art block, we can lose the cutscene art block entirely and save about 15.4 KB.
The main art block is exactly 130,944 bytes. Adding the sprites above makes it 135,360 bytes. That's just over 132 KB.
Uh-oh.
To understand why this is a problem, let's look at the documentation for the DPLC format over at the Sonic Retro wiki:
Sonic 3 & Knuckles uses the same DPLC format as Sonic 2 for the main characters: the first word is the number of DPLC requests to make, and each successive word (up to the value of the first word) is split up so that the first nybble is the number of tiles to load minus one, and the last three nybbles are the offset (in tiles, i.e. multiples of $20 bytes) of the art to load from the beginning of the object's specified art offset in ROM.To put it differently, each DPLC entry is a word in the format $NAAA, in which N is the number of tiles to copy to VRAM (minus one), and AAA is the the number of the tile to start copying from. So for instance, if we wanted to queue up tiles $3E2 through $3E7, that's six tiles starting from $3E2, so we would write down $53E2.
This format can only index up to $1000 tiles, and because each tile is $20 bytes long, that means the art block can't be larger than $1000 × $20 = $20000 bytes, or 128 KB.
So now we have to go into the art block and somehow save up 4 KB of tiles in order to add in the 4.3 KB of sprites from the cutscene block. Thankfully, there are a lot of poorly optimized sprites to aid us in our task.
For instance, in the teetering sprites above, there's a lot of empty space, and the shoe on the floor appears three times despite being identical in each frame. By nudging Knuckles around slightly, we can reduce the size of the sprite pieces and reuse the shoe for every frame, saving about a dozen tiles. Rinse and repeat for all 252 of Knuckles' frames.
You do the impossible and fit everything into 128 KB. That should work, right? Just build the ROM and-- oh, bloody hell.
Assuming you actually followed that link to the Sonic Retro wiki, you may have noticed that the DPLC format is different between Player Characters and Other Objects. Here's what it says in the latter case:
For other objects, the first word is the number of DPLC requests to make minus one, and each successive word (up to the value of the first word plus one) is split up so that the last nybble is the number of tiles to load minus one, and the first three nybbles are the offset (in tiles, i.e. multiples of $20 bytes) of the art to load from the beginning of the object's specified art offset in ROM.Cutscene Knuckles isn't a player character, so by exclusion, he must be an "other object". So what, that just means the DPLC entry format is $AAAN rather than $NAAA, right?
Not so fast. In order to calculate the ROM offset for the DMA transfer, the DPLC code must take the starting tile number AAA and multiply it by $20, the size of each tile. Let's look at how the code in Knuckles_Load_PLC does it:
andi.w #$FFF,d1 lsl.l #5,d1Pretty straightforward. The bottom 24 bits are isolated, and then promoted to a longword as the value is shifted left five times, multiplying it by $20. Now let's look at the general purpose code in Perform_DPLC:
andi.w #$FFF0,d1 ; Isolate all but lower 4 bits add.w d1,d1This time around, the tile number is in the top 24 bits, so it is already multiplied by $10 and only needs to be multiplied by two in order to calculate the offset. This is done by adding by adding the contents of the d1 register back onto itself, without promoting the value to a longword.
There are two consequences to this. One is that the d1 register does not have to be cleared before loading each new DPLC entry. The other is that tiles past the 64 KB boundary are inaccessible: since the topmost bit is always truncated, a request for tile $9B9 results in tile $1B9 being loaded instead.
At this point I just shrugged, moved all the cutscene sprites to the first half of the art block, and called it a day.