Thursday, June 1, 2017

Mega Drive compression formats

The Mega Drive has a 32 megabit (4 MB) ROM address space. That's already pretty small for today's standards, but since larger cartridges were more expensive to produce, most games made were even smaller than that. Sonic 3 and Sonic & Knuckles each weigh in at 16 megabits (2 MB). Sonic 2 takes up 8 megabits (1 MB). Meanwhile, Sonic 1 is a scant 4 megabits (512 KB).

Faced with such a reality, it was vitally important to find ways of saving space and squeezing the most out of the limited storage space. Sonic 3 uses three different compression schemes, known now as Kosinski, Nemesis and Enigma.

Kosinski compression is a general-purpose compression format based on the LZSS algorithm. It reduces data through dictionary encoding, meaning it creates a dictionary of frequent patterns in the data and replaces occurrences of those patterns with references to their entry in the dictionary. The compression rate isn't very high, but the decompression is blazingly fast and it can be used on all sorts of data.

Nemesis compression is a format specifically for compressing Mega Drive graphics. It combines entropy encoding with run-length encoding at the nybble level to reduce data by associating strings of repeated nybbles with prefix-free codes and replacing the most frequent strings with the shortest codes. The compression rate is high, and it can decompress directly to VRAM, however the slow speed limits it to just a few tiles per frame.

Enigma compression is used to compress VDP patterns. It hoists the patterns' flags and combines address relocation with run-length encoding to compress the VRAM offsets. It's mainly used to compress background plane mappings for large, static screens, such as menus or the Special Stage screen.

Kosinski compression is named after Brett Kosinski, who originally reverse-engineered the format. Nemesis is named after... Nemesis, also the one who cracked the format. Nemesis would also end up cracking the Enigma format, which he named Enigma because I guess the old naming scheme didn't work anymore!

Past Sonic games used the Nemesis format to compress all the main level art, but the slow decompression would have made Sonic 3's seamless act transitions impossible. The solution: introduce a variation on the Kosinski format, known as Kosinski Moduled or KosM for short. In the KosM format, the source data is first broken into $1000 byte pieces, and each piece is compressed using the Kosinski algorithm. Doing so allows part of the data to be decompressed to a RAM buffer and immediately DMA'd to VRAM, freeing up the buffer for the next piece in line.