MIO0 Encoder Optimization Notes

Size Optimizations

Careful Matching

abbcc

If ab is compressed, two bytes (c) are left uncompressed. However, if bc is compressed only one byte (a) is left uncompressed.
Example: boasting coats boats

Careless encoding:
Careful encoding:
		

Section Overlap Hack

In some scenarios, a few bits from the beginning of the pairs section can also be used as the last bits of the control section. This can save two bytes that would otherwise be required for a new set of control bits.
Example: aaabcdefghijklmnaaa

Normal encoding:

   00       04       08       0C
00 4D494F30 00000013 00000014 00000016  MIO0............
10 FFFF0000 000F6161 61626364 65666768  ÿÿ....aaabcdefgh
20 696A6B6C 6D6E                        ijklmn

Hacky encoding:

   00       04       08       0C
00 4D494F30 00000013 00000012 00000014  MIO0............
10 FFFF000F 61616162 63646566 6768696A  ÿÿ..aaabcdefghij
20 6B6C6D6E                             klmn
		

Implementation strength tests

Sequence repetition

abcabcabcabcabcabcabc
Strong implementation should produce a single length-offset pair

Better matching

abc bcde abcde
Strong implementation should compress bcde instead of abc

Wasteless better matching

123 2345 34567 456789 123456789
Strong implementation should produce pairs for both 123 and 456789.
(A weak implementation might skip over 123, leaving it uncompressed.)

3 byte tail matching

abcde efg abcdefg
Strong implementation should produce pairs for abcd and efg.
This should only be used when the base match is longer than 3 characters.

Data usage reference