Thursday, September 20, 2018

On the subject of bitwise operators in C#

This subject is a bit off-band for the blog, but I figured it could also double as a status update. The first draft of the object definitions is almost complete; only Death Egg Zone remains at the time of writing. After that, I'll probably go over the entire set and make everything a little bit more consistent, add a few more overlays here and there, etc. I'm currently aiming to get everything done early next month, so we'll see how that goes.

I've also made up my mind about what the focus of my hack will be, so I can't wait to jump on that as well. It's going to be a lot of work up front, but I'm hoping the payoff is worth it. Anyway, time for a rant.


As you may or may not be aware, SonLVL is programmed in C#. Due to this, the most powerful way of writing SonLVL object definitions is to just roll your own C# code against SonLVL's public API, which SonLVL then compiles on the fly by calling up the C# compiler at runtime.

This is good! C# is a great programming language, and one which I regularly work with in my day job, so being able to transfer my existing skill set certainly makes it easier on both sides.

Now, the greatest complexity in writing object definitions comes from wrangling subtypes. Apart from the X/Y flip flags, the subtype is the only way of instructing objects to serve up a different appearance or behavior. As such, more often than not, several different properties are packed into the individual bits of the subtype byte. And therein lies the rub: performing bitwise operations in C# is just sad.


Let's take, for example, the Automatic Tunnel object. These are the high speed chutes found in Launch Base Zone and Lava Reef Zone. They have three properties, which are encoded into the subtype as follows:

  • Bits 0-4 are the Path ID, which defines the set of waypoints the player will be sent through.
  • Bit 6 is the Launch flag; if set, the player will keep their momentum at the end of the tunnel.
  • Bit 7 is the Reverse flag; if set, the player will go through the waypoints in reverse order.
  • Bit 5 is unused.

Here's the above information in graphical form, because humans love graphics:
    Bit:       7        6        5       4  3  2  1  0
    Value:     0        0        0       0  0  0  0  0
    Field:  Reverse   Launch  (unused)      Path ID

Alright, so now let's say I want to have a property box where the user can change the path ID, without affecting the other flags. Sounds easy enough. Just blank out the path ID bits already in the subtype, truncate the user value to five bits, and join the two together. So let's write that.
    subtype = (subtype & 0xE0) | (value & 0x1F);
Hit compile and... compilation error. An expression of type int cannot be assigned to the variable subtype, which is of type byte. Oh right, the literals 0xE0 and 0x1F are of type int, so the AND operations are lifted to int: both subtype and value get promoted from byte to int and operator &(int a, int b) is called, which itself returns int. The two resulting ints are then ORed together, so the entire expression is of type int, which cannot be assigned to a variable of type byte.

There's actually no way to write a byte literal in C#; you are expected to cast the int literal to byte. The compiler will do the right thing and not insert a conversion operation, but work with byte from the start. So let's write that.
    subtype = (subtype & (byte)0xE0) | (value & (byte)0x1F);
Hit compile, same error. As it turns out...

Pain point #1: There are no bitwise operators defined on byte


It's not the literals, it's the operators! There actually isn't such a thing as byte operator &(byte a, byte b) in C#; they go down to int and that's it. So when we write subtype & (byte)0xE0, the compiler promotes both bytes to int and then calls int operator &(int a, int b), once again resulting in a subexpression of type int.
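One way to see this for yourself is to let the compiler infer the type of the expression and then print it. The following is a minimal sketch as a standalone console program; the class and method names are made up for illustration and have nothing to do with SonLVL.

```csharp
using System;

public static class ByteOperatorDemo
{
    // Returns the type the compiler infers for "byte & byte".
    public static Type TypeOfByteAnd()
    {
        byte a = 0xE0;
        byte b = 0x1F;
        var result = a & b; // both operands are promoted to int before the AND
        return result.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(TypeOfByteAnd()); // prints "System.Int32"
    }
}
```

Neither operand is an int here, and the result still is one.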

The same thing goes for the OR operator, so no matter how we slice it, the whole expression will always evaluate to int. So the correct solution is to cast that instead:
    subtype = (byte)((subtype & 0xE0) | (value & 0x1F));
It's already getting hard to read through all the parentheses, but it's only going to get worse.
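To check that the cast version really leaves the flags alone, here is a small sketch that wraps the expression in a helper and runs it on a sample value. WithPathId and the sample subtypes are hypothetical, chosen only for this illustration.

```csharp
using System;

public static class PathIdDemo
{
    // Replaces bits 0-4 of subtype with the low five bits of value.
    // Bits 5-7 (unused, Launch, Reverse) pass through untouched.
    public static byte WithPathId(byte subtype, byte value)
    {
        return (byte)((subtype & 0xE0) | (value & 0x1F));
    }

    public static void Main()
    {
        // 0xC3 = Reverse + Launch + path 3; 0x25 truncates to path 5.
        byte updated = WithPathId(0xC3, 0x25);
        Console.WriteLine(updated.ToString("X2")); // prints "C5"
    }
}
```

The single cast on the outside is enough because every intermediate result is already an int in the range 0 to 255.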

Pain point #2: Bitwise operations do not return bool


Let's turn our attention to the flags. In the case of the Reverse flag, I want the user value to be a yes/no toggle, so value is a bool. Then, depending on whether the bool is true or not, we set the relevant bit to 1 or 0. Let's write that.
    subtype = (byte)((subtype & 0x7F) | (value ? 0x80 : 0x00));
Alright, relatively painless. But what about the reverse operation, where we look up the current subtype and figure out the current state of the Reverse flag? This time we're assigning to value, which is of type bool. So we write
    value = subtype & 0x80;
which again results in a compilation error, this time stating that an expression of type int cannot be assigned to a variable of type bool.

This is because in C#, unlike C and C++ before it, bools are strongly typed. They can only hold the values true and false, which avoids the situation where 1 and 2 both mean true, but compare differently to one another. But that means there's no quick way to write a bit test in C#; one must append either != 0 or == 0x80, the former a tautology, the latter a repetition.

Now, since the Reverse flag happens to be the most significant bit, we can sidestep the issue by instead writing:
    value = subtype >= 0x80;
But in the case of the Launch flag, imagine my surprise when I write
    value = subtype & 0x40 != 0;
and I get yet another compilation error: operator & cannot be applied to operands of type byte and bool.

Pain point #3: Bitwise operators are also logical operators


If the previous point was to get rid of legacy C bullcrap, then this one enshrines it. Early versions of C did not have the logical operators && and ||, so to combine two or more equality comparisons into a single conditional expression, you would use the bitwise operators & and |, like so:
    if (day == 25 & month == 12) printf("It's Christmas!\n");
In order for this kind of expression to evaluate correctly, bitwise operations were given lower precedence than equality comparisons, so that the program would first check that the day is 25, then that the month is December, before it combines the results and decides whether it's Christmas or not. When bitwise operators were added to the C# specification, their precedence was kept the same, presumably in order to avoid "gotcha" scenarios when porting over legacy C and C++ code.

So above, when we wrote
    value = subtype & 0x40 != 0;
what the compiler actually does is compare 0x40 and 0 for inequality, and then attempt to combine the result with the value of subtype, which is the complete opposite of what we were trying to accomplish!

The solution is, again, to add more parentheses to the expression:
    value = (subtype & 0x40) != 0;
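Putting the two flag reads side by side, here is a minimal sketch of the parenthesized bit tests, plus a look at why the >= trick from earlier only works for bit 7. All names are hypothetical and exist only for this example.

```csharp
using System;

public static class FlagReadDemo
{
    // Parenthesized bit tests: AND first, then compare against zero.
    public static bool IsLaunch(byte subtype)
    {
        return (subtype & 0x40) != 0;
    }

    public static bool IsReverse(byte subtype)
    {
        return (subtype & 0x80) != 0;
    }

    // The comparison shortcut is only valid for the most significant bit.
    public static bool IsReverseShortcut(byte subtype)
    {
        return subtype >= 0x80;
    }

    // Deliberately wrong: any value with bit 7 set is also >= 0x40.
    public static bool IsLaunchShortcutWrong(byte subtype)
    {
        return subtype >= 0x40;
    }

    public static void Main()
    {
        byte subtype = 0x80; // Reverse set, Launch clear
        Console.WriteLine(IsReverse(subtype));             // prints "True"
        Console.WriteLine(IsReverseShortcut(subtype));     // prints "True"
        Console.WriteLine(IsLaunch(subtype));              // prints "False"
        Console.WriteLine(IsLaunchShortcutWrong(subtype)); // prints "True"
    }
}
```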
But here's the kicker: since in C#, equality comparisons result in bool, not int, they had to introduce separate, eager logical operators &(bool a, bool b) and |(bool a, bool b) to go along with the existing short-circuiting logical operators &&(bool a, bool b) and ||(bool a, bool b). So they could have avoided this whole disaster by simply giving the eager logical operators a different notation from the bitwise operators! Grrr.
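For anyone who hasn't run into the eager operators before, the difference from their short-circuiting cousins is easy to observe by counting how many operands actually get evaluated. This is a minimal sketch with made-up names, not something taken from SonLVL.

```csharp
using System;

public static class EagerVsShortCircuitDemo
{
    static int calls;

    // Records that it was evaluated, then hands back the given result.
    static bool Check(bool result)
    {
        calls++;
        return result;
    }

    // Eager AND on bools: both operands are always evaluated.
    public static int CallsWithEagerAnd()
    {
        calls = 0;
        bool result = Check(false) & Check(true);
        return calls;
    }

    // Short-circuiting AND: the right operand is skipped once the left is false.
    public static int CallsWithShortCircuitAnd()
    {
        calls = 0;
        bool result = Check(false) && Check(true);
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CallsWithEagerAnd());        // prints "2"
        Console.WriteLine(CallsWithShortCircuitAnd()); // prints "1"
    }
}
```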

With all that parenthesizing, it's no surprise that we end up with code that looks a little something like this:
properties[2] = new PropertySpec("Launch", typeof(bool), "Extended",
    "If set, the player will launch off at the end of the path.", null,
    (obj) => (obj.SubType & 0x40) != 0,
    (obj, value) => obj.SubType = (byte)((obj.SubType & 0xBF) | ((bool)value ? 0x40 : 0)));

And that's just a little bit sad.
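One way to keep the parentheses out of sight is to bury them in a pair of helper methods. The sketch below is only an illustration: SubtypeBits, HasFlag and WithFlag are hypothetical names, not part of SonLVL's API, and the masks are the Launch and Reverse bits described earlier.

```csharp
using System;

public static class SubtypeBits
{
    public const int LaunchMask = 0x40;  // bit 6
    public const int ReverseMask = 0x80; // bit 7

    // True if any bit selected by mask is set in subtype.
    public static bool HasFlag(byte subtype, int mask)
    {
        return (subtype & mask) != 0;
    }

    // Returns subtype with the bits selected by mask set or cleared.
    public static byte WithFlag(byte subtype, int mask, bool on)
    {
        return (byte)((subtype & ~mask) | (on ? mask : 0));
    }

    public static void Main()
    {
        byte subtype = 0x03; // path 3, no flags
        subtype = WithFlag(subtype, LaunchMask, true);
        Console.WriteLine(subtype.ToString("X2"));         // prints "43"
        Console.WriteLine(HasFlag(subtype, LaunchMask));   // prints "True"
        Console.WriteLine(HasFlag(subtype, ReverseMask));  // prints "False"
    }
}
```

With something like this in place, the getter and setter lambdas above could presumably shrink to a single call each, though the casts and parentheses are still there, just moved out of view.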

Update 27/02/2020: Eric Lippert expands on the last point over at his own blog. This post was mostly inspired by Eric's writings there and elsewhere on the Internet, so being able to finally link back is incredibly delightful to me.