My calculator is not like yours: floating point and compilers

I was building a system called randomart, which guarantees deterministic image generation for the same seed string. When my backends started disagreeing on the output, I blamed the compiler, only to realise I was the one who had introduced a bug. But the question I asked led me down a rabbit hole where I realised I wasn't too far off.

Blissfully unaware

Randomart is a procedural image generation system that deterministically produces images from a seed string by building an expression tree from a probabilistic context-free grammar. At its core, it's just a function that takes pixel coordinates and outputs a colour value (R, G, B channel values).
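To make that concrete, here is a minimal sketch of the idea; the node names and structure are illustrative, not the project's actual AST. Each pixel's normalized coordinates are fed through an expression tree that evaluates to a channel value:

```rust
// Minimal sketch; the real grammar and node set are much richer.
enum Node {
    X,                           // pixel x coordinate, normalized to [-1, 1]
    Y,                           // pixel y coordinate, normalized to [-1, 1]
    Const(f32),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

fn eval(node: &Node, x: f32, y: f32) -> f32 {
    match node {
        Node::X => x,
        Node::Y => y,
        Node::Const(c) => *c,
        // averaging keeps the sum of two [-1, 1] values inside [-1, 1]
        Node::Add(a, b) => (eval(a, x, y) + eval(b, x, y)) / 2.0,
        Node::Mul(a, b) => eval(a, x, y) * eval(b, x, y),
    }
}

fn main() {
    // One tree per colour channel; here a single channel computing x * y.
    let r = Node::Mul(Box::new(Node::X), Box::new(Node::Y));
    assert_eq!(eval(&r, 0.5, 0.5), 0.25);
}
```

In the real system, one such tree is generated per colour channel from the seeded grammar, and evaluating it at every pixel produces the image.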

It's important to note that the original paper needed the output to be deterministic: it was designed for security purposes, as a visual hash mechanism. The paper also mentioned using normalized values in the range [-1, 1], so I thought floating point would be the right primitive for my math operations.

I had written three backends for this system: a tree of anonymous functions (lambdas or closures), JIT compilation of the AST using cranelift, and runtime code generation of Metal kernels. My images looked similar across all three backends so I never had a doubt that my code might suffer from floating point inconsistencies.

I was disassembling and experimenting with my JIT codegen, and while doing so I wrote an incorrect implementation of the sqrt operation. The other backends protected against negative inputs by setting the output to zero; the JIT implementation did not. This led to sqrt of a negative number evaluating to NaN, which propagated through the entire expression tree and corrupted whole colour channels.

The correct backends used sqrt(x).max(0.0); IEEE 754-2008 defines maxNum(NaN, x) = x, so the NaN that sqrt produces for a negative input gets silently swallowed and replaced with zero. The JIT was missing this call entirely.
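The whole failure mode fits in a few lines. Rust's f32::max follows the maxNum semantics, so the fix is a single call:

```rust
fn main() {
    // sqrt of a negative number is NaN.
    let bad = (-1.0f32).sqrt();
    assert!(bad.is_nan());

    // f32::max returns the other operand when one argument is NaN
    // (IEEE 754-2008 maxNum semantics), so the NaN becomes 0.0.
    let good = (-1.0f32).sqrt().max(0.0);
    assert_eq!(good, 0.0);
}
```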

Before I figured out that the issue was in my implementation, I looked into cranelift's floating point inaccuracies, wondering whether the compiler was screwing me over, as any well-adjusted individual with a normal-sized ego would. It didn't take long to realise that my assumption that cranelift was wrong was incorrect, but I wasn't wrong in thinking that heterogeneous compilers and hardware can introduce numerical inconsistencies.

The research I did

IEEE 754 mandates that the basic operations (addition, subtraction, multiplication, division, square root) return the representable floating point value closest to the infinite-precision result; this is called correct rounding, and most implementations adhere to it. The mandate doesn't extend to transcendental functions (like ln() or acos()). Rust makes no determinism guarantee for such functions because it defers the math to the machine's libm. I had been consistent by luck, as my backends always used the libm on my machine; on someone else's machine, the same seed could produce a different image.
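A std-only illustration of the difference: the result of a basic arithmetic operation is fully specified by IEEE 754, so its bit pattern is identical on every conforming machine, while the bits of a transcendental result depend on whichever libm the build links against, so all you can portably assert about them is a loose error bound:

```rust
fn main() {
    // Basic ops are correctly rounded: 0.1 + 0.2 is guaranteed to be
    // the double closest to the exact sum, with this exact bit pattern
    // on every IEEE 754 machine.
    let sum = 0.1f64 + 0.2f64;
    assert_eq!(sum.to_bits(), 0x3FD3333333333334);

    // acos defers to the platform libm; its bits may differ between
    // machines, so we can only check it is close to the true value.
    let a = 0.5f64.acos(); // mathematically pi/3
    println!("acos(0.5) bits on this machine: {:#018x}", a.to_bits());
    assert!((a - std::f64::consts::PI / 3.0).abs() < 1e-12);
}
```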

Metal enables fast math by default! I read about it in a WebGPU thread where the author faced errors of several thousand percent. Metal makes no promise of numerical consistency because programmers are expected to use GPUs for faster math, not correct math. The fix would have been to write custom correctly rounded kernels for f32, which need f64 to store intermediate values. Unfortunately, Metal has no native f64 (Section 2.1, Scalar Data Types, of the Metal Shading Language Specification), so I would have had to resort to software-emulated f64, which is extremely slow. The Metal backend for my system therefore makes no correctness guarantees, and just warns that the output might differ from the CPU implementations.

Painfully aware

The libm inconsistency problem has a solution: CORE-MATH, a library of correctly rounded implementations of an extended set of floating point math functions. The CPU implementations for my system package the CORE-MATH sources and link against them when building the binary. This means my CPU binaries ship with their own math, at the cost of requiring a C compiler on the target machine, which is a ubiquitous enough requirement.

Some hardware and compilers flush subnormal values to zero (FTZ). If a subnormal feeds into a subsequent operation, you may get a different result depending on whether FTZ is active, and you would have no idea. The defensive design here was to explicitly disable FTZ at the start of evaluation by writing to the MXCSR register on x86 and FPCR on ARM:

pub unsafe fn disable_ftz() {
    #[cfg(target_arch = "x86_64")]
    {
        // Store MXCSR to memory, clear FTZ (bit 15) and DAZ (bit 6),
        // then load it back. stmxcsr/ldmxcsr take a memory operand, so
        // we pass a pointer in a register rather than a register output.
        let mut mxcsr: u32 = 0;
        std::arch::asm!("stmxcsr [{0}]", in(reg) &mut mxcsr, options(nostack));
        mxcsr &= !0x8040;
        std::arch::asm!("ldmxcsr [{0}]", in(reg) &mxcsr, options(nostack));
    }
    #[cfg(target_arch = "aarch64")]
    {
        // Read FPCR, clear the FZ (flush-to-zero) bit 24, write it back.
        let mut fpcr: u64;
        std::arch::asm!("mrs {0}, fpcr", out(reg) fpcr, options(nostack, nomem));
        fpcr &= !(1 << 24);
        std::arch::asm!("msr fpcr, {0}", in(reg) fpcr, options(nostack, nomem));
    }
}

Division outputs are clamped to [-1, 1] so the result can never become unbounded. The natural exponent is also dangerous: exp(x) grows without bound for positive inputs, and since tree nodes operate in [-1, 1], it's easy to chain values that push exp to infinity. An infinity turns into NaN as soon as it meets the wrong operation (inf − inf and inf × 0 are both NaN), which corrupts everything downstream. So exp inputs are clamped too. With every node clamped at its dangerous boundary, inf and NaN can never enter the tree.
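As a sketch (these evaluator names and exact bounds are mine, not the project's actual code), a max/min chain mirrors the sqrt trick: it bounds infinities and swallows NaN in one pass:

```rust
// Illustrative node evaluators; names and bounds are assumptions.
fn eval_div(a: f32, b: f32) -> f32 {
    // max/min return the non-NaN operand, so x/0 (inf) clamps to 1.0
    // and 0/0 (NaN) collapses to -1.0 instead of propagating.
    (a / b).max(-1.0).min(1.0)
}

fn eval_exp(x: f32) -> f32 {
    // Clamping the input keeps the output bounded: at most exp(1), never inf.
    x.max(-1.0).min(1.0).exp()
}

fn main() {
    assert_eq!(eval_div(1.0, 0.0), 1.0);   // inf clamped to the boundary
    assert_eq!(eval_div(0.0, 0.0), -1.0);  // NaN swallowed by max
    assert!(eval_exp(1000.0).is_finite()); // no overflow to inf
}
```

Note that f32::clamp would not work here: it returns NaN when its input is NaN, while the max/min chain maps NaN to a finite value.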

Trees can be serialised to JSON and read back to reconstruct the original tree. To hold the guarantee of bit-identical images, the string encoding of the floating point numbers in the JSON needs to be round-trip safe. I used the serde_json crate, which internally uses the Ryu algorithm: it produces the smallest decimal string that deserialises back to the exact bit representation of the float that was serialised.
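The same shortest-round-trip property can be demonstrated with std alone, since Rust's Display for floats also emits a shortest string that parses back to identical bits (serde_json gets the property from Ryu rather than from Display, but the guarantee is the same):

```rust
fn main() {
    let x: f64 = 0.1 + 0.2;               // not exactly 0.3
    let s = format!("{}", x);             // shortest round-trip decimal string
    assert_eq!(s, "0.30000000000000004");

    let y: f64 = s.parse().unwrap();
    assert_eq!(x.to_bits(), y.to_bits()); // bit-identical after the round trip
}
```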

My calculator really isn't like yours, but for now, I have tried my hardest.

References

  1. https://netsec.ethz.ch/publications/papers/validation.pdf
  2. https://doc.rust-lang.org/book/ch13-01-closures.html
  3. https://cranelift.dev
  4. https://dl.acm.org/doi/10.1145/3757376.3771407
  5. https://en.wikipedia.org/wiki/Transcendental_function
  6. https://doc.rust-lang.org/std/primitive.f64.html#method.acos
  7. https://github.com/gpuweb/gpuweb/issues/2076
  8. https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
  9. https://core-math.gitlabpages.inria.fr
  10. https://en.wikipedia.org/wiki/Subnormal_number#Disabling_subnormal_floats_at_the_code_level
  11. https://github.com/ulfjack/ryu