Practically speaking, I’d argue that a compiler assuming uninitialized stack or heap memory always equals some arbitrary convenient constant is obviously incorrect, actively harmful, and benefits no one.
In this example, the human author clearly intended the condition branches to be mutually exclusive, and this optimization destroys that assumption. That said, (a) human intentions are no guarantee of sound program logic, and authors often miscalculate state, and (b) the author could likely catch most or all such errors by compiling without optimizations during the debugging phase.
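For concreteness, here is a minimal C sketch of the pattern under discussion (names like pair and f are mine; the lowered form appears later in the thread):

    struct pair { int a; int b; };

    struct pair f(int x) {
        struct pair p;      /* p.a and p.b start out uninitialized */
        if (x)
            p.a = 13;       /* p.b is never written on this path */
        else
            p.b = 37;       /* p.a is never written on this path */
        return p;           /* the untouched field is indeterminate;
                               reading it downstream is UB */
    }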
The compiler is the arbiter of what’s what (as long as it does not run afoul of the CPU itself).
The memory being uninitialised means reading it is illegal for the writer of the program. The compiler can write to it if that suits it; the program can’t observe the difference without committing UB.
In fact the compiler can also read from it, because it knows that it has initialised that memory. And the compiler is not writing a C program, so it is not bound by the strictures of the C abstract machine anyway.
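A hedged sketch of what that freedom looks like (the slot reuse here is hypothetical, but it is the kind of transformation being defended):

    int g(int y) {
        int scratch;       /* never initialized by the program */
        int t = y * y;
        /* The compiler may reuse scratch's stack slot to hold t,
           a write the source never performs. No UB-free execution
           can observe it, because reading scratch at all is UB. */
        return t;
    }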
As if treating uninitialized reads as opaque somehow precludes all optimizations?
There are a million more sensible things the compiler could do here besides the hilariously bad codegen you see in the grandparent and sibling comments.
All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that. I’m saying a spec that incentivizes this nonsense is poorly designed.
Why is the codegen bad? What result do you want? Do you specifically want whatever value happened to be on the stack, as opposed to a value the compiler picked?
> As if treating uninitialized reads as opaque somehow precludes all optimizations?
That's not what these words mean.
> There’s a million more sensible things
Again, if you don't like compilers leveraging UB, use a non-optimizing compiler.
> All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that.
You literally are, though. Your statements so far have all been variations of, or nonsensical assertions around, "why can't I read from uninitialised memory when the spec says I can't do that".
> I’m saying a spec that incentivizes this nonsense is poorly designed.
Then... don't use languages that are specified that way? It's really not that hard.
> Undef values aren't exactly constants ... they can appear to have different bit patterns at each use.
My claim is simple and narrow: compilers should internally model such values as unspecified, not actively choose convenient constants.
The comment I replied to cited an example where an undef is constant-folded into the value required for a conditional to be true. Can you point to any case where that produces a real optimization benefit, as opposed to being a degenerate interaction between UB and value-propagation passes?
And to be explicit: “if you don’t like it, don’t use it” is just refusing to engage, not a constructive response to this critique. These semantics aren't set in stone.
> My claim is simple and narrow: compilers should internally model such values as unspecified, not actively choose convenient constants.
An assertion for which you have provided neither utility nor justification.
> The comment I replied to cited an example where an undef is constant folded into the value required for a conditional to be true.
The comment you replied to did in fact not do that, and it’s incredible that you misread it that way.
> Can you point to any case where that produces a real optimization benefit, as opposed to being a degenerate interaction between UB and value propagation passes?
The original snippet literally folds a branch and two stores into a single store, saving CPU resources and generating tighter code.
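In source-level terms (my own paraphrase of the fold, not the exact IR):

    /* Before: a branch guarding two conditional stores. */
    if (x) { r.a = 13; } else { r.b = 37; }

    /* After instantiating each undef as the convenient constant:
       the branch is gone, both fields get unconditional values,
       and the two stores can merge into a single wide store. */
    r.a = 13;
    r.b = 37;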
> this critique
Critique is not what you have engaged in at any point.
Sorry, my earlier comments were somewhat vague and assumed we were on the same page about a few things. Let me be concrete.
The snippet is, after lowering:
    if (x)
        return { a = 13, b = undef }
    else
        return { a = undef, b = 37 }
LLVM represents this as a pair of phi nodes, one per field:
    a = phi [13, then], [undef, else]
    b = phi [undef, then], [37, else]
Since undef isn’t “unknown” but rather “pick any value you like, independently at each use”, InstCombine is allowed to instantiate each undef to whatever makes the expression simplest. This is the problem. The result:
    a = 13
    b = 37
The branch is eliminated, but only because LLVM assumes that those undefs will take specific arbitrary values chosen for convenience (fewer instructions).
Yes, the spec permits this. But at that point the program has already violated the language contract by executing undefined behavior. The read is accidental by definition: the program makes no claim about the value. Treating that absence of meaning as permission to invent specific values is a semantic choice, and precisely what I am criticizing. This “optimization” is not a win unless you willfully ignore everything about the program except its instruction count.
As for utility and justification: it’s all about user experience. A good language and compiler should preserve a clear correspondence between what the programmer wrote and what actually runs. Silent, non-local behavior changes (such as the one in the article) destroy that. Bugs should fail loudly and early, not be “optimized” away.
Imagine if the spec treated type mismatches the same way. Oops, assigned a float to an int, now it’s undef. Let’s just assume it’s always 42 since that lets us eliminate a branch. That’s obviously absurd, and this is the same category of mistake.
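Spelled out as a hypothetical (no real compiler does this for a float-to-int assignment; the point is the shape of the reasoning):

    int n = 0.5f;   /* pretend the spec made this undef instead of 0 */
    if (n == 42)
        fire();     /* hypothetical helper; the "optimizer" could now
                       pick n == 42 and make the call unconditional */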
Because a could be 13 even if x is false: the struct’s initialisation doesn’t define what the initial values of a and b need to be. Same for b: if x is true, b could be 37, no matter how unlikely that is.
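Put concretely, reusing the f sketch from upthread as a hypothetical caller:

    struct pair r = f(0);   /* r.b == 37 as written; r.a was never
                               stored to, so any bit pattern,
                               13 included, is a valid outcome */
    struct pair s = f(1);   /* s.a == 13 as written; s.b may
                               likewise turn out to be 37 */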