Practically speaking, I’d argue that a compiler assuming uninitialized stack or heap memory always equals some arbitrary convenient constant is obviously incorrect, actively harmful, and benefits no one.
In this example, the human author clearly intended the condition branches to be mutually exclusive, and this optimization destroys that assumption. That said, (a) human intentions are no guarantee of sound program logic, and authors often miscalculate state, and (b) the author could likely catch most or all such errors by compiling without optimizations during the debugging phase.
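For concreteness, here is a minimal C sketch of the pattern under discussion (names like pair and f are mine; the lowered form appears later in the thread):

    struct pair { int a; int b; };

    struct pair f(int x) {
        struct pair p;      /* p.a and p.b start out uninitialized */
        if (x)
            p.a = 13;       /* p.b is never written on this path */
        else
            p.b = 37;       /* p.a is never written on this path */
        return p;           /* the untouched field is indeterminate;
                               reading it downstream is UB */
    }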
The compiler is the arbiter of what’s what (as long as it does not run afoul of the CPU itself).
The memory being uninitialised means reading it is illegal for the writer of the program. The compiler can write to it if that suits it; the program can’t observe the difference without committing UB.
In fact the compiler can also read from it, because it knows that it has initialised that memory. And the compiler is not writing a C program, so it is not bound by the strictures of the C abstract machine anyway.
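A hedged sketch of what that freedom looks like (the slot reuse here is hypothetical, but it is the kind of transformation being defended):

    int g(int y) {
        int scratch;       /* never initialized by the program */
        int t = y * y;
        /* The compiler may reuse scratch's stack slot to hold t,
           a write the source never performs. No UB-free execution
           can observe it, because reading scratch at all is UB. */
        return t;
    }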
As if treating uninitialized reads as opaque somehow precludes all optimizations?
There are a million more sensible things the compiler could do here besides the hilariously bad codegen you see in the grandparent and sibling comments.
All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that. I’m saying a spec that incentivizes this nonsense is poorly designed.
Why is the codegen bad? What result do you want? Do you specifically want whatever value happened to be on the stack, as opposed to a value the compiler picked?
> As if treating uninitialized reads as opaque somehow precludes all optimizations?
That's not what these words mean.
> There’s a million more sensible things
Again, if you don't like compilers leveraging UB, use a non-optimizing compiler.
> All I’ve heard amounts to “but it’s allowed by the spec.” I’m not arguing against that.
You literally are, though. Your statements so far have all been variations of, or nonsensical assertions around, "why can't I read from uninitialised memory when the spec says I can't do that".
> I’m saying a spec that incentivizes this nonsense is poorly designed.
Then... don't use languages that are specified that way? It's really not that hard.
> Undef values aren't exactly constants ... they can appear to have different bit patterns at each use.
My claim is simple and narrow: compilers should internally model such values as unspecified, not actively choose convenient constants.
The comment I replied to cited an example where an undef is constant-folded into the value required for a conditional to be true. Can you point to any case where that produces a real optimization benefit, as opposed to being a degenerate interaction between UB and value-propagation passes?
And to be explicit: “if you don’t like it, don’t use it” is just refusing to engage, not a constructive response to this critique. These semantics aren't set in stone.
> My claim is simple and narrow: compilers should internally model such values as unspecified, not actively choose convenient constants.
An assertion for which you have provided neither utility nor justification.
> The comment I replied to cited an example where an undef is constant folded into the value required for a conditional to be true.
The comment you replied to did in fact not do that, and it’s incredible that you misread it that way.
> Can you point to any case where that produces a real optimization benefit, as opposed to being a degenerate interaction between UB and value propagation passes?
The original snippet literally folds a branch and two stores into a single store, saving CPU resources and generating tighter code.
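In source-level terms (my own paraphrase of the fold, not the exact IR):

    /* Before: a branch guarding two conditional stores. */
    if (x) { r.a = 13; } else { r.b = 37; }

    /* After instantiating each undef as the convenient constant:
       the branch is gone, both fields get unconditional values,
       and the two stores can merge into a single wide store. */
    r.a = 13;
    r.b = 37;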
> this critique
Critique is not what you have engaged in at any point.
Sorry, my earlier comments were somewhat vague and assumed we were on the same page about a few things. Let me be concrete.
The snippet is, after lowering:
    if (x)
        return { a = 13, b = undef }
    else
        return { a = undef, b = 37 }
LLVM represents this as a pair of phi nodes, one per field:
    a = phi [13, then], [undef, else]
    b = phi [undef, then], [37, else]
Since undef isn’t “unknown” but rather “pick any value you like, independently at each use”, InstCombine is allowed to instantiate each undef to whatever makes the expression simplest. This is the problem. The result:
    a = 13
    b = 37
The branch is eliminated, but only because LLVM assumes that those undefs will take specific arbitrary values chosen for convenience (fewer instructions).
Yes, the spec permits this. But at that point the program has already violated the language contract by executing undefined behavior. The read is accidental by definition: the program makes no claim about the value. Treating that absence of meaning as permission to invent specific values is a semantic choice, and precisely what I am criticizing. This “optimization” is not a win unless you willfully ignore everything about the program except its instruction count.
As for utility and justification: it’s all about user experience. A good language and compiler should preserve a clear correspondence between what the programmer wrote and what actually runs. Silent, non-local behavior changes (such as the one in the article) destroy that. Bugs should fail loudly and early, not be “optimized” away.
Imagine if the spec treated type mismatches the same way. Oops, assigned a float to an int, now it’s undef. Let’s just assume it’s always 42 since that lets us eliminate a branch. That’s obviously absurd, and this is the same category of mistake.
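Spelled out as a hypothetical (no real compiler does this for a float-to-int assignment; the point is the shape of the reasoning):

    int n = 0.5f;   /* pretend the spec made this undef instead of 0 */
    if (n == 42)
        fire();     /* hypothetical helper; the "optimizer" could now
                       pick n == 42 and make the call unconditional */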
Because a could be 13 even if x is false: the struct’s initialisation doesn’t define what the initial values of a and b need to be. Same for b: if x is true, b could be 37, no matter how unlikely that is.
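Put concretely, reusing the f sketch from upthread as a hypothetical caller:

    struct pair r = f(0);   /* r.b == 37 as written; r.a was never
                               stored to, so any bit pattern,
                               13 included, is a valid outcome */
    struct pair s = f(1);   /* s.a == 13 as written; s.b may
                               likewise turn out to be 37 */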