
I have to admit I don't really understand the logic by which compiler writers got to the point where they could interpret "undefined behavior" as "will not happen".

My understanding is that undefined behavior is in the spec because the various physical machines that C could compile to would each do something different. So to avoid stepping on toes, the spec basically punted and said "we are not going to define what happens." Originally this was fine: the C compiler would dutifully generate the machine instructions, and the machine would execute them and return the result. The wrench was thrown into the works when we started getting optimizing compilers. I would argue the correct interpretation of "undefined behavior" in the face of an optimizing compiler is "unknown", but then it would behave somewhat like SQL NULL (each unknown is different from every other unknown) and act as an optimization fence. When you have to ask the machine what the result of an operation is, it is hard to optimize around it ahead of time.

So my best guess as to why they decided to read "undefined" as "cannot happen" is that this is the interpretation they can best optimize for. And nobody really pushed back, because what the hell does "undefined" mean anyway? My read is that if the spec wanted to say "cannot happen", the spec would have said "cannot happen".



As I understand it, the C spec defines which programs are valid, and for valid programs it either specifies what their observable side effects must be or leaves them unspecified or implementation-defined. Importantly, programs with undefined behaviour are excluded from the class of valid programs; thus the spec imposes no requirement on the resulting behaviour. To quote:

> Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to […]

And I think this is where “can’t happen” comes in: in the case of undefined behaviour, the compiler is free to emit whatever it pleases, including pretending it cannot happen!


"Can't happen" comes from the fact that the compiler optimizes for defined behaviour, not undefined behaviour. In a number of cases, making defined behaviour fast at the expense of undefined behaviour looks the same as assuming undefined behaviour can't happen. E.g., if you dereference a pointer and then check it for null, naturally the compiler will remove the later check. We can describe that as the compiler making the defined case fast, or we can describe it as the compiler assuming the pointer "can't" be null. The latter description is easy to think about and generalizes broadly enough that it has become a common way of describing it.
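
To make the dereference-then-check case concrete, here is a minimal sketch in C. The function names (read_then_check, check_then_read, demo) are made up for illustration, and whether the check is actually removed depends on the compiler and optimization level, so treat this as one plausible outcome rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the pointer is dereferenced BEFORE the null check.
 * If p is NULL, the load on the first line is already undefined behaviour,
 * so an optimizer is allowed to treat the later check as dead code. */
int read_then_check(int *p) {
    int value = *p;      /* undefined behaviour when p == NULL */
    if (p == NULL) {     /* may be deleted: only reachable after UB */
        return -1;
    }
    return value;
}

/* The check comes first, so every input has defined behaviour and the
 * compiler has to keep the branch. */
int check_then_read(int *p) {
    if (p == NULL) {
        return -1;
    }
    return *p;
}

/* Exercises only the defined cases; read_then_check(NULL) is deliberately
 * never called, because the standard puts no requirement on its result. */
void demo(void) {
    int x = 42;
    assert(read_then_check(&x) == 42);
    assert(check_then_read(&x) == 42);
    assert(check_then_read(NULL) == -1);
}
```

For every non-null pointer the two functions agree, which is the "making the defined case fast" reading: deleting the check in read_then_check changes nothing for valid programs. The two descriptions only diverge for the null input, where the standard imposes no requirement anyway.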



