
To quote a friend: "Glibc is a waste of a perfectly good stable kernel ABI"




Kind of funny to realize, the NT kernel ABI isn’t even all that stable itself; it is just wrapped in a set of very stable userland exposures (Win32, UWP, etc.), and it’s those exposures that Windows executables are relying on. A theoretical Windows PE binary that was 100% statically linked (and so directly contained NT syscalls) wouldn’t be at-all portable between different Windows versions.

Linux with glibc is the complete opposite; there really does exist old Linux software that static-links in everything down to libc, just interacting with the kernel through syscalls—and it does (almost always) still work to run such software on a modern Linux, even when the software is 10-20 years old.

I guess this is why Linux containers are such a thing: you’re taking a dynamically-linked Linux binary and pinning it to a particular entire userland, such that when you run the old software, it calls into the old glibc. Containers work, because they ultimately ground out in the same set of stable kernel ABI calls.

(Which, now that I think of it, makes me wonder how exactly Windows containers work. I’m guessing each one brings its own NTOSKRNL, that gets spun up under HyperV if the host kernel ABI doesn’t match the guest?)


IIRC, Windows containers require that the container be built with a base image that matches the host for it to work at all (like, the exact build of Windows has to match). Guessing that’s how they get a ‘stable ABI’.

…actually, looks like it’s a bit looser these days. Version matrix incoming: https://learn.microsoft.com/en-us/virtualization/windowscont...


The ABI has been stabilised for backwards compatibility since Windows Server 2022, but it is not stable for earlier releases.

Apparently there are 3 kinds of Windows containers, one using HyperV, and the others sharing the kernel (like Linux containers)

https://thomasvanlaere.com/posts/2021/06/exploring-windows-c...


> Kind of funny to realize, the NT kernel ABI isn’t even all that stable itself

This is not a big problem if it's hard/unlikely enough to write code that accidentally relies on raw syscalls. At least MS's dev tooling doesn't provide an easy way to bypass the standard DLLs.

> makes me wonder how exactly Windows containers work

I guess containers do the syscalls through the standard Windows DLLs like any regular userspace application. If it's a Linux container on Windows, probably the WSL syscalls, which I guess, are stable.


> NT kernel ABI isn’t even all that stable itself

Can you give an example where a breaking change was introduced in NT kernel ABI?


https://j00ru.vexillium.org/syscalls/nt/64/

(One example: hit "Show" on the table header for Win11, then use the form at the top of the page to highlight syscall 8c)


Changes in syscall numbers aren't necessarily breaking changes as you're supposed to use ntdll.dll to call kernel, not direct syscalls.

That was his point exactly.


The syscall numbers change with every release: https://j00ru.vexillium.org/syscalls/nt/64/

Syscall numbers shouldn't be a problem if you link against ntdll.dll.

So now you're talking about the ntdll.dll ABI instead of the kernel ABI. ntdll.dll is not the kernel.

NTDLL is NT’s kernel ABI, not syscalls. Nothing on Windows uses syscalls to call the kernel.

NTDLL isn’t some higher level library. It’s just a series of entry points into NT kernel.


Yes, the fact that functions in NTDLL issue a syscall instruction is a platform-specific implementation detail.
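
A toy model of that point, with made-up tables: the service names and numbers below are invented for illustration and are not real NT values. It only sketches why a hard-coded syscall number breaks across builds while a call through the OS-shipped stub keeps working.

```python
# Hypothetical syscall tables for two made-up OS builds (name -> number).
# None of these names or numbers are real NT values.
SYSCALL_TABLES = {
    "build_a": {"OpenThing": 0x10, "CloseThing": 0x11},
    "build_b": {"QueryThing": 0x10, "CloseThing": 0x11, "OpenThing": 0x12},
}


def kernel_dispatch(build, number):
    """Return the name of the service a given build runs for a raw number."""
    by_number = {num: name for name, num in SYSCALL_TABLES[build].items()}
    return by_number.get(number, "<invalid>")


def stub_call(build, name):
    """Model of a userland stub shipped with the OS: it knows the right number."""
    return kernel_dispatch(build, SYSCALL_TABLES[build][name])


# A binary that baked in the build_a number for OpenThing.
HARDCODED_OPEN = SYSCALL_TABLES["build_a"]["OpenThing"]

if __name__ == "__main__":
    print(kernel_dispatch("build_a", HARDCODED_OPEN))  # intended service
    print(kernel_dispatch("build_b", HARDCODED_OPEN))  # a different service
    print(stub_call("build_b", "OpenThing"))           # still the intended one
```

The stub is the stable part here because it ships together with the table it wraps.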

...isn't that the point of this entire subthread? The kernel itself doesn't provide the stable ABI, userland code that the binary links to does.

No. On NT, kernel ABI isn't defined by the syscalls but NTDLL. Win32 and all other APIs are wrappers on top of NTDLL, not syscalls. Syscalls are how NTDLL implements kernel calls behind the scenes, it's an implementation detail. Original point of the thread was about Win32, UWP and other APIs that build a new layer on top of NTDLL.

I argue that NT doesn't break its kernel ABI.


NTDLL APIs are very stable[0] and you can even compile and run x86 programs targeting NT 3.1 Build 340[1] which will still work on win11.

[0] as long as you don't use APIs they decided to add and remove in a very short period (longer read: https://virtuallyfun.com/2009/09/28/microsoft-fortran-powers...)

[1] https://github.com/roytam1/ntldd/releases/tag/v250831


macOS and iOS too — syscalls aren’t stable at all, you’re expected to link through shared library interfaces.

> No

...and you go on to not disagree with me at all? Why comment then?


Isn't Docker on Windows simply a glorified virtual machine running Linux, a.k.a. the Linux subsystem v2 (WSL 2)?

At least glibc uses versioned symbols. Hundreds of other widely-used open source libraries don't.

Versioned glibc symbols are part of the reason that binaries aren't portable across Linux distributions and time.

Only because people aren't putting in the effort to build their binaries properly. You need to link against the oldest glibc version that has all the symbols you need, and then your binary will actually work everywhere(*).

* Except for non-glibc distributions of course.
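
As a rough sketch of checking what a binary actually ended up requiring: the snippet below scans a symbol dump (for example the text printed by `objdump -T` or `readelf --dyn-syms`) for `GLIBC_x.y` tags and reports the highest one. The function name and the sample lines are hypothetical; real tool output has more columns, but the regex only looks at the version tags.

```python
import re

# Matches tags such as GLIBC_2.14 or GLIBC_2.2.5 (but not GLIBC_PRIVATE).
VERSION_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(symbol_dump):
    """Return the highest GLIBC version referenced in the dump as a tuple, or None."""
    versions = [
        tuple(int(part) for part in match.group(1).split("."))
        for match in VERSION_RE.finditer(symbol_dump)
    ]
    return max(versions) if versions else None


# Illustrative sample only; not the output of any particular binary.
SAMPLE = """\
memcpy@GLIBC_2.14
printf@GLIBC_2.2.5
fstat@GLIBC_2.33
"""

if __name__ == "__main__":
    print(required_glibc(SAMPLE))
```

Comparing versions as integer tuples matters: a plain string comparison would rank 2.5 above 2.33.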


> Only because people aren't putting in the effort to build their binaries properly.

Because Linux userland is an unmitigated clusterfuck of bad design that makes this really really really hard.

GCC/Clang and Glibc make it almost impossible to do this on their own. The only ways you can actually do this are:

1. create a userland container from the past
2. use Zig, which moved oceans and mountains to make it somewhat tractable

It's awful.


We are using Nix to do this. It’s only a few lines of code. We build a gcc 14 stdenv that uses an old glibc.

But I agree that this should just be a simple target SDK flag.

I think the issue is that the Linux community is generally hostile towards proprietary software and it’s less of an issue for FLOSS because they can always be compiled against the latest.


But to link against an old glibc version, you need to compile on an old distro, on a VM. And you'll have a rough time if some part of the build depends on a tool too new for your VM. It would be infinitely simpler if one could simply 'cross-compile' down to older symbol versions, but the tooling does not make this easy at all.

Check out `zig cc`. It lets you target specific glibc versions. It's a pretty amazing C toolchain.

https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...
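
For what it's worth, a minimal sketch of what that looks like when driven from a build script. It assumes `zig` is on the PATH and that the target triple takes a glibc version suffix such as `x86_64-linux-gnu.2.17`; the helper name and the default version are made up for illustration.

```python
def zig_cc_command(source, output, arch="x86_64", glibc="2.17"):
    """Build an argv list for `zig cc` targeting a specific glibc version.

    The arch and glibc defaults are illustrative assumptions, not recommendations.
    """
    target = f"{arch}-linux-gnu.{glibc}"
    return ["zig", "cc", "-target", target, "-o", output, source]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(" ".join(zig_cc_command("hello.c", "hello")))
```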


It's actually doable without an old glibc as it was done by the Autopackage project: https://github.com/DeaDBeeF-Player/apbuild

That never took off though, containers are easier. With distrobox and other tools this is quite easy, too.


> It would be infinitely simpler if one could simply 'cross-compile' down to older symbol versions, but the tooling does not make this easy at all.

It's definitely not easy, but it's possible: using the `.symver` assembly (pseudo-)directive you can specify the version of the symbol you want to link against.
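
A sketch of how this tends to get automated: generate a header full of `.symver` directives and force-include it in every translation unit (for example with GCC's `-include`). The pin table is an assumption: `GLIBC_2.2.5` is the baseline symbol version on x86-64, other architectures use different baselines, and the helper name is hypothetical.

```python
def symver_header(pins):
    """Render a C header that pins each symbol to the given glibc symbol version."""
    lines = ["/* Generated file: pins selected glibc symbols to older versions. */"]
    for symbol, version in sorted(pins.items()):
        lines.append(f'__asm__(".symver {symbol},{symbol}@{version}");')
    return "\n".join(lines) + "\n"


# Assumed example for x86-64: memcpy also exists in a newer GLIBC_2.14 version.
PINS = {"memcpy": "GLIBC_2.2.5"}

if __name__ == "__main__":
    print(symver_header(PINS), end="")
```

This only helps for symbols whose old version still has a compatible prototype; it does nothing for functions that did not exist in the old glibc at all.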


Huh? Bullshit. You could totally compile and link in a container.

Ok, so you agree with him except where he says “in a VM” because you say you can also do it “in a container”.

Of course, you both leave out that you could do it “on real hardware”.

But none of this matters. The real point is that you have to compile on an old distro. If he left out “in a VM”, you would have had nothing to correct.


I'm not disagreeing that glibc symbol versioning could be better. I raised it because this is probably one of the few valid use cases for containers where they would have a large advantage over a heavyweight VM.

But it's like complaining that you might need a VM or container to compile your software for Win16 or Win32s. Nobody is using those anymore. Nor really old Linux distributions. And if they do, they're not really going to complain about having to use a VM or container.

As a C/C++ programmer, the thing I notice is ... the people who complain about this most loudly are the web dev crowd who don't speak C/C++, when some ancient game doesn't work on their obscure Arch/Gentoo/Ubuntu distribution and they don't know how to fix it. Boo hoo.

But they'll happily take a paycheck for writing a bunch of shit Go/Ruby/PHP code that runs on Linux 24/7 without downtime - not because of the quality of their code, but due to the reliability of the platform at _that_ particular task. Go figure.


> But they'll happily take a paycheck for writing a bunch of shit Go/Ruby/PHP code that runs on Linux 24/7 without downtime - not because of the quality of their code, but due to the reliability of the platform at _that_ particular task.

But does the lack of a stable ABI have any (negative) effect on the reliability of the platform?


Only for people who want to use it as a desktop replacement for Windows or MacOS I guess? There are no end of people complaining they can't get their wifi or sound card or trackpad working on (insert-obscure-Linux-distribution-here).

Like many others, I have Linux servers running over 2000-3000 days uptime. So I'm going to say no, it doesn't, not really.


>As a C/C++ programmer, the thing I notice is ... the people who complain about this most loudly are the web dev crowd who don't speak C/C++, when some ancient game doesn't work on their obscure Arch/Gentoo/Ubuntu distribution and they don't know how to fix it. Boo hoo.

You must really be behind the times. Arch and Gentoo users wouldn't complain because an old game doesn't run. In fact the exact opposite would happen. It's not implausible for an Arch or Gentoo user to end up compiling their code on a five hour old release of glibc and thereby maximize glibc incompatibility with every other distribution.



If it requires effort to be correct, that's a bad design.

Why doesn't the glibc use the version tag to do the appropriate mapping?


I think even calling it a "design" is dubious. It's an attribute of these systems that arose out of the circumstance, nobody ever sat down and said it should be this way. Even Torvalds complaining about it doesn't mean it gets fixed, it's not analogous to Steve Jobs complaining about a thing because Torvalds is only in charge of one piece of the puzzle, and the whole image that emerges from all these different groups only loosely collaborating with each other isn't going to be anybody's ideal.

In other words, the Linux desktop as a whole is a Bazaar, not Cathedral.


> In other words, the Linux desktop as a whole is a Bazaar, not Cathedral.

This was true in the 90s, not the 2020s.

There are enough moneyed interests that control the entirety of Linux now. If someone at Canonical or Red Hat thought a glibc version translation layer (think WINE, but for running software targeted at Linux systems from before the last breaking glibc version) was a good enough idea, it could get implemented pretty rapidly. Instead of win32+wine being the only stable abi on Linux, Linux could have the most stable abi on Linux.


I don’t understand why this is the case, and would like to understand. If I want only functions f1 and f2 which were introduced in glibc versions v1 and v2, why do I have to build with v2 rather than v3? Shouldn’t the symbols be named something like glibc_v1_f1 and glibc_v2_f2 regardless of whether you’re compiling against glibc v2 or glibc v3? If it is instead something like “compiling against vN uses symbols glibc_vN_f1 and glibc_vN_f2” combined with glibc v3 providing glibc_v1_f1, glibc_v2_f1, glibc_v3_f1, glibc_v2_f2 and glibc_v3_f2… why would it be that way?

> why would it be that way?

It allows (among other things) the glibc developers to change struct layouts while remaining backwards compatible. E.g. if function f1 takes a struct as argument, and its layout changes between v2 and v3, then glibc_v2_f1 and glibc_v3_f1 have different ABIs.
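
To make the struct-layout case concrete, here is a small `ctypes` sketch. The two structs are invented (they are not real glibc types); the "v3" layout inserts a field, so a caller that fills in the old layout gets misread by code expecting the new one.

```python
import ctypes


class ArgV2(ctypes.Structure):
    """Hypothetical argument struct as laid out in library v2."""
    _fields_ = [("id", ctypes.c_int32), ("value", ctypes.c_int32)]


class ArgV3(ctypes.Structure):
    """Hypothetical v3 layout: a flags field was inserted before value."""
    _fields_ = [
        ("id", ctypes.c_int32),
        ("flags", ctypes.c_int32),
        ("value", ctypes.c_int32),
    ]


def misread_as_v3(old):
    """Reinterpret the bytes of a v2 struct as v3, as a v3-only function would."""
    raw = bytes(old).ljust(ctypes.sizeof(ArgV3), b"\0")
    return ArgV3.from_buffer_copy(raw)


if __name__ == "__main__":
    seen = misread_as_v3(ArgV2(id=1, value=42))
    print(seen.id, seen.flags, seen.value)
```

Keeping a separate v2 entry point that still expects the old layout is what lets old binaries keep working against the new library.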


Individual functions may have a lot of different versions. They do only update them if there is an ABI change (so you may have e.g. f1_v1, f1_v2, f2_v2, f2_v3 as symbols in v3 of glibc) but there's no easy way to say 'give me v2 of every function'. If you compile against v3 you'll get f2_v3 and f1_v2 and so it won't work on v2.
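
The f1/f2 example can be written down as a toy model. This is a simplification for illustration, not how the real linker is implemented: each function has a list of library releases where its ABI changed, linking picks the newest version available at build time, and a binary runs only where every symbol it picked is provided.

```python
# Hypothetical history: function -> library releases that introduced a new
# ABI version of that symbol (so f1 has f1_v1 and f1_v2, f2 has f2_v2 and f2_v3).
SYMBOL_HISTORY = {"f1": [1, 2], "f2": [2, 3]}


def exported(lib_version):
    """All (function, symbol_version) pairs a given library release provides."""
    return {
        (func, ver)
        for func, versions in SYMBOL_HISTORY.items()
        for ver in versions
        if ver <= lib_version
    }


def link(functions, build_lib_version):
    """Bind each function to its newest version in the build-time library."""
    return {
        (func, max(v for v in SYMBOL_HISTORY[func] if v <= build_lib_version))
        for func in functions
    }


def runs_on(binary_symbols, host_lib_version):
    """A binary loads only if the host library exports every symbol it bound."""
    return binary_symbols <= exported(host_lib_version)


if __name__ == "__main__":
    built_on_v3 = link({"f1", "f2"}, 3)
    print(sorted(built_on_v3), runs_on(built_on_v3, 2))
    built_on_v2 = link({"f1", "f2"}, 2)
    print(sorted(built_on_v2), runs_on(built_on_v2, 3))
```

Building against v2 yields a binary that runs on v2 and v3, while building against v3 yields one that only runs on v3, which is the asymmetry people keep hitting.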

Why are they changing? And I presume there must be disadvantages to staying on the old symbols, or else they wouldn’t be changing them—so what are those disadvantages?

> You need to link against the oldest glibc version that has all the symbols you need

Or at least the oldest one made before glibc's latest backwards incompatible ABI break.


Yeah and nothing ever lets you pick which versions to link to. You're going to get the latest ones and you better enjoy that. I found it out the hard way recently when I just wanted to do a perfectly normal thing of distributing precompiled binaries for my project. Ended up using whatever "Amazon Linux" is because it uses an old enough glibc but has a new enough gcc.

You can choose the version. There was apgcc from the (now dead) Autopackage project which did just that: https://github.com/DeaDBeeF-Player/apbuild

It's not at all straightforward, it should be the kind of thing that's just a compiler flag, as opposed to needing to restructure your build process to support it.

Yeah that's what I meant. I also came across some script with redefinitions of C standard library functions that supposedly also allows you to link against older glibc symbols. I couldn't make it work.

Any half-decent SDK should allow you to trivially target an older platform version, but apparently doing trivial-seeming things without suffering is not The Linux Way™.


> Hundreds of other widely-used open source libraries don't.

Correct me if I'm wrong but I don't think versioned symbols are a thing on Windows (i.e. they are non-portable). This is not a problem for glibc but it is very much a problem for a lot of open source libraries (which instead tend to just provide a stable C ABI if they care).


> versioned symbols are a thing on Windows

There’re quite a few mechanics they use for that. The oldest one, call a special API function on startup like InitCommonControlsEx, and other API functions will then resolve to a different DLL or behave differently. A similar tactic: require an SDK-defined magic number as a parameter to some initialization functions, with different magic numbers switching symbols from the same library; examples are WSAStartup and MFStartup.

Around Win2k they did side by side assemblies or WinSxS. Include a special XML manifest into embedded resource of your EXE, and you can request specific version of a dependent API DLL. The OS now keeps multiple versions internally.

Then there’re compatibility mechanics, both OS builtin and user controllable (right click on EXE or LNK, compatibility tab). The compatibility mode is yet another way to control versions of DLLs used by the application.

Pretty sure there’s more and I forgot something.


> There’re quite a few mechanics they use for that. The oldest one, call a special API function on startup [...]

Isn't the oldest one... to have the API/ABI version in the name of your DLL? Unlike on Linux, which by default uses a flat namespace, in Windows land imports are nearly always identified by a pair of the DLL name and the symbol name (or ordinal). You can even have multiple C runtimes (MSVCR71.DLL, MSVCR80.DLL, etc) linked together but working independently in the same executable.
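
A toy model of the difference, not how either loader is actually implemented: with a flat namespace the first loaded library that defines a symbol wins for every importer, while with (DLL name, symbol) imports each importer gets the library it named. The implementation labels are invented.

```python
# Load order matters only for the flat lookup.
# Each entry: (library name, {symbol: label of the implementation}).
LOADED = [
    ("MSVCR71.DLL", {"malloc": "malloc from CRT 7.1"}),
    ("MSVCR80.DLL", {"malloc": "malloc from CRT 8.0"}),
]


def resolve_flat(symbol):
    """Flat namespace: the first loaded library defining the symbol wins."""
    for _name, exports in LOADED:
        if symbol in exports:
            return exports[symbol]
    raise LookupError(symbol)


def resolve_by_library(library, symbol):
    """Two-part import: look only in the library the importer named."""
    return dict(LOADED)[library][symbol]


if __name__ == "__main__":
    print(resolve_flat("malloc"))
    print(resolve_by_library("MSVCR71.DLL", "malloc"))
    print(resolve_by_library("MSVCR80.DLL", "malloc"))
```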


Linux can do this as well; the issue is that it just multiplies how many versions you need to have installed, and it's not that different in the limit from having a container anyway. The symbol versioning means you can just have the latest version of the library and it remains compatible with software built against old versions. (Especially because when you have multiple versions of a library linked into the same process you can wind up with all kinds of tricky behaviour if they aren't kept strictly separated. There's a lot of footguns in Windows around this, especially with the way DLLs work to allow this kind of separation in the first place.)

I did forget to mention something important. Since about Vista, Microsoft tends to replace or supplement C WinAPI with IUnknown based object-oriented ones. Note IUnknown doesn’t necessarily imply COM; for example, Direct3D is not COM: no IDispatch, IPC, registration or type libraries.

IUnknown-based ABIs expose methods of objects without any symbols exported from DLLs. Virtual method tables are internal implementation details, not public symbols. By testing SDK-defined magic numbers like the SDKVersion argument of the D3D11CreateDevice factory function, the DLL implementing the factory function may create very different objects for programs built against different versions of the Windows SDK.


There’s also API Sets, where DLLs like api-win-blah-1.dll act as a proxy for another DLL both literally, with forwarder exports, and figuratively, with a system-wide in-memory hashmap between API set and actual DLL.

Iirc this is both for versioning, but also so some software can target windows and Xbox OS’s whilst “importing” the same api-set DLL? Caused me a lot of grief writing a PE dynamic linker once.

https://bookkity.com/article/api-sets
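
The hashmap half of that is easy to sketch. Everything below is hypothetical (the set name is the placeholder from the comment above, the host DLL names are invented), and a real loader does considerably more than one lookup.

```python
# Hypothetical per-platform maps from API set name to the DLL that hosts it.
DESKTOP_MAP = {"api-win-blah-1.dll": "desktophost.dll"}
CONSOLE_MAP = {"api-win-blah-1.dll": "consolehost.dll"}


def resolve_import(dll_name, api_set_map):
    """Return the DLL to load for an import name.

    Names found in the map are redirected to their host DLL; anything else
    is treated as an ordinary DLL and returned unchanged.
    """
    return api_set_map.get(dll_name.lower(), dll_name)


if __name__ == "__main__":
    print(resolve_import("api-win-blah-1.dll", DESKTOP_MAP))
    print(resolve_import("api-win-blah-1.dll", CONSOLE_MAP))
    print(resolve_import("kernel32.dll", DESKTOP_MAP))
```

The same import name resolving to different hosts per platform is what would let one binary target both, if the recollection above is right.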


I only learned about glibc earlier today, when I was trying to figure out why the Nix version of a game crashes on SteamOS unless you unset some environ vars.

Turns out that Nix is built against a different version of glibc than SteamOS, and for some reason, that matters. You have to make sure none of Steam's libraries are on the path before the Nix code will run. It seems impractical to expect every piece of software on your computer to be built against a specific version of a specific library, but I guess that's Linux for you.
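
In case it helps anyone else, a sketch of the workaround as a launcher script. Which variables actually matter is an assumption on my part (`LD_LIBRARY_PATH` and `LD_PRELOAD` are the usual suspects for library lookup), and the helper name is made up.

```python
import os

# Assumed culprits: variables that change where the dynamic loader looks.
SUSPECT_VARS = ("LD_LIBRARY_PATH", "LD_PRELOAD")


def clean_env(env=None, drop=SUSPECT_VARS):
    """Return a copy of the environment without the library-path overrides."""
    source = os.environ if env is None else env
    return {key: value for key, value in source.items() if key not in drop}


if __name__ == "__main__":
    # Pass the result as env= to subprocess.run([...]) when launching the game.
    removed = sorted(set(os.environ) - set(clean_env()))
    print("would drop:", removed)
```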


No, that's every bit of software out there. Dynamic linking really does cause that problem even though allegedly it has security benefits as the vendor is able to patch software vulnerabilities.

NixOS actually is a bit better in this respect since most things are statically linked. The only thing is that glibc is not because it specifically requires being dynamically linked.

This issue also applies to macOS with their Dylibs and also Windows with their DLLs. So saying that this is an issue with Linux is a bit disingenuous.

Until everybody standardizes on one singular executable format that doesn't ever change, this will forever be an issue.


Ask your friend if he would CC0 the quote or similar (not sure if it's possible). I can imagine this being a quote on T-shirts xD

Honestly I might buy a T-shirt with such a quote.

I think glibc is such a pain that it is the reason why we have such vastly different package management. I feel like non-glibc things really would simplify the package management approach on Linux. Although it feels solved, there are definitely still issues with the approach, and I think we should all still look for ways to solve the problem.


Non-glibc distros (musl, uclibc...) with package managers have been a thing for ages already.

And they basically hold under 0.01% of Linux marketshare and are completely shit.


