Each socket should have two file descriptors (cr.yp.to)
120 points by riobard on July 21, 2013 | hide | past | favorite | 48 comments


A small compendium of errors...

For starters, a | b | c doesn't create two file descriptors. It creates four, two per pipe. Then there is a problem with the central argument: for regular files, and in fact every method of creating a file descriptor except pipe(), opening in RW mode creates a single descriptor. The reason for pipe()'s special behavior is two-pronged:

1. The raison d'être of pipes is for sequential communication between producer/consumer processes. There is no other reason for their design. As such, it made sense to break the convention of a single FD.

2. At the time of their design, the 70s, memory footprint was a crucial part of any design. Thus, sharing buffers between the producer and consumer FDs was paramount, and the best way to make that happen is to create both FDs at the same time.
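
To make both points concrete, a minimal sketch: pipe() hands back the read and write ends in a single call, so a | b | c needs two calls and four descriptors (the variable names are purely illustrative):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int ab[2], bc[2];  /* one read/write pair per pipe */

        if (pipe(ab) < 0 || pipe(bc) < 0)
            return 1;
        /* "a | b | c" needs two pipe() calls, so four descriptors:
         * ab[0] is a->b's read end, ab[1] its write end; likewise bc. */
        printf("a|b: r=%d w=%d   b|c: r=%d w=%d\n",
               ab[0], ab[1], bc[0], bc[1]);
        return 0;
    }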

The other problem is that the pipe analogy for TCP is wrong. Which client/server protocol over TCP ever had one-directional data transfer? Close to 100% of all TCP usage is between a single client and a single server, communicating both ways.

The design of sockets is practical for their intended purpose. Shoehorning an artificial problem onto a design will, unsurprisingly, not yield proper results.


He said it creates two pipes, and later that each pipe creates two descriptors.

He never said there was no reason that sockets behaved differently than other descriptors, only that the reasons didn't warrant the broken abstraction.

And finally, I think you missed the place where he claims the abstraction breaks down. He's not complaining that you use "open" to get a file's descriptor and "pipe" to get a pipe's two descriptors, any more than it's weird to get a socket descriptor for "socket". He's saying that once you have the descriptor, the single file descriptor for sockets forces the OS to implement a socket-specific call simply to get a FIN sent properly.
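
For concreteness, that socket-specific call is shutdown(2). A rough sketch of the half-close idiom, assuming sock is an already-connected TCP socket:

    #include <sys/socket.h>

    /* Half-close: the FIN goes out immediately, but we can still
     * read whatever the peer sends back. close(sock) would only
     * cause a FIN once every descriptor referring to this socket,
     * in every process, had been closed. */
    void done_writing(int sock) {
        shutdown(sock, SHUT_WR);
    }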

The reality is that sockets could be represented not with integer descriptors but with character arrays or structs and the Unix interface wouldn't be any less incoherent than it is now with accepting descriptors, a select call that works for sockets and not files, setsockopt, shutdown, recv, connected Unix domain sockets, ioctls, and so on and so forth.


this is something I hate about the "unix design philosophy". when you start digging deeper, you realize that "everything is a file" is true, except for network connections, and A/V devices, and input peripherals, and really the only things that actually are files are files. everything else is allllmost a file, except in one little weird way, or many big big ways...


That's not a problem of the philosophy, it's a problem of its implementation. And that is caused by the myriad people and entities that have been responsible for the chaotic evolution of the various parts that make up modern Unix systems.

Design by effectively nobody can be at least as bad as design by committee.

As others pointed out, Plan 9 is much closer to a proper realization of the everything is a file design philosophy. Unsurprising, since it was grown entirely within Bell Labs under the supervision of the original Unix team. It's unfortunate it hasn't reached a point of being ready for mass adoption.


> As others pointed out, Plan 9 is much closer to a proper realization of the everything is a file design philosophy. Unsurprising, since it was grown entirely within Bell Labs under the supervision of the original Unix team. It's unfortunate it hasn't reached a point of being ready for mass adoption.

I don't think it's unfortunate. Having used it for a month straight, I'm not convinced at all that it's the right thing; in fact, it's exactly the opposite of the direction I think we should be going (as far away from the filesystem as possible).


Because you have a technical argument, or because of the poor user experience that a research platform provides?


My technical argument is that the filesystem is too complicated an abstraction for questionable benefit. Yes, sockets are really cool as files, but that's not much use outside of the unix shell.

I suspect that Plan 9 could be a good dev environment, but is not a great deployment platform—it just doesn't offer enough better than developed *NIXes to port code over.


Plan 9 was a great OS. And it did fix a lot of these problems (although I don't know if it fixed this particular one).

But it turns out that what we have is good enough, and it never gained traction.


I don't entirely agree with that characterization. I believe Plan 9 offers fundamental technical advantages that would interest a small but significant and sustainable population of users and developers, and that it's not the inertia of other platforms that it can't overcome, but itself. Its user experience is utter crap: a combination of poor or absent design and the annoying idiosyncrasies of certain of its developers.

It's a research platform, and no real effort has been made to turn it into something more practical. If someone were to pick it up and turn it into something people wouldn't hate using, I think we'd see a community on par with NetBSD or OpenBSD.


Agree completely; the problem is that a clever-sounding slogan like "everything is a file" is so appealing to people that they want to believe it even if it's not actually true. I've run into this before: https://news.ycombinator.com/item?id=2397039

Here's another similar example of how "everything is a file" falls down that I wrote about in 2005 when complaining about terminals (http://www.advogato.org/person/habes/diary/6.html)

    Then there's the whole mess of pseudo-terminals.
    If you are like me, you might wonder at first why
    pseudo-terminals are necessary. Why not just fork
    a shell and communicate with it over stdin/stdout?
    Well, the bad news there is that UNIX
    special-cases terminals. They're not just
    full-duplex byte streams, they also have to
    support some special system calls for things like
    setting the baud rate. And you might think that a
    change of window size would be delivered to the
    client program as an escape sequence. But no, it's
    delivered as a signal (and incidentally, sent from
    the terminal emulator as an ioctl()).

    Of course, you can't send ioctls over a network,
    so sending a resize from a telnet client to a
    telnet server is done in a totally different way
    (http://www.faqs.org/rfcs/rfc1073.html)
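
For the curious, the local resize path looks roughly like this (a sketch; master_fd is assumed to be the pty master held by the terminal emulator):

    #include <sys/ioctl.h>
    #include <termios.h>

    /* Terminal-emulator side of a resize: push the new size into the
     * pty with an ioctl; the kernel turns that into a SIGWINCH for
     * the foreground process group. No byte travels over the stream. */
    void resize_pty(int master_fd, unsigned short rows, unsigned short cols) {
        struct winsize ws = { .ws_row = rows, .ws_col = cols };
        ioctl(master_fd, TIOCSWINSZ, &ws);
    }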


Let's not confuse a bad implementation with a bad philosophy, as a lot of these comments seem to be doing.

Also, "everything is a file" is still true even when some files have additional operations possible on them. It's misleading to say otherwise.


I'm not even sure that "everything is a file" is particularly appealing as a philosophical position.


so I think that stating "everything is a file" is fine as an abstract position, but we have a lot of evidence that implementing a system where everything actually is a file is just too hard. If the philosophy is good, why do its adherents produce so much that is crap?


Well, for the PTY example above, the reason is backward compatibility. Eventually we need to decide to rid ourselves of the limitations that backward compatibility (with, e.g., bash) brings and clean up the implementation.


Plan 9 has addressed some of the leaks in this abstraction; many things are significantly more "file-like".

Still, it's possible that open/read/write/seek is just not the right abstraction in all cases. http://yarchive.net/comp/linux/everything_is_file.html is interesting reading.


I think it's interesting more in the sense of understanding what Unix is today. It doesn't really read as an argument against a Plan9-like realization of everything-is-a-file.

I'd summarize those posts in two main points:

1) Linux isn't a research project.

2) Bringing a Plan 9-like realization of everything-is-a-file into modern Unix really just creates an even uglier chimera than Unix already is.


For a simpler and weirder example, consider the TCP accepting socket. I learned socket programming in the early '90s and accepting sockets were just the way things worked; I never questioned them. A few years ago I had the occasion to teach someone socket programming, and accepting sockets were a giant W-T-F for me as I tried to explain how things worked.
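
For anyone who hasn't hit it: the confusing part is that the listening socket never carries data; accept() mints a brand-new descriptor for each connection. A rough sketch, error handling omitted:

    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void serve(unsigned short port) {
        struct sockaddr_in addr;
        int lfd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, 16);

        for (;;) {
            /* the W-T-F: lfd itself is never read or written; every
             * accept() returns a *fresh* fd that IS the connection */
            int cfd = accept(lfd, NULL, NULL);
            /* ... read/write on cfd ... */
            close(cfd);
        }
    }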


I don't think "everything is a file" was ever supposed to mean "everything works just like a disk file", or even "everything has a name in the filesystem". It really meant "every kernel-managed resource is accessed through a file descriptor". Of course, even this formulation isn't always true (network interfaces and processes are the classic exceptions), but it's true in a lot more cases. In this sense, "file descriptors" are really just the UNIX name for what Windows calls a "handle".

This is still quite a good paradigm to follow - the semantics for waiting on, duplicating, closing and sending file descriptors to other processes are generally well-defined and well understood. For example, Linux exposes resources like timers, signals and namespaces through file descriptors.
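
A timerfd, for instance, sketched here with error handling omitted (Linux-specific):

    #include <stdint.h>
    #include <sys/timerfd.h>
    #include <unistd.h>

    /* A timer that is "just a file descriptor": it can be read,
     * poll()ed, select()ed, dup()ed or handed to another process. */
    void sleep_via_fd(void) {
        int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        struct itimerspec ts = { .it_value = { .tv_sec = 2 } };
        timerfd_settime(tfd, 0, &ts, NULL);

        uint64_t expirations;
        read(tfd, &expirations, sizeof expirations);  /* blocks ~2s */
        close(tfd);
    }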


Look into Plan 9's take on the file metaphor. I'm not suggesting they got everything right, but the concept really can be made more general than plain old Unix makes it seem.


Well, you're right, not everything is a file. We can't treat a socket exactly like a file because the network is less reliable/fast/whatever than the file system. You can't treat a pipe just like a file because it needs to have someone reading it if you want to write to it. And there are special files accepting weird ioctls because they have weird capabilities that need to be exploited, etc.

But still, all these do have things in common, mainly that they can be read from and written to, and the 'file' is an abstraction that represents just that. To me this is very much an object-oriented concept: polymorphism applied to OS resources. If several resources have part of their interfaces in common, then exposing that as a higher-level abstraction can help simplify programs, which then don't have to worry about type-specific details if they don't need to.

Anyway, even if nothing really is a file, I often find it useful to think about everything as one, as long as I don't forget that it's actually not when the abstraction stops being valid. And I'd guess it also helps design and implement the system itself.


I thought 'everything is a file' mostly referred to the fact that you can read/write to every file descriptor, no matter if the file descriptor points to a socket, a pipe, etc., or an actual file.

Before that, you had completely different syscalls depending on the device you were using.


Yup. For example ioctls. You don't do that to a file.


I just don't see the problem that djb is highlighting. To me the crucial mistake comes in this sentence: "When the generate-data program finishes, the same fd is still open in the consume-data program, so the kernel has no idea that it should send a FIN." generate-data and consume-data should NEVER share an fd; the two pipes in the "same machine model" are two separate sets of fds (pipe() returns both ends in one call). Likewise, the TCP model should use two separate (sets of) sockets.

shutdown()'s real use is for poorly implemented protocols where the server has no real-time way of initiating a control message to the client apart from "abort"ing the connection; some protocols only allow the client to initiate sending a message. Also note that one end of the connection sending a FIN doesn't preclude the other end from sending more data.


I'm going to take a stab at interpreting what this article is about.

First, the missing background:

djb likes unix.

the unix philosophy is to compose small programs together to solve problems.

djb's own programs illustrate this really well. They are all small, focused tools. This allows each program to focus on its particular task or domain.

The primary method of composition in unix is the pipe in a shell. Each pipe has two descriptors. One for read and one for write.

It is very easy to create a pipe and handle pipe IO.

The article:

At some point, djb wanted to have some programs live on the network. This expands the composition beyond a single machine. If you just try to treat a socket as a standard pipe, you encounter the problem he describes.

Any program utilizing a pipe requires two file descriptors. If someone built a trivial 'netpipe', they could just dup() or dup2() the socket file descriptor to make it look like a normal pipe. The problem is that the socket then won't close until both fds are closed, so the remote end won't detect EOF. This means the 'netpipe' program has to be very clever in order to detect EOF and do a proper close on both, so the remote can see the last bytes and then EOF.
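
A sketch of that failure mode (the 'netpipe' helper is hypothetical):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical naive 'netpipe': make one connected socket look
     * like a pipe's two descriptors by dup()ing it. */
    void fake_pipe(int sock, int *rfd, int *wfd) {
        *rfd = sock;
        *wfd = dup(sock);
    }

    void producer_finished(int wfd, int sock) {
        close(wfd);
        /* No FIN was just sent: the kernel sends FIN only when the
         * LAST descriptor for the socket goes away, and rfd still
         * refers to it. The remote end never sees EOF unless we use
         * the socket-specific escape hatch: */
        shutdown(sock, SHUT_WR);
    }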


The original doesn't appear to be dated, but archive.org[1] has the same thing going back to at least 2003.

[1] http://web.archive.org/web/20030805143958/http://cr.yp.to/tc...

Edit: HTTP Last-Modified header looks like it might be right:

    Last-Modified: Tue, 10 Jun 2003 23:44:11 GMT


I think this page may predate archive.org, and that it may have lived at a different location on his site. I remember it as part of the ucspi documentation, which (I think) just barely predates tinydns.


Hey, I got a better idea. Let's give each socket a "human-friendly" name. Translating back and forth to the underlying numerical representation should be easy enough.[1]

Heck, we could even create a centralized system for managing our new namespace. OMG, maybe we could charge people money for the names? Yes! We're rich!

And the result: Hundreds of millions of "parked" domains serving up cheap advertising. Simply brilliant!

1. See Hobbit's comments in netcat source code for a differing opinion.

I often wish that people like djb or Hobbit (=low tolerance for nonsense) had designed the systems that we are now stuck with.

Though they are only applications, netcat and ucpsi have aged well and remain a pleasure to use.


Nobody is forcing you to use DNS.


Ha. That is debatable. Define "force".

At the very least, I'd say there is strong coercion. If not in favor of using names, then certainly in favor of deprecating the use of IP and port numbers (e.g., for email). Lemme guess, now you'll say "No one is forcing you to use email."


   s/ucpsi/djb'"'"'s & applications/


s/ucpsi/ucspi/g


Hmmm... use two TCP sockets?


You've missed the point. It's not about "I can't do things with TCP". It's that BSD's networking implementation, which modern Unix networking is based on, broke the "everything is a file" philosophy of Unix. Instead of using the existing file descriptor interface, they created a new socket interface, which is logically 2 file descriptors. However, it's not actually 2 file descriptors, so you now need a bunch of device-specific code.


As others have said, it wasn't a valid point to begin with. A file open for reading and writing is "logically 2 file descriptors" too, yet it's a single fd, just like a socket.


not to argue, but you can open an fd for reads and writes on unix on a normal file. A socket is not logically 2 file descriptors.


Absolutely true. Too late to fix now though.


That seems to be the general theme with a lot of network and systems programming. :(


I don't quite understand the appeal of everything is a file philosophy. Neither everything is an object for that matter. It makes the world look like this:

http://www.youtube.com/watch?v=HPeattKV74A


Now let's talk about how half the higher layer protocols in the world that use TCP should probably be using a reliable datagram protocol...


It doesn't matter: with the way IP works, any reliable datagram protocol would contain an implementation of 90% of TCP.

The more fragmented an IP packet gets along the way, the less likely it is to reach its destination, so you have to take path MTU size into account and split your datagrams accordingly. You also want to send as many datagrams as you have available in as few IP packets as possible, and you want to do slow start for the same reasons TCP does it.

Result: your datagrams need to become a stream of bytes to be handled efficiently by any transport protocol sitting on top of IP.


Someone please correct me if I am wrong, but is this something that Plan9 was solving? Everything (including sockets) being treated as a file?


(As I understand it) Plan9's file-system abstracted over network connections, yeah. You'd bind files & folders to your view of the filesystem, and that binding could cross network boundaries.

EDIT: theoh says it better https://news.ycombinator.com/item?id=6080324


tl;dr - this is about 'leaky abstractions'

http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...

Joel wrote about this in 2002.

This article probably predates that, but it was last modified circa 2003 (see the other comment on that).


netcat solves this.


netcat doesn't "solve" this unix design problem any better than the author's own programs, tcpserver and tcpclient [0], solve it.

djb's argument here is that TCP sockets are more like pipes, with separate read and write buffers, and separate read-side and write-side close operations. This makes sense, but what about UDP sockets? What about operations that apply to the socket as a whole, like bind(2), listen(2) or ioctl(2)?

[0] http://cr.yp.to/ucspi-tcp.html


Plan 9 doesn't separate the read and write fds, but it does provide a separate ctl file to replace ioctl. This may be a cleaner approach: control signals are no longer "in band" operations on the data fd (or fds). Logically it seems better to have a single fd representing a full-duplex connection, and to expect the programmer to keep track of the connection state (or possibly receive an error if they don't).
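
Roughly, a hangup through the ctl file might look like this (a sketch assuming Plan 9's /net/tcp conventions, written in C for consistency; the path is illustrative):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Out-of-band control, Plan 9 style: commands are plain text
     * written to a separate ctl file, never mixed into the data fd. */
    void tcp_hangup(const char *ctl_path) {  /* e.g. "/net/tcp/4/ctl" */
        int ctl = open(ctl_path, O_WRONLY);
        write(ctl, "hangup", strlen("hangup"));
        close(ctl);
    }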


This makes me long for a djb vs. Linus flamewar. Well, no, not really, because that would be counter-productive, but man, it'd sure be epic.


Getting completely off topic now, but I would actually pay money to spectate that. We just need to find something they disagree about that they also both care enough about to rant at each other over.



