For what it's worth, I believe Cosmopolitan Libc's --ftrace overhead averages out to 280ns per function call. That's the number I arrived at by building in MODE=opt, adding a counter to ftracer(), running Python hello world with the trace piped to /dev/null, and then dividing the time the process took to run by the number of times ftracer() was called. Part of what makes it fast is that it doesn't have to issue any system calls (aside from write() in the case where it needs to print). As for the overhead when ftracing isn't enabled, I believe it's effectively zero. The NOP instruction in the function prologue is nearly free. I recall reading reports that put the timing for these fat NOPs at around 200 picoseconds.
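To make the arithmetic concrete, here's a rough sketch of that kind of measurement. It is not the actual ftracer() code: `g_hook_calls`, `count_hook_call()`, `now_ns()`, and `avg_ns_per_call()` are hypothetical names I'm using for illustration, and the hook here only counts calls instead of printing anything.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical counter, standing in for one added to the tracing hook.
static uint64_t g_hook_calls;

// Hypothetical stand-in for the tracing hook: it only counts invocations.
static void count_hook_call(void) {
  ++g_hook_calls;
}

// Monotonic clock reading in nanoseconds.
static uint64_t now_ns(void) {
  struct timespec ts;
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

// Average cost per hook call: elapsed time divided by the call count.
// Returns 0 when the hook never ran, to avoid dividing by zero.
static double avg_ns_per_call(uint64_t elapsed_ns, uint64_t calls) {
  return calls ? (double)elapsed_ns / (double)calls : 0.0;
}
```

You'd read now_ns() at startup and at exit, then pass the difference and g_hook_calls to avg_ns_per_call(). Keep in mind this attributes the whole run time to the hook, including the program's own work, so the result reads as an upper bound on the per-call tracing cost.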
Most of the overhead comes from the fact that it's using kprintf() to print the tracing info, since I'm happy to spend a few extra nanoseconds to have more elegant code. So it could totally be improved further. Another thing is that right now it's only line buffered, which means one write() per trace line. If it buffered across lines, it'd go faster.
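Here's a minimal sketch of what buffering across lines could look like. This isn't how Cosmopolitan does it today, and `struct trace_buf`, `trace_append()`, `trace_flush()`, and the 4096-byte size are all assumptions made up for the example. The idea is just to collect several trace lines in memory and hand them to write() in one go.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define TRACE_BUF_SIZE 4096  // assumed size, picked for illustration

struct trace_buf {
  int fd;                  // where the trace goes
  size_t len;              // bytes currently buffered
  unsigned long flushes;   // how many times buffered data was written out
  char data[TRACE_BUF_SIZE];
};

// Writes n bytes to fd, retrying on short writes. Returns 0 or -1.
// (A real version would probably also want to retry on EINTR.)
static int trace_write_all(int fd, const char *p, size_t n) {
  while (n > 0) {
    ssize_t rc = write(fd, p, n);
    if (rc <= 0) return -1;
    p += rc;
    n -= (size_t)rc;
  }
  return 0;
}

// Writes out whatever has accumulated and empties the buffer.
static int trace_flush(struct trace_buf *b) {
  if (b->len == 0) return 0;
  if (trace_write_all(b->fd, b->data, b->len)) return -1;
  b->len = 0;
  ++b->flushes;
  return 0;
}

// Appends one trace line, flushing first if it would not fit.
static int trace_append(struct trace_buf *b, const char *line, size_t n) {
  if (n > sizeof(b->data) - b->len && trace_flush(b)) return -1;
  if (n > sizeof(b->data)) {
    // Line is larger than the whole buffer: write it straight through.
    return trace_write_all(b->fd, line, n);
  }
  memcpy(b->data + b->len, line, n);
  b->len += n;
  return 0;
}
```

The tradeoff is that you need a final trace_flush() at exit, and if the process crashes the buffered tail of the trace is lost, which is exactly when you'd want it most. Line buffering costs more system calls but doesn't have that problem.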