Print statements are far more expensive than most junior developers realize. They’re usually very surprised when you profile their code and show them how slow they are.
Look at this program snippet. It runs in about 3 seconds, but what amount of time do you think each part takes?
double x = 0;
for (size_t i = 0; i < ONE_MILLION; i++) {
// (1): several thousand arithmetic, bitwise, and logical operations
for (size_t j = 0; j < 1500; j++) {
x += (i * j) + (i + j) + (i << 3) + (j >> 2) +
(i > j) + (i < j) + (i & j) + (i | j);
}
// (2): a print statement
std::cout << "A relatively short sentence." << std::endl;
}
On your typical machine printing to a terminal, (2) will take about ~1-2x the time of (1).1 And Windows is ~15x slower than that!
In other words, printing a single line takes as long as tens of thousands to millions of arithmetic operations!
This happens in pretty much every language#
You’ll see the same thing in Python. Try running
$ python3 -m timeit "print('A relatively short sentence.')"
...
100000 loops, best of 5: 2.03 usec per loop
Two microseconds! That’s roughly 10,000 cycles on a modern processor.
You can construct a list of about 500 elements and sum it in a similar amount of time.
$ python3 -m timeit "sum([1] * 500)"
100000 loops, best of 5: 2.41 usec per loop
What is print()
doing?#
Think about what print()
actually has to do here. In short,
- Copy the string into to a user-space buffer2 and then flush it (because of the line terminator)
- Perform some standard library processing and run the associated
write()
syscall - Switch over to kernel mode and copy the data to a kernel buffer
- Pass the write request along to the terminal through the virtual file system
- Wait until the terminal driver has transferred the data to its own buffers (though not necessarily render it)
- Return back to user space and finalize some standard library bookkeeping before continuing
That’s an enormous amount of work!
Keep this in mind the next time someone dumps a huge amount of data to stdout or writes a simple “print statement progress bar”3 and then tries to claim that the task truly needs to run that slowly.
The bottleneck very well may be the print statements themselves.
See also and references#
If you’re interested, feel free to glance through
- A couple of relevant StackOverflow threads: “Why is printing to stdout so slow?” and “Why is std::cout so time consuming”
- Basic information on system calls from Wikipedia
- Details of the
write()
syscall from the Linux man pages and Wikipedia - Wikipedia’s overview of kernel mode and the segmentation of user space vs. kernel space
- Information about the Virtual File System (VFS) from the Linux kernel docs. Invoking a filesystem at all might seem a bit weird, but remember, “Everything is a file” on Linux systems.
I’m cheating a bit here by using more of the CPU’s execution units than you’d see in a typical loop and glossing over a few small compiler optimizations. Replacing the loop body with
x += i * j;
does not change the runtime. ↩︎This copy could probably be optimized out in some cases but not universally. ↩︎
E.g., something like this in a critical loop.
std::cout << "\rRunning trial " << trial << " and working on operation " << i << "...";
or
print(f"\rRunning trial {trial} and working on operation {i}...", end='')
These ones will (probably) avoid the excessive flushing, but the impact can still be quite significant. ↩︎