Skip to main content

Print() and std::cout are extremely expensive

· 3 mins
Programming Performance Advice
Ryan Gibson
Author
Ryan Gibson
Quantitative Analyst | Computer Scientist
Table of Contents

Print statements are far more expensive than most junior developers realize. They’re usually very surprised when you profile their code and show them how slow they are.

Look at this program snippet. It runs in about 3 seconds, but what amount of time do you think each part takes?

double x = 0;
for (size_t i = 0; i < ONE_MILLION; i++) {
    // (1): several thousand arithmetic, bitwise, and logical operations
    for (size_t j = 0; j < 1500; j++) {
        x += (i * j) + (i + j) + (i << 3) + (j >> 2) +
             (i > j) + (i < j) + (i & j) + (i | j);
    }
    // (2): a print statement
    std::cout << "A relatively short sentence." << std::endl;
}

On your typical machine printing to a terminal, (2) will take about ~1-2x the time of (1).1 And Windows is ~15x slower than that!

In other words, printing a single line takes as long as tens of thousands to millions of arithmetic operations!

This happens in pretty much every language
#

You’ll see the same thing in Python. Try running

$ python3 -m timeit "print('A relatively short sentence.')"
...
100000 loops, best of 5: 2.03 usec per loop

Two microseconds! That’s roughly 10,000 cycles on a modern processor.

You can construct a list of about 500 elements and sum it in a similar amount of time.

$ python3 -m timeit "sum([1] * 500)"
100000 loops, best of 5: 2.41 usec per loop

What is print() doing?
#

Think about what print() actually has to do here. In short,

  1. Copy the string into to a user-space buffer2 and then flush it (because of the line terminator)
  2. Perform some standard library processing and run the associated write() syscall
  3. Switch over to kernel mode and copy the data to a kernel buffer
  4. Pass the write request along to the terminal through the virtual file system
  5. Wait until the terminal driver has transferred the data to its own buffers (though not necessarily render it)
  6. Return back to user space and finalize some standard library bookkeeping before continuing

That’s an enormous amount of work!

Keep this in mind the next time someone dumps a huge amount of data to stdout or writes a simple “print statement progress bar”3 and then tries to claim that the task truly needs to run that slowly.

The bottleneck very well may be the print statements themselves.

See also and references
#

If you’re interested, feel free to glance through


  1. I’m cheating a bit here by using more of the CPU’s execution units than you’d see in a typical loop and glossing over a few small compiler optimizations. Replacing the loop body with x += i * j; does not change the runtime. ↩︎

  2. This copy could probably be optimized out in some cases but not universally. ↩︎

  3. E.g., something like this in a critical loop.

    std::cout << "\rRunning trial " << trial << " and working on operation " << i << "...";
    

    or

    print(f"\rRunning trial {trial} and working on operation {i}...", end='')
    

    These ones will (probably) avoid the excessive flushing, but the impact can still be quite significant. ↩︎

Related

Why the Hell Is This Git Repo So Large? And How to Trim It Down
· 6 mins
Programming Git Version-Control Repository-Management Advice
A guide on reducing the size of an oversized Git repository, using the Blowfish Hugo theme as a case study. Learn best practices for maintaining a lean repo and its history.
Tips on Creating the Smallest Possible QR Codes
· 3 mins
Qr-Codes Programming Advice
Some brief tips and discussion on how to make tiny QR codes with a focus on web URLs.
Steganography: Hiding Data Inside Data
·· 7 mins
Steganography Programming Github Cybersecurity
A general introduction to hiding information in plain sight, its uses in the real-world, and how it works in digital mediums.