What is steganography?
In short, steganography is the art of concealing information within another, non-secret message, much like the use of invisible ink on a seemingly innocuous letter.[1] The idea is that you could pass the message through many untrusted carriers, such as the internet, without arousing suspicion from most observers.
In today’s digital age, you may be surprised by how much data can be crammed into a file without changing it much at all. For example, below are three tiny 220x220 photos of a flower that look essentially identical, but:
- One contains the entire, uncompressed text of the United States Constitution.
- Another contains the entire (compressed) text of Shakespeare’s Macbeth, which runs a little under 20,000 words.
- The third is the unmodified original.
We’ll explore how this is possible in “How does (digital) steganography work?” below, but first, let’s look at some real-world examples.
Real-world examples of steganography
We’ll start with some physical, non-digital instances.
The EURion constellation
One of the simplest methods to hide data is to overlay a pattern in the hopes that it can be recovered later.
For example, many banknotes worldwide contain a precise arrangement of small circles designed to help printers and imaging software detect them and combat counterfeiting. The pattern has never been officially publicized, but it is informally called the “EURion constellation” and has been integrated into the currencies of roughly 60 countries.
If you happen to have a scanner and some cash on hand, you can try copying one of these banknotes. Depending on the make and model of the scanner, it might refuse to copy the bill or deliberately corrupt the print by adding stripes across it! The one that I own tends to forcibly stop the print halfway through.
Printer “Machine Identification Codes”
In another covert application of steganography, many color printers overlay a tracking watermark made of tiny yellow dots that are nearly invisible to the naked eye. The dots encode the printer’s serial number along with date and time information, and they are repeated across every printed page.
This is also rumored to be one of the reasons why some printers refuse to print black-and-white documents when they are running low on color ink.
The existence of this technology remained unknown to the public for around two decades, as it was developed under secret agreements with various national governments to enhance their forensic tracing capabilities. It has been used to track down counterfeiters and whistleblowers around the world.
Steganography in video games
Game developers also use steganography to identify the author of screenshots or gameplay videos, especially when they include cheating, abuse, or unauthorized use of private servers.
In the 2000s, Blizzard added very faint watermarks to World of Warcraft screenshots, consisting of repeating patterns of dots across the entire screen. These patterns, developed by Digimarc, encoded various details of the user’s account and the server that they were logged into. Like the other examples above, this screenshot tagging remained entirely secret for the first few years of its existence.
Similarly, Microsoft encoded hardware information in the user interface of the Xbox 360’s early builds. Each console’s animations were unique, which allowed the company to crack down on potential leakers. At the time, the employees were under NDAs and would be subject to civil penalties for disclosing nonpublic information about the console’s development.
How does (digital) steganography work?
In the realm of digital steganography, there are many different techniques, but one of the simplest is “Least Significant Bit” (LSB) steganography.
Basically, the method takes advantage of the fact that most data formats encode information in binary numbers, and the least significant bits of these have the smallest impact on the overall value. By replacing these unimportant bits with a secondary message, we can hide data without making any apparent changes to the file’s original appearance or meaning.
For example, a common image encoding stores how much red, green, and blue (RGB) is in each pixel, using one byte for each color. These values range from 0 to 255, and we can usually change them slightly without most people noticing. Human senses are just far too imprecise to tell the difference, especially when you’re not looking for it!
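To make this concrete, here is a minimal Python sketch of LSB embedding over a flat sequence of 8-bit channel values (for example, an image’s R, G, B bytes in row-major order). The names `embed_lsb` and `extract_lsb` are hypothetical and are not the API of stego-lsb or any other library. The sketch also assumes the receiver already knows the message length; a real tool would need to store that somewhere, such as in a small header.

```python
def embed_lsb(carrier: bytes, message: bytes, n_bits: int = 1) -> bytearray:
    """Hide `message` in the lowest `n_bits` bits of each carrier byte (hypothetical helper)."""
    if not 1 <= n_bits <= 8:
        raise ValueError("n_bits must be between 1 and 8")
    # Flatten the message into bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    # Pad with zeros so the bits split evenly into n_bits-sized chunks.
    bits += [0] * (-len(bits) % n_bits)
    n_chunks = len(bits) // n_bits
    if n_chunks > len(carrier):
        raise ValueError("message is too large for this carrier")
    out = bytearray(carrier)
    mask = (1 << n_bits) - 1
    for i in range(n_chunks):
        chunk = 0
        for bit in bits[i * n_bits:(i + 1) * n_bits]:
            chunk = (chunk << 1) | bit
        # Clear the low n_bits of this carrier byte, then write the chunk there.
        out[i] = (out[i] & ~mask) | chunk
    return out


def extract_lsb(stego: bytes, length: int, n_bits: int = 1) -> bytes:
    """Recover `length` bytes hidden by embed_lsb with the same n_bits (hypothetical helper)."""
    total_bits = length * 8
    bits = []
    for byte in stego:
        if len(bits) >= total_bits:
            break
        # Read the low n_bits of each byte, most significant of them first.
        bits.extend((byte >> i) & 1 for i in range(n_bits - 1, -1, -1))
    if len(bits) < total_bits:
        raise ValueError("carrier is too short for the requested length")
    message = bytearray()
    for start in range(0, total_bits, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        message.append(value)
    return bytes(message)


# Stand-in for a few pixels' worth of R, G, B channel values.
pixels = bytes([200, 13, 77, 254, 31, 128, 64, 9] * 4)
stego = embed_lsb(pixels, b"hi", n_bits=2)
print(extract_lsb(stego, 2, n_bits=2))  # b'hi'
# Each channel value moves by at most 2**n_bits - 1 = 3 out of 255.
print(max(abs(a - b) for a, b in zip(pixels, stego)))
```

Raising `n_bits` increases capacity linearly, but it also widens the worst-case change to each value. That is the same trade-off discussed further below.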
However, this hidden data can be easily exposed in a “visual attack” where we inspect the LSBs of the image. For instance, if we perform this attack on the three flowers shown at the start of this post, the differences become obvious.
The original image is on the right, and you can faintly see the flower’s outline in its LSBs. In contrast,
- The one on the left appears completely random[2] since it contains the (compressed) text of Macbeth.
- The one in the middle contains the uncompressed text of the U.S. Constitution, and you can visually confirm that the data only takes up the first ~3/4 of the image.
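The visual attack itself takes only a few lines to sketch. You keep one bit of each value in a single color channel and stretch it to pure black or white, so that differences of 1/255 become impossible to miss. The `bit_plane` helper below is a hypothetical, pure-Python illustration. Its `bit` parameter picks which plane to inspect (0 is the least significant), which matters if more than one low bit was used.

```python
def bit_plane(values, width, bit=0):
    """Stretch one bit of each 8-bit value to pure black (0) or white (255).

    `values` is a flat, row-major sequence for a single color channel.
    """
    if width <= 0 or len(values) % width:
        raise ValueError("values must fill a whole number of rows")
    amplified = [255 if (v >> bit) & 1 else 0 for v in values]
    # Split the flat list back into rows so it can be viewed as an image.
    return [amplified[i:i + width] for i in range(0, len(amplified), width)]


red = [200, 201, 13, 12, 77, 78, 254, 255]  # a tiny 4x2 red channel
for row in bit_plane(red, width=4):
    print(row)
# [0, 255, 255, 0]
# [255, 0, 0, 255]
```

To actually look at the result, you would flatten the rows and save them as an 8-bit grayscale image. In Pillow, for instance, `Image.frombytes("L", (width, height), data)` accepts such raw bytes. As described above, a natural photo tends to show faint structure in its low bits, while a region filled with hidden, compressed data tends to look like static.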
In general, steganographic techniques and their adversarial “steganalysis” counterparts are constantly evolving. More advanced algorithms than this one minimize changes to the original image’s statistics and can only be detected with much more sophisticated methods.
On the other hand, this simple technique lets us store a considerable amount of data! This is a direct consequence of binary encoding: the last bit in each byte can change a color value by only 1/255 (~0.4%), yet it makes up 1/8 (12.5%) of the data itself.
Indeed, in the flower images above we’ve replaced a whole 25% of the actual image data but altered only around 1% of the color information. There is a significant trade-off between the amount of hidden data and the impact on the visual quality of the image.[3]
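These percentages are easy to check. The sketch below assumes the flower images are stored as plain 8-bit RGB (three channels, no alpha) and that two bits per channel were used, since replacing 25% of the image data implies exactly that. The helper names are made up for illustration.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3, n_bits: int = 1) -> int:
    """Bytes that fit when n_bits of every channel value carry hidden data."""
    return width * height * channels * n_bits // 8


def data_replaced_percent(n_bits: int) -> float:
    """Share of each 8-bit channel value that is overwritten."""
    return 100 * n_bits / 8


def max_change_percent(n_bits: int) -> float:
    """Worst-case change to a single 0-255 channel value, as a percentage."""
    return 100 * ((1 << n_bits) - 1) / 255


for n in (1, 2):
    print(
        f"{n} bit(s): {lsb_capacity_bytes(220, 220, 3, n):,} bytes hidden, "
        f"{data_replaced_percent(n):.1f}% of the data replaced, "
        f"at most {max_change_percent(n):.2f}% change per value"
    )
# 1 bit(s): 18,150 bytes hidden, 12.5% of the data replaced, at most 0.39% change per value
# 2 bit(s): 36,300 bytes hidden, 25.0% of the data replaced, at most 1.18% change per value
```

Under these assumptions, each tiny image can carry about 36 KB. The 1.18% figure is a worst case, and typical changes are smaller because many of the original low bits already happen to match the hidden data.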
More creative steganography techniques
While we’ve covered the basic ideas, there are many more interesting methods, so we’ll briefly mention some of them here.
- Text steganography: Messages can be hidden within the formatting, whitespace, or invisible characters of a text itself. Some more intriguing techniques use specific sentence structures or grammatical constructs to impart information. Think of the stereotypical scenario in which you suspect something is amiss when a friend texts you in a particularly unusual writing style.
- Spread spectrum: These techniques spread hidden data over a wide range of frequencies, but at a lower amplitude, effectively concealing the covert message beneath the natural noise of the transmission medium. Similar variants are also applicable to images and videos.
- Audio steganography: In addition to the usual binary techniques, fine manipulation of echoes, harmonics, or the underlying frequency bands can be used to store information.
- Networking steganography: Many protocols can be manipulated to convey information through deliberate use of (perhaps nonstandard) features, slight manipulation of the timing delays between packets, or intentional corruptions that would appear to be typical transmission errors.
- EOF steganography: Data can be hidden outside the intended scope of a file by manipulating its end-of-file markers or headers. While not strictly steganography in the traditional sense, it has been repeatedly used in malware and hacking operations, so it is worth mentioning.[4]
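As one concrete example of the “invisible characters” idea from the text steganography bullet, here is a minimal sketch. It encodes a secret’s bits as two zero-width Unicode characters: U+200B ZERO WIDTH SPACE for 0 and U+200C ZERO WIDTH NON-JOINER for 1. The choice of characters, where they are inserted, and the function names are all illustrative assumptions. Many renderers display nothing for these code points, but some editors and sanitizers may reveal or strip them.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for a 0 bit
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for a 1 bit


def hide_in_text(cover: str, secret: str) -> str:
    """Insert the secret's bits as invisible characters after the cover's first character."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return cover[:1] + payload + cover[1:]


def reveal_from_text(text: str) -> str:
    """Collect the zero-width characters and decode them back into the secret."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")


stego = hide_in_text("See you at noon.", "meet at the docks")
print(stego)                    # usually renders just like the cover text
print(len(stego))               # 152, although only 16 characters are visible
print(reveal_from_text(stego))  # meet at the docks
```

This channel is fragile: anything that normalizes or strips zero-width characters destroys the message. The sketch also assumes the cover text does not already contain either character.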
If these topics sound interesting to you, I highly recommend searching the internet and exploring any new techniques that come to mind!
See also and references
- My Python package, stego-lsb, which I used to generate the steganographed images in this post. It also supports sound files and arbitrary sequences of binary data.
- A forum post containing details of the steganographic methods used in World of Warcraft.
- A Hacker News thread discussing the tracking methods used in Xbox 360 NDA beta builds.
- A Computerphile video on steganographic techniques in images, which includes a discussion of a method for JPEG images that is robust to simple visual attacks.
More generally, consider the following Wikipedia articles.
- Steganography
- EURion constellation
- Machine Identification Code
- Coded anti-piracy, which is a pattern of dots used by the film industry since the 1980s to trace the origins of pirated copies.
- Deniable encryption techniques, where it is generally impossible to prove that any information is encrypted at all.
[1] In fact, the word “steganography” comes from the Greek “steganographia”, which literally means something akin to “hidden writing”.
[2] Obviously, it’s not actually random, since this is just compressed English text. In practice, the hidden data should probably be encrypted in a way that increases its apparent randomness. Otherwise, steganography simply becomes an exercise in security through obscurity.
[3] If the transmission channel is noisy, some error correction would also need to be included, which necessarily decreases the amount of data available for use. Sources of noise include electrical interference on the wire, image compression, audio being played through physical loudspeakers rather than a perfect digital medium, etc.
[4] This is sometimes referred to as stegomalware.