
WAV audio files being used to hide malware


Jim Feeley


Interesting... 

 

WAV audio files are now being used to hide malicious code

Steganography malware trend moving from PNG and JPG to WAV files.

 

[a couple excerpts]

The technique is known as steganography -- the art of hiding information in plain sight, in another data medium.

The first of these two new malware campaigns abusing WAV files was reported back in June. Symantec security researchers said they spotted a Russian cyber-espionage group known as Waterbug (or Turla) using WAV files to hide and transfer malicious code from their server to already-infected victims.

The second malware campaign was spotted this month by BlackBerry Cylance. In a report published today and shared with ZDNet last week, Cylance said it saw something similar to what Symantec saw a few months before.

But while the Symantec report described a nation-state cyber-espionage operation, Cylance said they saw the WAV steganography technique being abused in a run-of-the-mill crypto-mining malware operation.

Cylance said this particular threat actor was hiding DLLs inside WAV audio files. Malware already present on the infected host would download and read the WAV file, extract the DLL bit by bit, and then run it, installing a cryptocurrency miner application named XMRig.

 

The whole (not long) story:

https://www.zdnet.com/article/wav-audio-files-are-now-being-used-to-hide-malicious-code/

 

 

Link to comment
Share on other sites

Is it steganography or just the reporter's shortcut to an idea? The .wav format allows all sorts of nonstandard chunks in the header, for things like broadcast commercial codes. It would be a lot easier to hide instructions to a compromised machine there, than trying to bury it in the audio.
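To make the "nonstandard chunks" point concrete, here is a minimal sketch that walks the top-level chunks of a WAV file. The helper names (`list_riff_chunks`, `make_demo_wav`) are made up for illustration, and the demo file is generated in memory with Python's standard `wave` module rather than taken from any real sample.

```python
import io
import struct
import wave


def list_riff_chunks(data: bytes):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the overall size field, and "WAVE"
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii", "replace"), size))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


def make_demo_wav() -> bytes:
    """Build a tiny mono 16-bit WAV in memory (100 samples of silence)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(48000)
        w.writeframes(b"\x00\x00" * 100)
    return buf.getvalue()


if __name__ == "__main__":
    for chunk_id, size in list_riff_chunks(make_demo_wav()):
        print(repr(chunk_id), size)
```

Any chunk ID a player does not recognise is normally just skipped, which is why extra chunks are an easy place to carry non-audio data.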

 

(FWIW, I've done a lot of work with People Meter radio listenership tracking, which is actual steganography: station ID code and time stamp are hidden under program audio, and picked up acoustically from any nearby radio by a tiny gadget the sample listeners wear. It's a very dicey system, because it depends on masking energy that's in some pop music formats but not in most jazz, classical, or talk.)

 

 

Link to comment
Share on other sites

One of the key links in the article goes to this:

 

Malicious Payloads - Hiding Beneath the WAV

 

BlackBerry Cylance Threat Researchers recently discovered obfuscated malicious code embedded within WAV audio files. Each WAV file was coupled with a loader component for decoding and executing malicious content secretly woven throughout the file’s audio data. When played, some of the WAV files produced music that had no discernible quality issues or glitches. Others simply generated static (white noise).

Our analysis reveals some of the WAV files contain code associated with the XMRig Monero CPU miner. Others included Metasploit code used to establish a reverse shell. Both payloads were discovered in the same environment, suggesting a two-pronged campaign to deploy malware for financial gain and establish remote access within the victim network.

The WAV file loaders can be grouped into the following three categories, which we will discuss in detail:

  1. Loaders that employ Least Significant Bit (LSB) steganography to decode and execute a PE file.
  2. Loaders that employ a rand()-based decoding algorithm to decode and execute a PE file.
  3. Loaders that employ rand()-based decoding algorithm to decode and execute shellcode.
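For readers who have not met LSB steganography, a minimal sketch of the general idea follows. It is not the loader Cylance analyzed: the one-bit-per-sample layout, the bit order, and the function names are all assumptions for illustration, and the "payload" is a harmless text string.

```python
import math


def embed_lsb(samples, payload: bytes):
    """Return a copy of `samples` with the payload bits written into each sample's lowest bit."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for this carrier")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the lowest bit, then set it to the payload bit
    return out


def extract_lsb(samples, n_bytes: int) -> bytes:
    """Rebuild n_bytes from the lowest bit of the first 8 * n_bytes samples."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for s in samples[b * 8:(b + 1) * 8]:
            value = (value << 1) | (s & 1)
        out.append(value)
    return bytes(out)


# Carrier: a 440 Hz tone as 16-bit-range integer samples at 48 kHz.
carrier = [round(10000 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(1000)]
stego = embed_lsb(carrier, b"hello")
recovered = extract_lsb(stego, 5)
largest_change = max(abs(a - b) for a, b in zip(carrier, stego))
```

No sample moves by more than one quantization step, which is why the audio can play back with no audible change.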

Rest of the report:

https://threatvector.cylance.com/en_us/home/malicious-payloads-hiding-beneath-the-wav.html

 

Link to comment
Share on other sites

 

@Jay Rose It seems to be steganography.

 

In this case it's not a WAV file exploiting vulnerabilities to run malicious code, but a mechanism to distribute new code to already compromised hosts. Why a WAV file? Because firewalls and other malware detection systems won't intercept them.

 

So, no need to get paranoid about WAV files. The risk would be the same if it was kitten pictures.

 

Anyway, the potential risks in all this are mostly a Windows thing. Here is the latest twist I have seen: I have been receiving email messages with malicious attachments in .tar.gz format (a Unix file format). It turns out that modern Windows systems can open them. But if I upload the malicious .tar.gz file to Google's Virustotal.com (where a farm of antivirus engines checks it), most of the antivirus programs complain of an unsupported file format!

 

Windows has always had a critical problem with consistency when dealing with the meaning of "opening" a file, which is not such a simple thing because a file can be data or executable code. And they have traditionally made a huge mess with it. Apple haven't been free of foolish decisions but they have done vastly better. 

 

Link to comment
Share on other sites

Jim and Borjam,

 

Yes, thanks for Jim's second quote. It does appear to be steganography in the audio stream itself. Which means it would also live in AIFF translations.

 

Next question: since it's using tiny changes in the audio, would the payload disappear if the stream was then run through common psychoacoustic data compressors?
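A rough way to see why that question matters, as a sketch only: the `requantize` function below is a crude stand-in that simply discards the lowest bits of each sample. It is not a psychoacoustic codec (a real one alters far more than that), and all names here are invented for the example.

```python
import math


def write_lsbs(samples, bits):
    """Overwrite the lowest bit of the first len(bits) samples."""
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out


def read_lsbs(samples, count):
    """Read back the lowest bit of the first `count` samples."""
    return [s & 1 for s in samples[:count]]


def requantize(samples, keep_bits=12, total_bits=16):
    """Crude stand-in for lossy processing: zero the lowest bits of each sample."""
    shift = total_bits - keep_bits
    return [(s >> shift) << shift for s in samples]


carrier = [round(8000 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(64)]
hidden = [1, 0, 1, 1, 0, 0, 1, 0] * 4  # 32 arbitrary demo bits

stego = write_lsbs(carrier, hidden)
survived_lossless = read_lsbs(stego, len(hidden)) == hidden
after_lossy = read_lsbs(requantize(stego), len(hidden))
```

The hidden bits come back intact from the untouched samples, but after the lowest bits are discarded every recovered bit is zero.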

Link to comment
Share on other sites

44 minutes ago, Jay Rose said:

Jim and Borjam,

 

Yes, thanks for Jim's second quote. It does appear to be steganography in the audio stream itself. Which means it would also live in AIFF translations.

 

Next question: since it's using tiny changes in the audio, would the payload disappear if the stream was then run through common psychoacoustic data compressors?

 

That’s a difficult question; it would depend a lot on the details. There are data encoding methods that can withstand lots of abuse, but mostly “analog” abuse (i.e., noise, multipath interference, etc.). That comes at the cost of bandwidth, of course: Information Theory is, after all, one of the hard limits in nature, like Thermodynamics.


That said, if I were the designer of that thing I wouldn’t try to make it error tolerant. After all, the rest of the protocol stack will take care of that, and email or web page content is transmitted over lossless paths. I would include some error detection mechanism, but that’s it. They are using the audio file just because it will be considered harmless. And by using steganography they can avoid some detection mechanisms that can identify properties of computer code. But that’s it.
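As an illustration of "error detection but no error correction", here is a minimal sketch that prefixes a payload with its length and a CRC-32. The framing layout and the names `frame` / `unframe` are assumptions made for the example; nothing in the reports says the actual malware does this.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32 (both little-endian 32-bit)."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def unframe(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if it was truncated or altered."""
    length, crc = struct.unpack_from("<II", blob)
    payload = blob[8:8 + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("payload damaged in transit")
    return payload


if __name__ == "__main__":
    good = frame(b"example payload")
    print(unframe(good))

    damaged = bytearray(good)
    damaged[-1] ^= 0x01  # flip a single bit in the payload
    try:
        unframe(bytes(damaged))
    except ValueError as err:
        print("rejected:", err)
```

The receiver can tell that something went wrong and discard the file, but it cannot repair it. That is all that is needed when the transport underneath is already lossless.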

 

I remember (old story) when some phone companies in Spain began using audio compression in their trunk circuits. Suddenly 28800 bps modems were unable to link at a data rate above 9600. Of course those modems had the ability to negotiate a data encoding scheme. One of these steganography files, with nothing to negotiate, would not make it.

 

Moreover, error tolerance depends basically on redundancy. Redundancy undermines the security of encryption; the two goals are opposed. Any property that stays invariant through several transformations will at the same time make cryptanalysis much, much easier.
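The simplest illustration of the redundancy trade-off is a repetition code, sketched below under the assumption of a threefold repeat with majority voting (function names are invented). It tolerates one flipped bit per group, but it triples the amount of data and leaves an obvious repeating structure in the stream.

```python
def encode_repeat(bits, n=3):
    """Send every bit n times in a row."""
    return [b for b in bits for _ in range(n)]


def decode_repeat(coded, n=3):
    """Majority vote over each group of n received bits."""
    return [1 if sum(coded[i:i + n]) * 2 > n else 0
            for i in range(0, len(coded), n)]


message = [1, 0, 1, 1]
sent = encode_repeat(message)   # three times as many bits on the wire
received = list(sent)
received[4] ^= 1                # one bit gets damaged on the way
decoded = decode_repeat(received)
```

Here `decoded` matches `message` despite the damaged bit, at the price of sending twelve bits instead of four.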

 

Link to comment
Share on other sites

I can say that in the audio steganography I actually know something about -- Nielsen's Portable People Meter for station tracking -- data compression does serious damage to the hidden signal.

 

PPM relies on details below the threshold of hearing, which changes in each narrow band depending on what the loudest content is. Most brains will ignore softer content in the same narrow band, because there are only so many neural pathways available. Consider just the fundamental of a flute and a trumpet, in the middle of a sustained C... you can hear that there are two instruments, but only because the harmonic patterns are different. 

 

These bands aren't limited to a single frequency. Many mp3 algorithms break the spectrum into 384 bands, chosen based on lots of tests of listeners. So a loud "A440" will overrule the short term mod effects of a softer A441 playing at the same time. And it'll completely blast out any very soft random noise (or harmonics from bass instruments) that might be around. Fewer bits are needed to encode that 440-and-neighbors band than if there were full resolution. (mp3 then applies a zip-like compression and AAC adds some other tricks, but you get the idea.)

 

Nielsen's audio steganography relies on the same trick. If a station's program has energy in a specific mid-range band (overlapping filters with a Q of ~15, IIRC) for long enough, then a single frequency is generated at the center frequency and mixed in ~ 6 dB softer (how soft becomes a station's choice... too loud is perceived as a harsh distortion, which can turn off listeners). If a burst lasts 480ms, IIRC, it becomes a data bit. 
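To put rough numbers on those figures, here is a back-of-the-envelope sketch. The ~6 dB, Q of ~15 and 480 ms values come from the post above (flagged there as from memory); the 1500 Hz centre frequency is purely a made-up example, and the helper names are hypothetical.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def bandwidth_hz(center_hz: float, q: float) -> float:
    """Approximate bandwidth of a band-pass filter from its centre frequency and Q."""
    return center_hz / q


tone_ratio = db_to_amplitude_ratio(-6.0)   # tone amplitude relative to the program energy
band_width = bandwidth_hz(1500.0, 15.0)    # hypothetical 1500 Hz band with Q = 15
bits_per_second = 1 / 0.480                # one bit per 480 ms burst, per band

if __name__ == "__main__":
    print(f"tone amplitude ratio: {tone_ratio:.3f}")
    print(f"band width: {band_width:.0f} Hz")
    print(f"data rate per band: {bits_per_second:.2f} bit/s")
```

So the code tone sits at about half the amplitude of the program energy in its band, and each band carries only around two bits per second.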

 

So both systems are relying on the same phenomenon: listeners generally can't hear softer signals in the same narrow and momentary band. Nielsen puts the station ID code down there. mp3 kills anything down there, to save data.

 

Seems like they'd fight.

 

--

We found that to be true in actual tests, as well...

 

 

Link to comment
Share on other sites

7 hours ago, Jay Rose said:

I can say that in the audio steganography I actually know something about -- Nielsen's Portable People Meter for station tracking -- data compression does serious damage to the hidden signal.

 

 

Quote

 

PPM relies on details below the threshold of hearing, which changes in each narrow band depending on what the loudest content is. Most brains will ignore softer content in the same narrow band, because there are only so many neural pathways available. Consider just the fundamental of a flute and a trumpet, in the middle of a sustained C... you can hear that there are two instruments, but only because the harmonic patterns are different. 

 

These bands aren't limited to a single frequency. Many mp3 algorithms break the spectrum into 384 bands, chosen based on lots of tests of listeners. So a loud "A440" will overrule the short term mod effects of a softer A441 playing at the same time. And it'll completely blast out any very soft random noise (or harmonics from bass instruments) that might be around. Fewer bits are needed to encode that 440-and-neighbors band than if there were full resolution. (mp3 then applies a zip-like compression and AAC adds some other tricks, but you get the idea.)

 

Indeed, and low frequencies tend to mask higher ones. 

 

Quote

 

So both systems are relying on the same phenomenon: listeners generally can't hear softer signals in the same narrow and momentary band. Nielsen puts the station ID code down there. mp3 kills anything down there, to save data.

 

Seems like they'd fight.

 

I am sure of that. Moreover, "traditional" techniques to enhance bandwidth over very noisy channels deal with "natural" causes of signal degradation. Examples would be communications with space probes orbiting near the Sun or, for example, the work of Joe Taylor on low signal data transmission modes with bandwidths in the range of bits per second. Not even hundreds! 

 

Digital lossy compression on the other hand can butcher signal integrity in very curious ways. 

 

The modem example I mentioned is interesting because probably the designers of the compression system decided to allow 9600 bps modems to work. In the PPM vs MP3/AAC case, however, it's just completely opposite goals. 

 

Explaining it in a somewhat extreme way, lossy audio compression systems interpret the bit stream according to a psychoacoustic model and resynthesize it. I wonder whether someone has come up with a scheme robust enough to survive that. It sounds challenging and (intuitively) I am not really sure it would even be possible at all. Maybe playing with timing tweaks, but we are good at detecting that.

 

Now, let's be careful. Imagine some crazy politician reading us and thinking about mandating a universal re-encoding of audio and video content over the Internet in order to avoid steganography :D

 

Link to comment
Share on other sites

12 hours ago, borjam said:

...Imagine some crazy politician reading us and thinking about mandating a universal re-encoding of audio and video content over the Internet in order to avoid steganography...

 

Capitalism is 'way ahead of politics. Many modern cellphones use vocoding, analyzing speech into critical component bands, sending just the data for each band, and re-synthesizing it at the other end.  There are fewer bands than in mp3, chosen to reflect the acoustic filters in the human vocal tract, and also data is sent about fundamental pitch and high-end noise (such as sibilance). The technique is almost a century old and was developed to compress phone calls over analog lines, but not practical with the equipment then... so it remained a lab and creative tool. Modern DSP gave us the technology to build it into practical phones.

Link to comment
Share on other sites

On 10/22/2019 at 7:54 PM, borjam said:

So, no need to get paranoid about WAV files. The risk would be the same if it was kitten pictures.

 


Exactly! Nothing to be worried about here. 

The key part to read in the original article linked to in the first post is right here at the very end of it (especially the last two sentences):

 

Quote

"Stego can be used with any file format as long as the attacker adheres to the structure and constraints of the format so that any modifications performed on the targeted file do not break its integrity," Lemos told.

In other words, defending against steganography by blocking vulnerable file formats is not the correct solution, as companies would end up blocking the downloading of many popular formats, like JPEG, PNG, BMP, WAV, GIF, WebP, TIFF, and loads more; wreaking havoc in internal networks and making it impossible to navigate the modern internet.

A proper way of dealing with steganography is... not dealing with it at all. Since stego is only used as a data transfer method, companies should be focusing on detecting the point of entry/infection of the malware that abuses steganography, or the execution of the unauthorized code spawned by the stego-laced files.

 

Link to comment
Share on other sites

You all are looking at this all wrong.

 

1) It's not that we need to be scared of WAV files. But this is a pretty cool development.

 

2) This is a great opportunity for us. When you ask a producer, "Where's my payment?" and they say, "Oh; I can't find your invoice," you say, "It's right there in the WAV file." This is a win win.

Link to comment
Share on other sites

  • 2 weeks later...

FWIW, BWF is Wave. (But not all Wave is BWF.)
Wave is a RIFF format. It contains chunks of information: at least a header (format) chunk and an audio data chunk.
When a bext chunk is added, it becomes a BWF.  (Early versions of the Zoom H4 did this, but it was completely meaningless since there was no metadata inside; it was just so marketing could say 'hey, it's BWF!')

But you can also put in an INVO (per Jim, Invoice) chunk and put a PDF in there, and it would still be a completely valid Wave file that will sound exactly the same.
(Cover art is sometimes included...)
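A minimal sketch of that idea, assuming a well-formed input file: append one extra chunk, fix up the overall RIFF size, and check that the audio still reads back identically. `INVO` is the joke chunk ID from this thread, not a registered one, and the helper names are invented for the example.

```python
import io
import struct
import wave


def append_chunk(wav_bytes: bytes, chunk_id: bytes, payload: bytes) -> bytes:
    """Append one extra RIFF chunk and rewrite the overall RIFF size field."""
    if len(chunk_id) != 4:
        raise ValueError("chunk IDs are exactly four bytes")
    pad = b"\x00" if len(payload) % 2 else b""  # chunk bodies are padded to an even length
    chunk = chunk_id + struct.pack("<I", len(payload)) + payload + pad
    body = wav_bytes[8:] + chunk
    return b"RIFF" + struct.pack("<I", len(body)) + body


def make_demo_wav() -> bytes:
    """Build a tiny mono 16-bit WAV in memory with a simple ramp as audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(48000)
        w.writeframes(struct.pack("<100h", *range(100)))
    return buf.getvalue()


def read_frames(wav_bytes: bytes) -> bytes:
    """Return the raw audio frames as Python's wave module sees them."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())


original = make_demo_wav()
tagged = append_chunk(original, b"INVO", b"%PDF-demo")  # placeholder bytes, not a real PDF
same_audio = read_frames(tagged) == read_frames(original)
```

The reader simply ignores the chunk it does not know, so `same_audio` comes out true.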

The fun part in this story is that the least significant bits are used. (Read: the lowest volume. An infected 24-bit Wave file with silence would still be silent unless you cranked up the volume to absurd levels.)
So, very hard to detect. (But a somewhat strange choice, as there is far less Wave traffic than MP4 / PDF / MP3 or the like...)
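For a sense of scale, this small sketch computes how far below full scale a one-LSB change sits. It assumes signed linear PCM, where full scale is 2^(bits-1) steps; the function name is made up.

```python
import math


def lsb_level_dbfs(bit_depth: int) -> float:
    """Level of a one-LSB step relative to full scale, for signed linear PCM."""
    full_scale_steps = 2 ** (bit_depth - 1)
    return 20 * math.log10(1 / full_scale_steps)


if __name__ == "__main__":
    for depth in (16, 24):
        print(f"{depth}-bit: {lsb_level_dbfs(depth):.1f} dBFS")
```

That works out to roughly -90 dBFS at 16 bits and roughly -138 dBFS at 24 bits.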

Link to comment
Share on other sites

11 hours ago, Bouke said:

...a bit strange choice as traffic of Wave is way less than Mp4 / Pdf / Mp3 or alike...

 

 

See my 10/22 post. In all probability, MPEG and other psychoacoustic formats break the hidden malware. That's what we found in a multi-year well financed project with a different audio steganographic format, where I was one of the devs and also wrote the docs.

 

 

Link to comment
Share on other sites

On 11/6/2019 at 11:34 PM, Jay Rose said:

FWIW, the audio payload in a BWF is identical to a .wav

 

Of course, just a joke. I was just remembering many years ago when I came with files from the 744T and an editor said: "What is that? I can handle WAV, but I don't know this BWF" haha

Link to comment
Share on other sites

I missed the 😄.

 

Of course, there are also lots of folks who read this forum and might not have a good technical background. (You don't need a computer background to be a good production mixer...  or at least, you didn't when I started... and that's not quite a joke.)

Link to comment
Share on other sites
