Detecting Audio Transients (Example and Free Cross-Platform Software)
Wikipedia defines an audio transient as "a high amplitude, short-duration sound at the beginning of a waveform that occurs in phenomena such as musical sounds, noises or speech". Here's an example that points out the transients in a short snippet of audio:
So, a transient is just a peak that occurs when there's a specific, "high energy" event - a drum hit, a piano key strike, a plucked guitar string.
Note that what qualifies as a transient is somewhat subjective. The transients in the above example are obvious, but consider if we had a single smaller peak rising just barely above the level caused by the synth in the background; a peak only a fraction of the level of the existing drum hits. Would this be a transient? Maybe... Or maybe not...
If you think about a lot of the transients that occur in musical instruments - a string plucked, a drum head struck - the transient itself is often 'noisy', followed by a sustained stretch of periodic waveform. For example, zooming in on the beginning of an 808 bass drum waveform, we can see the initial hit of the transient followed by a very sinusoidal waveform:
You might recall a few weeks ago I did a blog post on my phase vocoder. A phase vocoder works much better when transients are preserved and stretching/compressing of phase is done on periodic signals.
When a transient is encountered it's best to output the transient "as-is" and then resume the phase vocoding process on the periodic portion of the audio that follows. This results in a phase vocoder that produces higher quality results.
There's a lot of information out there on transient detection. There are a number of research papers, and even research papers that sum up the various research papers. I've implemented a straightforward transient detection method described in this paper by Lutshayzar Gueorguieff.
The method is pretty simple: step through the audio in small windows (maybe 5-10 milliseconds in length), analyzing the energy increase between successive windows. A transient exists when the energy in the current window is greater than the energy in the previous window, and also one to two times greater than the energy in the window before that. (This "one-to-two times" level increase over the window before the previous window is what I call the "secondary level threshold".)
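Here's a minimal Python sketch of the test described above. It is an illustration only, not the code from the paper or from the command line tool. It makes several assumptions: "energy" means the sum of squared samples in a window, windows don't overlap, and the secondary level threshold is a plain multiplier (1.5 here). The function names and the synthetic test signal are also made up for the example.

```python
import math


def window_energies(samples, window_size):
    """Energy (sum of squared samples) of each full, non-overlapping window."""
    return [
        sum(s * s for s in samples[start:start + window_size])
        for start in range(0, len(samples) - window_size + 1, window_size)
    ]


def detect_transients(samples, sample_rate, window_ms=10.0, sl_threshold=1.5):
    """Return the starting sample position of each window flagged as a transient.

    sl_threshold plays the role of the "secondary level threshold": the factor
    by which the current window must exceed the window before the previous one.
    """
    window_size = max(1, int(sample_rate * window_ms / 1000.0))
    energies = window_energies(samples, window_size)
    positions = []
    for k in range(2, len(energies)):
        rises = energies[k] > energies[k - 1]                 # louder than previous window
        jumps = energies[k] > sl_threshold * energies[k - 2]  # secondary level test
        if rises and jumps:
            positions.append(k * window_size)
    return positions


def decaying_burst(sample_rate=8000, silence_samples=800, total_samples=4000):
    """Hypothetical test signal: silence, then a decaying 500 Hz tone."""
    out = []
    for n in range(total_samples):
        if n < silence_samples:
            out.append(0.0)
        else:
            decay = math.exp(-30.0 * (n - silence_samples) / sample_rate)
            out.append(decay * math.sin(2 * math.pi * 500 * n / sample_rate))
    return out


if __name__ == "__main__":
    signal = decaying_burst()
    print(detect_transients(signal, 8000))  # [800]
```

The synthetic burst starts at sample 800 and only decays afterwards, so a single window is flagged. Real audio with a slower attack may flag several consecutive windows. This sketch makes no attempt to merge them.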
This method seems to work pretty well in a lot of cases, but can fall short in other cases. The key is usually in choosing the right "secondary level threshold". I made an enhancement to Gueorguieff's method by automatically calculating and scaling the secondary level threshold based on the average energy of previous windows.
This enhancement not only spares the user from guessing the optimal secondary level threshold value, but can also provide better results than a constant secondary level threshold. The following example shows this.
The following waveform is of this audio. It consists of two plucked guitar strings in quick succession, then a bit of a pause followed by a quieter plucked string. The transients are easy to detect by looking at the waveform and listening to the audio.
The table immediately below shows the time (in seconds) and the sample position of each transient along with the secondary level threshold (abbreviated "SL"). This information comes directly from my transient detector command line tool (download information below). Note how the secondary level threshold value varies:
A constant secondary level threshold can't produce the same accurate results. If we use a constant secondary level threshold of 1.220, the algorithm detects false transients at 1.434 and 1.643 seconds:
If we ever so slightly increase the secondary level threshold constant in hopes of not detecting the false transients, we no longer detect the second real transient at 0.104 seconds:
Note that these are just zipped executables (not an installer). The command line usage is simple:
TransientDetector -input filename [-sl SecondaryLevelThreshold]
- The secondary level threshold (-sl) is optional. Leaving it out will likely give higher quality results.
- Only mono (single channel), 16-bit, uncompressed wave files are currently supported as input.
- It's currently designed for "short" wave files (as in, under ~30 seconds).
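If you want to check ahead of time whether a file meets the input constraints listed above, here's a small Python sketch using the standard library's wave module. It is not part of the tool, and the function name read_mono_16bit_wav is made up for the example. It returns the sample rate and the samples as a list of integers.

```python
import struct
import wave


def read_mono_16bit_wav(path):
    """Return (sample_rate, samples) for a mono, 16-bit PCM wave file.

    Raises ValueError if the file is not mono or not 16-bit. The wave module
    itself only reads uncompressed PCM and raises wave.Error for anything else.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono (single channel) audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV sample data is little-endian; "h" is a signed 16-bit integer.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return sample_rate, list(samples)
```

Dividing a sample position by the returned sample rate gives the time in seconds.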
As mentioned in my earlier blog post, my phase vocoder currently does no special handling of transients. It simply expects the beginning of the input audio file to start with a transient. Because of this, when tested on audio that contains multiple transients, the quality suffers quite a bit.
So, next up is utilizing this transient detection in my phase vocoder and performing special processing when a transient is encountered. I should have this implemented soon and will gladly post an update when I do.