Auto-Tune and the Science That Ruined Music

https://www.youtube.com/watch?v=4p0chD8U8fA

"I couldn’t believe it… I didn’t think anyone in their right mind would use it that way."

Auto-Tune is not just the inhuman warbling debuted by Cher in 1998. This staple of today’s pop music is much more subtle than you may realize. "The secret popped out of the bag when Cher did 'Believe'. I couldn’t believe it… I didn’t think anyone in their right mind would use it that way." The secret that Auto-Tune inventor Andy Hildebrand is referring to, one still tightly guarded by the recording industry, is that Auto-Tune is generally subterfuge. It’s a clever trick of signal processing that can fix the pitch of any less-than-perfect notes your favorite pop star might utter. It’s like plastic surgery for voices.

An anonymous Grammy-winning recording engineer was quoted in a 2009 Time magazine article saying:

“You haul out Auto-Tune to make one thing better, but then it’s very hard to resist the temptation to spruce up the whole vocal, give everything a little nip-tuck.”

[Reference: Autotune: Why Pop Music Sounds Perfect]

The "surgery", as it were, was developed by an electrical engineer who earned his chops working for Exxon as a research scientist, seeking better ways to find oil deposits. The best tool for that job is seismology, the study of how acoustic waves move through the ground. When a wave encounters a new material, its direction gets nudged (refracted) or the wave is bounced back (reflected). These effects depend on the substance, so examining how a wave’s direction changes can tell you about what it moved through. This is how we came to understand Earth’s interior structure.

"Thumper Trucks" generate seismic waves that reflect off rock structures back to a network of geophones. Source: Shell Oil Company via YouTube

Academics tend to use seismic waves from earthquakes, but geologists hunting oil more commonly create their own mini-quakes to probe for fuel deposits. They do this using “thumper trucks” that slam heavy metal plates onto the ground, or with explosives set off in holes drilled beneath the surface. Special microphones called “geophones” (or “hydrophones”, for aquatic studies) record the resulting sound waves as they reflect off the various rock structures. The basic principle of this “reflection seismology” is that waves return to the geophones at different times and strengths depending on what they encountered, allowing scientists to determine what’s underground.
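To make the echo idea concrete, here is a minimal sketch of the simplest calculation a geophone record allows: turning the delay between thump and echo into a depth. It assumes a single flat reflector directly below the truck and a constant wave speed, neither of which holds in real surveys, and the 3000 m/s speed and the arrival times are made-up illustrative numbers.

```python
def reflector_depth(two_way_time_s, velocity_m_per_s):
    """Depth in meters of a flat reflector directly below the source."""
    # The wave travels down and back up, so only half the delay is the descent.
    return velocity_m_per_s * two_way_time_s / 2


# Hypothetical echo delays (seconds) and an assumed average wave speed (m/s).
velocity = 3000.0
for delay in (0.5, 1.2, 2.0):
    depth = reflector_depth(delay, velocity)
    print(f"echo after {delay:.1f} s: reflector roughly {depth:.0f} m down")
```

Real processing has to cope with many overlapping echoes, unknown wave speeds, and noise, which is where the signal processing described next comes in.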

But in practice, interpreting geophone data isn’t so easy. The signals are noisy and subsurface structures are complicated. A 50% success rate for accurately predicting the presence of a fuel deposit is great for the industry. This is where Dr. Auto-Tune comes in. Hildebrand helped develop new ways to process geophone signals that yield better fuel-finding predictions. This work led directly to the invention of Auto-Tune and involved a technique called autocorrelation.

Cross-correlation is a measure of the similarity between two signals as a function of an offset between them. Image credit: Patrick McCauley, for From Quarks to Quasars

Autocorrelation can easily find "hidden" patterns in noisy data. Credit: Jeremy Manning via Wikimedia Commons/CC-BY-SA 2.5

We’re about to poke our heads into a mathematical rabbit hole, but we won’t go far, so stick with me. The autocorrelation function is part of a large bag of signal processing tricks that are vital to many scientific pursuits. You’re likely to encounter something like it anywhere that electrical or statistical signals are being used to turn measurements into meaningful observations. Astrophysics, medical imaging, neuroscience, geophysics, data mining, quantitative finance, cryptography, and a host of other fields all employ similar techniques.

Autocorrelation is the cross-correlation of a function with itself. All you really need to know is that cross-correlation is used to measure the similarity between two signals given some offset between them, usually a time lag. Think of this as sliding one curve over the top of another to determine how well they line up. This is visualized with a simple example in the animation above. If you do this with two copies of the same signal, as with autocorrelation, it’s a measure of how similar the signal is to a delayed version of itself. This is useful because a repeating pattern lines up with itself again whenever the delay matches its period, while random noise does not, so autocorrelation can detect patterns that otherwise might be hard to find in noisy measurements. The upper plot on the right shows noisy data with a sine wave buried in it. Using autocorrelation, one can easily find the hidden wave pattern by constructing the “correlogram” in the lower plot.
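For readers who want to try the sliding trick themselves, here is a minimal sketch in plain Python. It buries a sine wave in random noise, computes a normalized autocorrelation, and looks for the lag where the signal best matches itself. The period, noise level, sample count, and search window are all illustrative assumptions of mine, not values taken from the plots.

```python
import math
import random


def autocorrelation(signal, max_lag):
    """Normalized autocorrelation of `signal` for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered)
    result = []
    for lag in range(max_lag + 1):
        # Slide the signal against itself by `lag` samples and sum the products.
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        result.append(total / variance)
    return result


random.seed(0)    # fixed seed so repeated runs give the same data
period = 50       # hypothetical hidden period, in samples
n_samples = 4000
noisy = [math.sin(2 * math.pi * i / period) + random.gauss(0, 1.0)
         for i in range(n_samples)]

acf = autocorrelation(noisy, max_lag=120)

# Search a hand-picked window that skips the trivial peak at lag 0 and stops
# before the second repeat of the pattern; the strongest value in that window
# should sit close to the hidden period.
best_lag = max(range(10, 81), key=lambda lag: acf[lag])
print("estimated period:", best_lag, "samples")
```

Plotting `acf` against lag gives a correlogram: the noise averages away while the wave shows up as a regular ripple.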

This clever trick now helps oil companies find wave patterns that indicate the presence of fuel deposits in their geophone signals. In 1989, Antares Audio Technology was founded because Hildebrand realized that the same techniques could be applied to music. Imagine your last bathroom-singing masterpiece as a noisy data stream. Buried beneath each shaky note is the characteristic signal of the pitch you tried (maybe successfully) to hit. Using autocorrelation, Hildebrand developed software that automatically finds that signal and steers the sound to the nearest half step on the 12-tone musical scale. In other words, it automatically tunes your voice. And thus, Auto-Tune was born.
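The "steering" step is easy to sketch once a pitch has been measured. The snippet below is a rough illustration, not Antares' actual algorithm: it takes a frequency in hertz and finds the nearest note of the equal-tempered 12-tone scale. The A4 = 440 Hz reference, the MIDI-style note numbering, and the function name are my own assumptions.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; the article does not specify one
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def snap_to_semitone(freq_hz):
    """Return (corrected_hz, note_name, cents_off) for the nearest half step."""
    # Distance from A4 in semitones; fractional when the note is off pitch.
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    nearest = round(semitones)
    corrected = A4_HZ * 2 ** (nearest / 12)
    midi = 69 + nearest  # MIDI-style numbering, where A4 is note 69
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    cents_off = 100 * (semitones - nearest)  # 100 cents per half step
    return corrected, name, cents_off


corrected, name, cents = snap_to_semitone(450.0)
print(f"{name}: {corrected:.2f} Hz (sung {cents:+.0f} cents off)")
```

A real pitch corrector also has to decide how quickly to bend toward the target note, and that choice is what separates subtle touch-ups from the robotic effect.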

Of course, Auto-Tune is not the first attempt at electronic music manipulation. The “talk box” is a relatively simple predecessor that uses plastic tubing running from the singer’s mouth to a speaker. It’s most commonly used with keyboards and electric guitars to create an instrument-voice hybrid sound. The “vocoder”, which was introduced in 1940 by AT&T Bell Labs, is even more similar to Auto-Tune. Because it reduces voices down to short sections of a single frequency, saving bandwidth, the vocoder was originally intended for telecommunications. But it wasn’t too long before musicians began adopting the robotic sound for their own acts. Here’s my favorite musician, Feist, employing a vocoder in her live performance of the traditional folk song, Sea Lion Woman (see if you can also spot the talk box being used by one of the band members).

https://www.youtube.com/watch?v=1l7xfuu8ja8

Auto-Tune has another important advantage over the vocoder used in the video above: it modifies frequency (pitch) and timing independently. This makes it a “phase vocoder” and means that the pitch can be increased, for instance, without speeding the signal up to sound like Alvin and the Chipmunks. Another trick of signal processing, the short-time Fourier transform, is behind this capability, but I think we’ve had enough of that for one piece.

Perhaps the most significant difference between Auto-Tune and its predecessors is that Auto-Tune is generally meant to go unnoticed. It was designed to let performers lay down their vocals with less repetition. An audio engineer can now easily touch up challenging sections that might take the singer hours to get right. Pitch correction soon spread to all parts of the track and exploded to produce the current generation of pitch-perfect pop vocalists, whose unwavering notes might otherwise indicate some evolution of the larynx since the mid-nineties. Auto-Tune’s automatic pitch detection capability even allows it to be used in live performances with negligible time delay. Most artists won’t admit to such vocal doping, but stars like Faith Hill and Tim McGraw liken it to a safety net that guarantees good performances. It really is quite an amazing technological feat, however disingenuous.

https://www.youtube.com/watch?v=3EWruiIjBmo

Perhaps we’ll look back on the last 15 years of pitch perfection like the steroid years of cycling and clamor for Grammy awards to be revoked. There certainly seems to be growing discontent regarding the overuse of Auto-Tune, but it is aimed mainly at an application that was never intended by its inventor. When Jay-Z declared the Death of Auto-Tune in 2009, he was criticizing the now-gimmicky use pioneered by Cher and later advanced by T-Pain in hip-hop. Rather than gradually bending his voice to achieve perfect pitch, T-Pain introduces sharp jumps and discontinuities in frequency to create obviously inhuman effects. I agree with Mr. Z that the novelty of this technique has worn off, but at least it added something new to music aside from illusory vocal flawlessness.

The ubiquity of pitch correction parallels that of Photoshop in advertising. A facsimile of the human voice is presented after any perceived imperfections have been smoothed out. It's fair to say that this limits the authenticity of pop music, but perhaps "the science that ruined music" is overly damning. After all, pop stardom has always been predicated on more than musical talent, and Auto-Tune just lowers that bar a bit further. Pitch correction is almost certainly here to stay, but outside the limelight, and occasionally within, there will always be musicians who buck the trend.

Whatever scorn I might harbor is entirely outweighed by my fondness for Auto-Tune's single most redeeming application. The Symphony of Science and Melodysheep videos produced by John Boswell use Auto-Tune to turn science clips into wonderfully engaging music videos. It started with "A Glorious Dawn", which used clips from Carl Sagan's Cosmos, and many more have followed featuring a who's who of science promoters. Check out his recent tribute to conservationist Steve Irwin below.