Deep Learning Machine Solves the Cocktail Party Problem

The cocktail party effect is the ability to focus on a specific human voice while filtering out other voices or background noise. The ease with which humans perform this trick belies the challenge that scientists and engineers have faced in reproducing it synthetically. By and large, humans easily outperform the best automated methods for singling out voices.
Simpson and co trained their deep convolutional neural network to separate the voice’s unique spectrogram from the other spectrograms that are present. They used 50 of these songs to train the network while keeping the remaining 13 to test it on. In total that generated more than 20,000 spectrograms for training purposes.
The outputs turned out to be impressive. “These results demonstrate that a convolutional deep neural network approach is capable of generalizing voice separation, learned in a musical context, to new musical contexts,” say the team.

FOLLOW US