

Previewing the app at the Adobe Max 2016 software expo last week, researcher Zeyu Jin from Princeton University showed just how easy it will be in the near future to manipulate and transform sound files – and, in extreme cases, effectively put words that were never actually said into people's mouths.

While audio-editing apps have long enabled people to manually cut, copy, and splice together parts of sound waves, VoCo (voice conversion) operates on a new principle, using an algorithm that breaks down and recompiles human speech.

Adobe hasn't explained how this technology works just yet, but the software seems to identify and log phonemes – the individual speech sounds we put together to make up words and sentences.

With the right amount of sound data on file – which Adobe says is about 20 minutes of one person talking – VoCo will have recorded enough of these phonemes to impersonate that person, by stitching them together into new word and sentence formations.

In the demo video, you can see how VoCo works. Using a snippet of audio recorded from comedian Keegan-Michael Key, Jin first starts to rearrange the words. In the clip, Key says, "I kissed my dogs and my wife." In the program, a visual representation of the sound wave appears in one window, while another window displays the spoken words in text.

By simply copying and pasting in the text window – with no other editing techniques needed at all – Jin first changes the recording to, "I kissed my wife, and my wife," then manually types "dogs" back in to the end of the sentence: "I kissed my wife, and my dogs."

So far, this might not be anything extraordinary, since all those words appeared in the original recording. But then Jin types in a new word that wasn't part of the audio, inserting a name to give the sentence a wholly different significance: "I kissed Jordan and my dogs." To take it further, Jin then edits the audio to make it say "I kissed Jordan three times."

It's worth pointing out that the recording, when played back, does sound a little glitchy, with the pacing of the speech being a little off, but bear in mind this is only a prototype version.

As Sebastian Anthony at Ars Technica points out, Adobe often previews work-in-progress software at its Max event a year or two before it becomes commercialised – and no doubt, as the technology improves, this mimicry of a real voice's speech could get a lot better.

But unlike Photoshop and its many clones, which enjoy broad appeal – since pretty much everybody likes photos – who would need this kind of audio-editing trickery?

Adobe is pitching VoCo at media, podcasters, filmmakers, and audio industry professionals, arguing that the ability to nip and tuck speech recordings will make their working lives easier.
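As a rough illustration of the text-driven editing in the demo, the sketch below treats a recording as a list of word-aligned audio segments and turns an edited transcript into a splice plan. It is not Adobe's method, which has not been published: the `Segment` type, the function names, and all timestamps are invented for this example. It only shows why rearranging existing words is simple splicing, while a word missing from the recording (such as "Jordan") has to be synthesised some other way, for instance from logged phonemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One spoken word and where it sits in the recording (seconds)."""
    word: str
    start: float
    end: float


def build_index(segments):
    """Map each lower-cased word to the first segment where it was spoken."""
    index = {}
    for seg in segments:
        index.setdefault(seg.word.lower(), seg)
    return index


def plan_edit(index, new_text):
    """Turn an edited transcript into a splice plan.

    Returns (plan, missing): the segments to concatenate in order, and
    the words that never occurred in the recording and so cannot be
    produced by splicing alone.
    """
    plan, missing = [], []
    for raw in new_text.split():
        word = raw.strip('.,!?"').lower()
        if word in index:
            plan.append(index[word])
        else:
            missing.append(word)
    return plan, missing


# Hypothetical alignment of the demo sentence; the timestamps are made up.
recording = [
    Segment("I", 0.00, 0.20), Segment("kissed", 0.20, 0.65),
    Segment("my", 0.65, 0.85), Segment("dogs", 0.85, 1.30),
    Segment("and", 1.30, 1.50), Segment("my", 1.50, 1.70),
    Segment("wife", 1.70, 2.20),
]
index = build_index(recording)

# Rearranging words that already exist needs nothing beyond splicing.
plan, missing = plan_edit(index, "I kissed my wife and my dogs.")

# A word that was never spoken shows up as missing.
plan2, missing2 = plan_edit(index, "I kissed Jordan and my dogs.")
```

In this toy model, the first edit yields a complete splice plan, while the second reports `jordan` as missing. That gap is the part the demo fills with generated speech.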
