no code implementations • 29 Oct 2020 • Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg
We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision.
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Pascal Clark, Rob Haynes, Hywel Richards, John Bridle
Next, we collect a much smaller dataset of examples that are challenging for the baseline system.
no code implementations • 26 Jan 2020 • Siddharth Sigtia, Erik Marchi, Sachin Kajarekar, Devang Naik, John Bridle
We train the network in a supervised multi-task learning setup: the speech transcription branch is trained to minimise a phonetic connectionist temporal classification (CTC) loss, while the speaker recognition branch is trained to assign the correct speaker label to the input sequence.
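As a rough illustration of the multi-task objective described above, the sketch below adds a CTC loss over phone labels to a cross-entropy loss over speaker identities. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function names, the single utterance-level vector of speaker logits, and the `speaker_weight` mixing factor are all hypothetical, and the excerpt does not say how the two losses are combined.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(*xs):
    """Numerically stable log(sum(exp(x))) over the arguments."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits):
    """Convert raw scores to log-probabilities."""
    z = log_sum_exp(*logits)
    return [x - z for x in logits]


def ctc_loss(frame_logits, target, blank=0):
    """Negative log-likelihood of `target` under CTC (forward algorithm).

    frame_logits: one list of per-symbol scores for each frame.
    target: phone indices without blanks.
    """
    log_probs = [log_softmax(frame) for frame in frame_logits]

    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for phone in target:
        ext += [phone, blank]
    size = len(ext)

    # alpha[s]: log-probability of all partial alignments ending at ext[s].
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * size
        for s in range(size):
            candidates = [alpha[s]]                  # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])      # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])      # skip the blank in between
            new_alpha[s] = log_sum_exp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last phone or on the trailing blank.
    total = log_sum_exp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]
    return -total


def speaker_loss(speaker_logits, speaker_id):
    """Cross-entropy for one utterance-level speaker decision (assumed form)."""
    return -log_softmax(speaker_logits)[speaker_id]


def multitask_loss(frame_logits, phones, speaker_logits, speaker_id,
                   speaker_weight=1.0):
    """Weighted sum of both branch losses; the weighting is an assumption."""
    return (ctc_loss(frame_logits, phones)
            + speaker_weight * speaker_loss(speaker_logits, speaker_id))


if __name__ == "__main__":
    # Two frames over {blank, phone 1, phone 2}; four candidate speakers.
    frames = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(multitask_loss(frames, [1], [0.0, 0.0, 0.0, 0.0], 2, speaker_weight=0.5))
```

With uniform scores, three of the nine two-frame alignments collapse to the single phone, so the CTC term equals log 3, which makes the sketch easy to check by hand. In a real system both branches would share lower layers and the gradients of this combined loss would update them jointly.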