Postdoctorate Viet Anh Trinh led a project within Strand 1 to develop a novel neural network architecture that can both recognize and generate speech. He has since moved on from iSAT to a role at ...
Traditional English corpora mainly collect information from a single modality, but lack information from multimodal information, resulting in low quality of corpus information and certain problems ...
Speech emotion recognition (SER) technology involves feature extraction and prediction models. However, recognition efficiency tends to decrease because of gender differences and the large number of ...
Children’s speech presents unique challenges for ASR systems. Their smaller, growing vocal tracts lead to greater acoustic variability. On top of that, kids are still learning how to speak, making ...