Improving Speaker Diarization for Low-Resourced Sarawak Malay Language Conversational Speech Corpus

Mohd Zulhafiz, Rahim and Sarah Flora, Samson Juan and Fitri Suraya, Mohamad (2023) Improving Speaker Diarization for Low-Resourced Sarawak Malay Language Conversational Speech Corpus. In: 2023 International Conference on Asian Language Processing (IALP), 18-20 November 2023, Singapore.

Official URL: https://ieeexplore.ieee.org/document/10337314

Abstract

Speaker diarization plays a vital role in transcribing conversational speech, as it improves the accuracy, comprehensibility, and usability of the transcribed content. A diarized transcript presents the conversation in a more structured form, enabling a variety of applications that rely on accurate speaker attribution. Even so, speaker diarization has been less explored for low-resourced languages, because the resources currently optimized for and applied to speaker diarization mostly target well-resourced languages such as English, Spanish, or French. In this paper, we propose an approach that uses pseudo-labeled speech data to self-train x-vector models and thereby improve diarization accuracy. The proposed method uses an unlabeled Sarawak Malay conversational speech corpus of almost 13 hours, obtained from the Kalaka: Language Map of Malaysia website, for training, and 1 hour and 26 minutes of manually labeled Sarawak Malay speech data for testing and evaluation. We demonstrate how speaker diarization models can be fine-tuned with the pseudo-labeled data.
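
The abstract does not spell out how the pseudo-labels are produced, so the following is only a minimal sketch of one common way to do it, under stated assumptions: segment embeddings (toy 2-D vectors standing in for x-vectors) are grouped by greedy agglomerative clustering on cosine similarity, and only segments that sit close to the centroid of a sufficiently large cluster are kept as pseudo-labeled training examples. The function names, thresholds, and clustering choice are hypothetical and are not taken from the paper; the subsequent fine-tuning of the x-vector model is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pseudo_label(embeddings, merge_threshold=0.9, keep_threshold=0.8,
                 min_cluster_size=2):
    """Return sorted (segment_index, pseudo_speaker_label) pairs.

    Assumed procedure (not from the paper): repeatedly merge the two
    clusters with the most similar centroids until no pair reaches
    merge_threshold, then keep only segments that belong to a cluster of
    at least min_cluster_size and are within keep_threshold of its centroid.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        cents = [centroid([embeddings[i] for i in c]) for c in clusters]
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(cents[a], cents[b])
                if sim > best:
                    best, pair = sim, (a, b)
        if best < merge_threshold:
            break  # remaining clusters are treated as distinct speakers
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labeled = []
    for label, members in enumerate(clusters):
        if len(members) < min_cluster_size:
            continue  # too little evidence to trust this pseudo-speaker
        cent = centroid([embeddings[i] for i in members])
        for i in members:
            if cosine(embeddings[i], cent) >= keep_threshold:
                labeled.append((i, label))
    return sorted(labeled)


if __name__ == "__main__":
    # Five toy segment embeddings: two per speaker plus one ambiguous segment.
    segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
    print(pseudo_label(segments))
```

In a self-training loop of the kind the abstract describes, the retained pairs would serve as training targets for the next round of model updates, while ambiguous segments (such as the last one in the toy input) are left out so that labeling errors are less likely to be reinforced.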

Item Type: Proceeding (Paper)
Uncontrolled Keywords: Speaker diarization, x-vectors, clustering, low-resource, auto-labeling, pseudo-labeling, unsupervised.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Faculties, Institutes, Centres > Faculty of Computer Science and Information Technology
Depositing User: Samson Juan
Date Deposited: 20 Dec 2023 01:43
Last Modified: 20 Dec 2023 01:43
URI: http://ir.unimas.my/id/eprint/43786
