A Review on Grapheme-to-Phoneme Modelling Techniques to Transcribe Pronunciation Variants for Under-Resourced Language

Emmaryna, Irie and Sarah Flora, Samson Juan and Suhaila, Saee (2023) A Review on Grapheme-to-Phoneme Modelling Techniques to Transcribe Pronunciation Variants for Under-Resourced Language. Pertanika Journal of Science and Technology, 31 (3). pp. 1291-1311. ISSN 0128-7680

Official URL: http://www.pertanika.upm.edu.my/

Abstract

A pronunciation dictionary (PD) is one of the components of an Automatic Speech Recognition (ASR) system, which converts speech to text. The dictionary consists of word-phoneme pairs that map written words to sequences of phonetic units for modelling and prediction. Research has shown that words can be transcribed to phoneme sequences using grapheme-to-phoneme (G2P) models, which can expedite building PDs. G2P models can be trained on seed PD data using statistical approaches, but these approaches require large amounts of data. Consequently, building a PD for an under-resourced language is a great challenge due to poor grapheme and phoneme systems in these languages. Moreover, some PDs must include pronunciation variants, including the regional accents that native speakers use. For example, a recent pronunciation dictionary for an ASR system in Iban, an under-resourced language from Malaysia, was built through a bootstrapping G2P method. However, the current Iban pronunciation dictionary does not yet include the pronunciation variants that the Ibans use. Recent studies have examined Iban pronunciation variants, but no computational methods for generating the variants are available yet. Thus, this paper reviews the G2P algorithms and processes we would use to generate pronunciation variants automatically. Specifically, we discuss data-driven techniques such as CRF, JSM, and JMM. These methods have been used to build PDs for Thai, Arabic, Tunisian, and Swiss German. Moreover, this paper highlights the importance of pronunciation variants and how they can affect ASR performance.
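
To make the idea of a PD with pronunciation variants concrete, the following is a minimal sketch in Python, not the method reviewed in the paper. The class name PronunciationDictionary, the "word(2)" labelling of additional variants, and the English example entries in ARPAbet-style symbols are all assumptions chosen for illustration; none of them come from the article or from Iban data.

```python
from collections import defaultdict


class PronunciationDictionary:
    """Toy PD: each word maps to one or more phoneme sequences (variants)."""

    def __init__(self):
        # word -> list of phoneme tuples, in insertion order
        self._entries = defaultdict(list)

    def add(self, word, phonemes):
        """Add a pronunciation for a word; exact duplicates are ignored."""
        key = word.lower()
        variant = tuple(phonemes)
        if variant not in self._entries[key]:
            self._entries[key].append(variant)

    def variants(self, word):
        """Return all known pronunciations of a word (empty list if unknown)."""
        return list(self._entries.get(word.lower(), []))

    def to_lines(self):
        """Serialise as 'word PH PH ...' lines; extra variants get '(n)'."""
        lines = []
        for word in sorted(self._entries):
            for i, variant in enumerate(self._entries[word], start=1):
                label = word if i == 1 else f"{word}({i})"
                lines.append(f"{label} {' '.join(variant)}")
        return lines


# Illustrative English entries (assumed for this sketch, not Iban data).
lexicon = PronunciationDictionary()
lexicon.add("tomato", ["T", "AH", "M", "EY", "T", "OW"])
lexicon.add("tomato", ["T", "AH", "M", "AA", "T", "OW"])  # second variant
lexicon.add("tomato", ["T", "AH", "M", "EY", "T", "OW"])  # duplicate, ignored

for line in lexicon.to_lines():
    print(line)
```

In a setting like the one the abstract describes, a G2P model would propose the phoneme sequences passed to add(), and the variant list for each word is where regional pronunciations would be recorded.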

Item Type: Article
Additional Information: Information, Communication and Creative Technology
Uncontrolled Keywords: Automatic speech recognition, G2P technique, grapheme-to-phoneme, pronunciation variants, under-resourced language
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: Samson Juan
Date Deposited: 28 Apr 2023 01:20
Last Modified: 27 Feb 2024 02:32
URI: http://ir.unimas.my/id/eprint/41744
