Enhanced Model Compression for Lipreading Recognition based on Knowledge Distillation Algorithm

Qianru, Lu and Kuryati, Kipli and Tengku Mohd Afendi, Zulcaffle and Yuan, Liu and Xiangju, Liu and Bo, Wang (2025) Enhanced Model Compression for Lipreading Recognition based on Knowledge Distillation Algorithm. Journal of Advanced Research Design, 145 (1). pp. 208-221. ISSN 2289-7984

Official URL: https://www.akademiabaru.com/submit/index.php/ard/...

Abstract

Lipreading is the task of understanding what a speaker is saying by observing changes in the speaker's mouth. The lipreading recognition model LipPC-Net proposed in this paper is trained on a large Chinese lipreading dataset built around Chinese phonetic rules and grammatical features, and consists of two main parts: a P2P sub-model that recognizes pinyin sequences from images, and a P2C sub-model that converts pinyin into Chinese character sequences. However, Chinese linguistic features are rich and ambiguous, and training and optimizing a lipreading model demands substantial GPU computation and storage, which makes large-scale deployment difficult. This paper therefore proposes three knowledge distillation compression algorithms: an offline model compression algorithm based on multi-feature transfer (MTOF), an online model compression algorithm based on adversarial learning (ALON), and an online model compression algorithm based on consistent regularization (CRON), which compress the model that outputs Chinese character sequences. The three algorithms fit and learn the transformations between different features, so that portable mobile terminals with limited hardware resources can carry the model, realizing its practical value in assisting communication for deaf-mute people.
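As a rough illustration of the knowledge distillation idea the abstract builds on (not the paper's specific MTOF, ALON, or CRON algorithms, whose details are in the full text), the classic temperature-scaled distillation loss compares the softened output distributions of a large teacher model and a small student model. The logit values below are made up for demonstration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# identical logits give zero loss; mismatched logits give a positive loss
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
```

In practice this distillation term is mixed with the ordinary cross-entropy loss on ground-truth labels, so the compact student both matches the labels and mimics the teacher's richer output distribution.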

Item Type: Article
Uncontrolled Keywords: Lipreading, Deep learning, Model compression, Knowledge distillation.
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering
Divisions: Academic Faculties, Institutes and Centres > Faculty of Engineering
Faculties, Institutes, Centres > Faculty of Engineering
Depositing User: Gani
Date Deposited: 08 Dec 2025 00:17
Last Modified: 08 Dec 2025 00:17
URI: http://ir.unimas.my/id/eprint/50751
