Title
Automatic speech recognition in Spain. The Basque and Catalan case
Conference name
8th International symposium on live subtitling and accessibility
City
Country
Spain
Modalities
Date
19/04/2023
Abstract
This contribution analyzes the speech-to-text recognition of news programs in Basque and Catalan. It presents results from QuaLiSub (The Quality of Live Subtitling: A regional, national and international study, led by Universidade de Vigo), in which automatic speech recognition is assessed using criteria from the NER model (Romero-Fresco and Martínez, 2015).
For Basque, 20 samples of approximately 5 minutes each were recorded in May 2022 from news programs on the regional public channel ETB1. Since automatic live subtitling is not yet available on Basque television, the Elhuyar Foundation collaborated by generating subtitles through speech recognition with its ADITU technology. The software failed to process one sample, so 19 samples were analyzed, for a total of 97 minutes and 1737 subtitles.
For Catalan, the analysis covered 26 samples of approximately 5 minutes each from the bilingual regional news bulletin on Spanish national television (La 1). These bilingual subtitles (in Spanish and Catalan) were broadcast from April to July 2021 and recorded by TVE for quality assurance. This contribution presents the accuracy rate for Catalan across 2116 subtitles (a total of 130 minutes).
The results in both languages show an average accuracy rate below the minimum threshold of 98% set by the NER model. A qualitative analysis based on the quantitative data points to room for improvement in the software's language models, particularly in proper nouns, punctuation, recognition of numbers and percentages, and character identification. Although the quantitative data do not reach the threshold at which the NER model would consider recognition quality fair or comprehensible, the results seem promising. When presenters speak with clear diction and in standard language, accuracy rates are reasonable for two minority languages, Basque and Catalan, for which speech recognition software is still in the early phases of development.