Title
Automatic live subtitling in Spain. The quality of bilingual TV newscasts in Galicia
Conference name
8th International Symposium on Live Subtitling and Accessibility
City
Country
Spain
Modalities
Date
19/04/2023
Abstract
Technology has been a catalyst in the development of both audiovisual translation (AVT) and media accessibility (MA). Specifically, continuous improvements in technological tools have enabled and encouraged progress in the field of live subtitling. The rising capabilities of speech recognition software (SRS) have propelled respeaking, which has become one of the most popular methods of producing real-time subtitles in recent decades and is now used in a wide range of settings, from TV shows to live events. At the same time, advances in AI have brought automation to an ever-increasing number of sectors. Automation has transformed the fields in which it has been introduced and has had an impact on virtually every aspect of modern life, including the focus of this conference: live subtitling.
The potential of AI has made it feasible to automate the real-time subtitling process, replacing human intervention with emerging technologies. Although no automatic live subtitling software has yet produced error-free output, the quality of some automatic subtitles is remarkable. The speed and affordability of such technologies have not gone unnoticed either. Consequently, fully automatic subtitles have been gaining ground in television broadcasts in some countries, such as Spain.
This presentation is framed within QuaLiSub, a project funded by the Spanish Ministry for Science and Innovation, which intends to analyze live subtitling quality in the US (English) and Spain (Spanish and some co-official languages: Galician, Catalan, and Basque). In this context, a comprehensive quality assessment of machine-generated subtitling was conducted as requested by RTVE, the largest Spanish state-owned public media corporation.
Over the course of 16 weeks, nearly 300 minutes of audiovisual material were analyzed, with a focus on subtitling accuracy and two other quality indicators, namely subtitling speed and subtitling delay. All samples belong to one of the 17 territorial newscasts in Spain, specifically the Galician one, in which Galician and Spanish coexist in the original dialogues. The main aim was to examine the software’s performance and improve it based on the data collected. To that end, reports on the quality of the samples were prepared periodically so that developers could gradually improve the SRS capabilities.
By providing the findings of the quality analysis conducted on the bilingual automatic subtitles of fifty-five 5-minute samples supplied by TVE, this presentation intends to 1) offer a complete understanding of the performance and evolution of the SRS under consideration, 2) show the importance of selecting the most appropriate quality assessment tool depending on the end goal, and 3) shed some light on how quality analysis can contribute to the improvement and implementation of SRS for automatic live subtitling on TV and, ultimately, the enhancement of media accessibility.
To analyze subtitling accuracy, the WER Model was applied to all samples, as requested by the broadcaster, resulting in an average accuracy rate of around 70% and an average WER below 20%. Although some improvement in SRS performance was detected, it was neither consistent nor regular, as the error rate did not decrease steadily over the weeks. Additionally, the NER Model was applied to twenty of the samples, mainly on account of its viewer-centered nature. The findings revealed that no sample reached the 98% NER quality threshold, implying that these automatic subtitles would not meet users’ needs and, therefore, would not fully accomplish their intended purpose. Subtitling speed and delay were calculated and evaluated in accordance with the UNE 153010:2012 standard. Speed was slightly higher than the established maximum of 15 characters per second (16.5 cps on average), whereas delay remained below the limit of 8 seconds (6.5 seconds on average).
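The metrics mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited definitions: WER as word-level edit distance divided by the number of reference words, and NER accuracy as (N − E − R) / N × 100, where E and R are the summed penalties for edition and recognition errors. The abstract does not describe how the scores were computed in practice, so the function names and the sample sentence are hypothetical; only the 98% threshold, the 15 cps and 8-second limits, and the reported averages come from the text.

```python
from typing import Dict, Sequence

# Limits quoted in the abstract (UNE standard) and the NER quality threshold.
MAX_SPEED_CPS = 15.0
MAX_DELAY_S = 8.0
NER_THRESHOLD = 98.0


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference words,
    computed as a word-level Levenshtein distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def ner_accuracy(n_words: int, edition_errors: float, recognition_errors: float) -> float:
    """NER accuracy in percent; error arguments are already-weighted penalty sums."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return (n_words - edition_errors - recognition_errors) / n_words * 100


def within_limits(speed_cps: float, delay_s: float) -> Dict[str, bool]:
    """Compare measured speed and delay with the limits quoted in the abstract."""
    return {"speed_ok": speed_cps <= MAX_SPEED_CPS, "delay_ok": delay_s <= MAX_DELAY_S}


# Invented example: one substitution and one deletion over six reference words.
reference = "the weather will be sunny tomorrow".split()
hypothesis = "the weather will be funny".split()
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")

# Invented penalty totals, chosen only to show a score under the threshold.
score = ner_accuracy(n_words=500, edition_errors=5.0, recognition_errors=7.5)
print(f"NER: {score:.2f} (meets threshold: {score >= NER_THRESHOLD})")

# Average speed and delay reported in the abstract.
print(within_limits(speed_cps=16.5, delay_s=6.5))
```

The two scores answer different questions: WER counts every word-level discrepancy equally, while NER weights each error by its effect on the viewer, which is why the choice of model depends on the end goal of the assessment.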
In addition to the figures, qualitative results are also provided in this presentation. Specifically, the most prevalent errors identified throughout the analysis are described, including those related to numbers, COVID-related terminology, and proper names, among others. The weekly and monthly reports are presented, detailing the types of errors they include, and brief remarks are shared on the strengths and weaknesses of the software, whose performance was heavily influenced by the features of the original audiovisual material. The SRS produced good-quality results under highly controlled conditions, such as the absence of background noise and the participation of speakers who communicate fluently and accurately with clear diction and pronunciation. By contrast, the detection of language changes (from Galician to Spanish, and vice versa), sometimes influenced by how some locals speak, has proven problematic for the recognizer.
Although the coexistence of Galician and Spanish in the original dialogues complicates the already challenging process of automatic subtitling, it is not so decisive a factor as to prevent the use of such technologies in the territorial newscast. In fact, the possibility of using an SRS that can handle both languages during a TV broadcast is quite meaningful, particularly given that this is the first attempt at developing bilingual software involving Galician, a minority and minoritized language. Although far from ideal, the results of this study point to the rapid improvement of automatic subtitling and provide grounds for further research in this field. Automatic subtitling is already common in everyday life and is likely to become so on television as well.