Comparing the impact of automatically generated and corrected subtitles on cognitive load and learning in a first- and second-language educational context

The addition of subtitles to videos has the potential to benefit students across the globe in a context where online video lectures have become a major channel for learning, particularly because, for many, language poses a barrier to learning. Automated subtitling, created with the use of speech-recognition software, may be a powerful way to make this a scalable and affordable solution. However, in the absence of thorough post-editing by human subtitlers, this mode of subtitling often results in serious errors that arise from problems with speech recognition, accuracy, segmentation and presentation speed. This study therefore aims to investigate the impact of automated subtitling on student learning in a sample of English first- and second-language speakers. Our results show that high error rates and high presentation speeds reduce the potential benefit of subtitles. These findings provide an important foundation for future studies on the use of subtitles in education.


Introduction
Subtitles are used extensively in movies and television programmes as well as in online media such as YouTube, Netflix and TED. Intralingual subtitles were initially added to television programmes targeted at the hearing-impaired population (Gernsbacher, 2015; Taylor, 2005), but it was later observed that the hearing population also benefits from subtitles, for example in the context of language learning (Garza, 1991). Previous research (Bird & Williams, 2002; Danan, 2004; Garza, 1991; Gernsbacher, 2015; Markham, 1999; Moreno & Mayer, 2002; Perego, Del Missier, Porta, & Mosconi, 2010; Vanderplank, 1988) has shown a difference in areas such as vocabulary acquisition, comprehension and task performance between hearing viewers who watched a video with subtitles and viewers who saw the same video without subtitles, in favour of the group that saw the subtitled video. However, not all studies confirm this benefit (Kruger, 2016; Kruger, Hefer, & Matthew, 2014; Kruger & Steyn, 2013) and more research is needed to answer the questions of how subtitles influence learning and under what conditions their influence is optimal. As many educational entities use videos in their online learning programmes (for instance, MIT OpenCourseWare, Coursera and Khan Academy), a large amount of subtitling is required to ensure access for potential users. However, creating subtitles manually is neither cost-effective nor quick. Conventional subtitles are created by subtitlers who transcribe and edit the audio dialogue into written text and add the text to the screen using subtitling software. The Code of Good Subtitling Practice compiled by Ivarsson and Carroll (1998) suggests some general guidelines to ensure high-quality, accurate and readable subtitles.
Subtitling conventions recommend a minimum display time of one second for short subtitles and a maximum display time of six seconds for full two-line subtitles, with an average reading speed of 12 characters per second (cps) (Díaz-Cintas & Remael, 2007; Ivarsson & Carroll, 1998). Szarkowska and Bogucka (2019) provide an overview of the origins of what is generally referred to as the six-second rule. They point out that the standard for TED Talks is that subtitles should be presented at a rate below 22 cps, whereas Netflix uses 17 cps across the globe. Because of space constraints, there should be no more than two lines per subtitle and the maximum number of characters on each line, including spaces, should be between 33 and 35. Depending on the guidelines and the software used, the permitted maximum may range from 30 to 41 characters. Ivarsson and Carroll (1998) advise that subtitlers may need to reduce the original transcript by as much as 30 per cent because of space and time constraints; this can be done by reformulating, condensing and adapting the original text.
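The conventions described above can be expressed as simple checks on a single subtitle. The following is a minimal sketch rather than an implementation of any particular subtitling tool: the `Subtitle` class, the constant names and the violation labels are hypothetical, and treating the 12 cps average as a per-subtitle ceiling is a simplifying assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text above; the constant names are illustrative.
MIN_DISPLAY_S = 1.0       # minimum display time for a short subtitle
MAX_DISPLAY_S = 6.0       # maximum display time for a full two-liner
MAX_LINES = 2
MAX_CHARS_PER_LINE = 35   # upper end of the 33-35 range, spaces included
TARGET_CPS = 12.0         # assumption: average speed used as a per-subtitle cap


@dataclass
class Subtitle:
    start_s: float
    end_s: float
    lines: List[str]


def characters_per_second(sub: Subtitle) -> float:
    """Presentation speed: characters (spaces included) per second on screen."""
    duration = sub.end_s - sub.start_s
    if duration <= 0:
        raise ValueError("a subtitle needs a positive display time")
    return sum(len(line) for line in sub.lines) / duration


def guideline_violations(sub: Subtitle) -> List[str]:
    """Return the labels of the conventions that this subtitle breaks."""
    problems = []
    duration = sub.end_s - sub.start_s
    if not MIN_DISPLAY_S <= duration <= MAX_DISPLAY_S:
        problems.append("display_time")
    if len(sub.lines) > MAX_LINES:
        problems.append("line_count")
    if any(len(line) > MAX_CHARS_PER_LINE for line in sub.lines):
        problems.append("line_length")
    if characters_per_second(sub) > TARGET_CPS:
        problems.append("reading_speed")
    return problems


example = Subtitle(0.0, 2.0, ["Demand is perfectly", "inelastic here"])
print(characters_per_second(example), guideline_violations(example))
```

In this made-up example, 33 characters shown for two seconds give 16.5 cps, so only the reading-speed check fails; the same text shown for three seconds would pass all four checks.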
For the purposes of this article, automatic subtitles are defined as text generated automatically with the help of speech-recognition tools and then synchronized with the video. This type of subtitling could be a solution, owing to its time- and cost-effectiveness. Corrected subtitles are defined as text generated automatically in the same way as automatic subtitles, but edited by a subtitler for grammar, spelling and segmentation while otherwise remaining verbatim. Automated subtitles could be a very powerful tool for improving learning in education if problems with their accuracy, the chunking of text and presentation speed could be resolved when they are created (Doherty & Kruger, 2018).
Table 1: Examples of automatically generated subtitles vs corrected subtitles (Gruber, 2011)
Automatic subtitles are advantageous because their creation is faster and more cost-effective. However, because they are not created by a human translator or subtitler, this mode of subtitling can lead to serious errors, such as problems in speech recognition, accuracy, segmentation of two-line subtitles and reading speed. Automated subtitles are nonetheless potentially powerful in enabling wider accessibility, provided that the problems with their accuracy and readability can be resolved. Also, from a methodological standpoint, the impact of subtitle quality in an educational context could be investigated more comprehensively (cf. Kruger & Doherty, 2016).

Automatically generated subtitles and related issues
It is logical to seek a way of producing large quantities of subtitles economically and within a shorter time frame. Automatically generated subtitles could be instrumental in making this possible. Wald and Bain (2008) explain that automated subtitles are created by means of automatic speech-recognition (ASR) software in order to provide a verbatim transcript from spoken dialogue. The text transcription is then synchronized with the video by means of the timing information so that subtitles can be generated automatically. There are three main problems with automated subtitles: accuracy (Parton, 2016), reading speed (Romero-Fresco, 2016) and the chunking of text (Rajendran, Duchowski, Orero, Martínez, & Romero-Fresco, 2013). Typically, ASR has been observed to achieve an average accuracy rate of between 60% and 90%, depending on the environment and the method of evaluation used (Anantaram, Kopparapu, Patel, & Mittal, 2016). The accuracy rate can, however, improve to at least 98% with editing, pre-recorded transcripts and training the computer to recognize the speaker's voice (Wald & Bain, 2008). Even so, automatically generated subtitles can be created with high accuracy only if the speech input meets certain criteria (Jurafsky & Martin, 2009). Jurafsky and Martin (2009) described these criteria as: a slow, clear and consistent speech style where the speaker speaks in a standard dialect, with no background noise or sound effects, and limited vocabulary recognition at a time. However, it is hardly possible to fulfil these criteria in real-life settings.
Comparing the impact of automatically generated and corrected subtitles on cognitive load and learning in a first- and second-language educational context. Linguistica Antverpiensia, New Series: Themes in Translation Studies, 18.
Presentation speed (also referred to by some researchers as reading speed) is another problem that affects viewers' comprehension of video content: automated subtitles are usually presented at a faster speed since they are fully verbatim. This problem is mainly caused by the high speech rate of the speakers in the video (Romero-Fresco, 2016). In his study, Romero-Fresco (2016) concluded that fully verbatim subtitles are not desirable, especially with the introduction of the hybrid mode, which combines pre-recorded transcripts and live subtitles, because such subtitles are far too fast for any viewer to read and comprehend despite their high accuracy rate. In this respect, it is necessary to reduce the text to make readable and comprehensible subtitles possible, and inevitably professionals have to be involved in this process. In a recent eye-tracking study, however, Szarkowska and Gerber-Morón (2018) suggest that viewers are capable of processing faster subtitles. In particular, they found that most viewers in their study could read the subtitles and follow the images at a presentation rate of 20 cps.
Finally, research shows that text chunking increases the effectiveness of subtitle reading by reducing the amount of time spent on subtitles (Rajendran et al., 2013). Rajendran et al. (2013) investigated the impact of subtitle segmentation using eye-movement metrics such as fixation durations. Their eye-tracking data show that chunking the text by phrase or by sentence reduces the amount of time viewers spend on reading subtitles, and that this presentation of the text makes the subtitles easier to process.
These findings suggest that automated subtitles can be used effectively only if the associated problems with accuracy, reading speed and text chunking can be solved.
The US National Institute of Standards and Technology uses the following formula to calculate word accuracy (Dumouchel, Boulianne, & Brousseau, 2011):
\[ \text{Accuracy} = \frac{N - D - S - I}{N} \times 100 \]
According to Romero-Fresco and Martínez (2015), this model draws on the basic principles of WER (word error rate), which have traditionally been applied in analysing the accuracy of speech recognition. In this model, N is the total number of words spoken by the speaker. The errors include D for words deleted, S for words substituted and I for words inserted incorrectly by the speech-recognition software. Romero-Fresco and Martínez (2015) noted that this model of assessment used for speech recognition is not well suited to European countries in which subtitles are mostly created by respeakers. In other words, the model does not account for editing errors and penalizes edits that are not errors. Romero-Fresco later introduced the NER model to measure the quality of intralingual live subtitling by respeaking, which takes into account the number of words in the subtitles, edition errors arising from the strategies used by the subtitler, and recognition errors caused by mispronunciation or mishearing introduced by speech recognition or stenography (Romero-Fresco & Martínez, 2015). Romero-Fresco and Martínez (2015) indicated that, in the NER model, live subtitles have to reach an accuracy rate of at least 98% in order to be considered acceptable.
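To make the WER-based calculation concrete, the sketch below counts deleted (D), substituted (S) and inserted (I) words with a minimum-edit-distance alignment and then computes word accuracy as (N - D - S - I) / N × 100, the usual WER-based form. It is an illustration only: the function names are hypothetical, lower-casing and whitespace tokenization are simplifying assumptions, and the example sentences are invented rather than taken from the study's subtitles.

```python
def count_errors(reference, hypothesis):
    """Return (D, S, I) for one minimum-edit alignment of two word lists."""
    n, m = len(reference), len(hypothesis)
    # cell[i][j] = (total errors, D, S, I) for reference[:i] vs hypothesis[:j]
    cell = [[None] * (m + 1) for _ in range(n + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cell[i][0] = (i, i, 0, 0)          # only deletions
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, d, s, ins = cell[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, d, s, ins)              # correct word
            else:
                diagonal = (t + 1, d, s + 1, ins)      # substitution
            t, d, s, ins = cell[i - 1][j]
            deletion = (t + 1, d + 1, s, ins)
            t, d, s, ins = cell[i][j - 1]
            insertion = (t + 1, d, s, ins + 1)
            # Tuples compare on total errors first, so the cheapest path wins.
            cell[i][j] = min(diagonal, deletion, insertion)
    _, d, s, ins = cell[n][m]
    return d, s, ins


def word_accuracy(n, d, s, i):
    """Word accuracy in per cent: (N - D - S - I) / N * 100."""
    if n <= 0:
        raise ValueError("N must be the positive number of spoken words")
    return (n - d - s - i) / n * 100


def accuracy_from_text(spoken, recognized):
    """Assumption: words are lower-cased, whitespace-separated tokens."""
    reference = spoken.lower().split()
    hypothesis = recognized.lower().split()
    d, s, i = count_errors(reference, hypothesis)
    return word_accuracy(len(reference), d, s, i)


print(accuracy_from_text(
    "the price elasticity of demand is low",
    "the price elasticity demand is slow today",
))
```

In this invented example one word is deleted ("of"), one substituted ("low" becomes "slow") and one inserted ("today"), so three errors against seven spoken words give an accuracy of roughly 57%.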
The majority of studies on video with intralingual subtitles created by human beings in an educational context have investigated the impact of subtitles on learning, usually by comparing subtitled and unsubtitled conditions. However, there are hardly any empirical studies that compare the impact of the type or the quality of subtitles in terms of accuracy, segmentation and reading speed.
Owing to the problems with accuracy and presentation speed of automated subtitles (that tend to be verbatim transcripts, including spoken features that compromise coherence during reading), it is still uncertain whether automated subtitles have a positive impact on learning in comparison to video without subtitles. It therefore appears logical that these automatically generated subtitles have to be corrected; but in the interests of efficiency, it is important to determine to what extent the correction of errors and grammar, as well as the reduction of presentation speed, has an impact on learning and cognitive load (CL). It is the aim of this study to attempt to address this question in order to ascertain whether there is any difference in impact on learning between unsubtitled video, video with automatically generated subtitles and video with corrected subtitles.
The following sections provide information on CL theory, the role of subtitles in education, the benefits of subtitles in learning, how subtitles affect CL during the learning process and how subtitles are created.

Cognitive load theory
A major concern about adding subtitles to videos is the amount of CL they could generate. CL is a multidimensional construct that represents the load imposed on the learner's cognitive system while performing a particular task (Paas, Tuovinen, Tabbers, & van Gerven, 2003; Paas & van Merriënboer, 1994). Cognitive load theory (CLT) describes a cognitive architecture based on the concept of a limited working memory, with processing units for visual and auditory information that interact with a comparatively unlimited long-term memory (Paas et al., 2003; Sweller, 2003; Sweller & Chandler, 1991; Sweller & Sweller, 2006). CLT distinguishes three types of CL: intrinsic, extraneous and germane load (Paas et al., 2003; Sweller, 1988; Sweller, van Merriënboer, & Paas, 1998).
Intrinsic load is an interaction between the nature of the material that needs to be learned and the learner's level of expertise (Paas et al., 2003). Extraneous load relates to instructional presentation that could reduce learning by increasing the CL on learners (Debue & van de Leemput, 2014). Germane load is the remaining capacity that enables learning to be effective: learning improves if more working memory resources are available when processing information (Sweller, 2010).

Subtitles in education
Education has become significantly more global in the past decade, with international students becoming a consistent feature in most developed countries (Australian Government, Department of Education and Training, 2016). With this trend a new problem has emerged: that of students not studying in their native language. Intralingual English subtitles can therefore be a useful tool through which to make educational content more accessible through the textual confirmation of auditory speech.
Video use in modern education, both online and in class, could be an important aid for establishing learning engagement. Compared to conventional textbooks, information in audiovisual presentations improves comprehension and recall. Research has shown that the use of video in an educational context has a positive impact on learning (Armstrong, Idriss, & Kim, 2011; Merkt, Weigand, Heier, & Schwan, 2011; Wilson et al., 2010). Adding subtitles to video could assist students in the learning process. Studies have also shown that subtitle reading is an automatic human behaviour (d'Ydewalle & De Bruycker, 2007; d'Ydewalle, Praet, Verfaillie, & Rensbergen, 1991), and that this behaviour helps students to engage in the learning process because students look at the text on video naturally and automatically. As a large number of educational entities, such as Coursera, Khan Academy, Academic Earth and other open courses organized by universities such as Stanford, Massachusetts Institute of Technology (MIT), Yale and Harvard, are using online video lectures as part of their educational programmes, adding subtitles is essential to ensure that people with different needs, including those with hearing loss and learning needs, are able to gain access to the content of the video lectures. These online education entities provide subtitles in different ways, mostly by relying on community volunteers to create them. In 2014, Coursera announced that its courses would be subtitled and translated by the Global Translator Community (GTC), a community of volunteers and partner organizations that work together to make educational content accessible to learners around the world (Coursera, 2014; n.d.). MIT OpenCourseWare also depends on volunteers to create and translate subtitles (Amara, 2010); Khan Academy (n.d.) creates its own English subtitles professionally.
Other open courses by universities have not stated clearly how they create their subtitles, but most of their online lectures can be viewed on YouTube with subtitles that are quite accurate, but mostly verbatim.

The benefits of subtitles
The benefits of intralingual subtitles created by human beings in education and language learning have been demonstrated in various studies (Bird & Williams, 2002; Danan, 2004; Garza, 1991; Markham, 1999; Moreno & Mayer, 2002; Perego et al., 2010; Vanderplank, 1988). A study by Garza (1991) found that the use of subtitles could bridge the gap between reading and listening comprehension, which facilitated language use in context. The findings by Garza (1991) provided evidence of the benefits of implementing subtitles in learning materials for L2 students during their early years of overseas study. Zhang and Mi (2010) found that, because speaking and listening skills are particularly problematic for L2 students, subtitles could indeed be beneficial in addressing some of these language issues. Likewise, Danan (2004) found that subtitles enhance the listening comprehension of non-native language learners and that they facilitate language learning through deepening cognitive processing. Bird and Williams (2002) showed that exposure to subtitles increased word learning and word recognition, resulting in better comprehension. Similarly, Vanderplank (1988) indicated that the use of subtitles improved comprehension. Markham (1999) also demonstrated that the availability of subtitles improved word recognition in university-level English as a second language (ESL) students. He further noted that students with an advanced level of second-language reading ability can use subtitles to develop their listening skills. Other studies, such as those by Moreno and Mayer (2002) and Vanderplank (1988), illustrated the benefits of subtitles in language learning.
The benefit of the presence of intralingual subtitles is explained by the Dual Coding Theory: as information is repeated by means of different channels, readers and learners retain information better (Paivio, 1991). However, this has not been tested with automated subtitles.
Based on CLT, Diao et al. (2007) and Mayer et al. (2001) suggest that intralingual subtitles created by human beings for ESL students might cause cognitive overload and thus decrease performance. Diao et al. (2007) stated that the redundancy effect occurs when learners have to coordinate mentally the same information presented simultaneously in different channels. This action causes learners to divide their attention between materials, which leads to an increase in extraneous CL and interferes with the learning of the information being presented (Kalyuga et al., 1999). Kalyuga and Sweller (2014) suggested that redundant information that is not needed for learning should be omitted to avoid negative learning outcomes because limited working memory is allocated to coordinating unnecessary information, thus decreasing the cognitive capacity for learning.
In contrast,  found no significant impact on cognitive load with or without subtitle exposure. In their study, students' performance was not affected by the double exposure to the auditory and written forms of the content, and the redundancy effect did not occur. Kruger (2013) explained that the presence of the dual coding effect may free the simultaneous presentation of both auditory and visual information from the redundancy effect under certain conditions. He further argued that subtitles have a positive impact on language performance as an overall outcome if no overly complicated multimedia materials are presented at the same time (Kruger, 2013).
However, the findings of  showed a significant difference in CL between groups, with no subtitles producing a higher CL and higher frustration levels. They concluded that their results suggested that ESL students who learn through the medium of English could experience lower CL if they were to use video with same-language subtitles rather than unsubtitled video. In other words, their study showed that the presence of intralingual subtitles provided students with support that helps with their processing and understanding of the learning content.
Based on these previous studies, CL in intralingual subtitles for educational videos depends on redundancy and the dual coding effect. However, the effects of automated subtitles in the context of educational videos have not been studied.

The availability of subtitles
There are barriers to making subtitles available in audiovisual programmes in spite of their benefits. The conventional way of subtitling is expensive and time-consuming. Professionals are needed in the process of creating quality subtitles, which is not always practical or financially viable. Even though professionally created subtitles are more accurate and of better quality, the cost of this method has become a major barrier to adding subtitles to every video. Automated subtitles could be the solution in this respect and could be widely used to reduce both cost and effort.
Automated subtitles are created by means of ASR software to provide a verbatim displayed transcript from spoken dialogues (Wald & Bain, 2008). The text transcription is synchronized with the video by means of the timing information so that online videos can be subtitled automatically (Díaz-Cintas, 2014; Wald, 2006; Wald & Bain, 2008). Automated subtitles are easy to access and can be added to any educational video by following a few simple instructions, with no expert skills required.
In addressing the challenge of subtitling ("captioning" is the original term used on the Google site) for users who upload their videos online, Ken Harrenstien, a deaf software engineer who led the captioning project for Google, combined Google ASR technology with the YouTube caption system to offer automatic captions, or auto-caps for short (Díaz-Cintas, 2014; Harrenstien, 2009). As Harrenstien (2009) explained in the official Google blog, auto-caps automatically generate subtitles for video by using the same voice-recognition algorithms that are used in Google Voice. Google announced a 23% error rate on word recognition in 2013, and its system had been improved to bring the error rate down to 8% by May 2015 (Shokouhi, Ozertem, & Craswell, 2016).
In the 2017 Internet Trends Report, Meeker (2017, slide 48) reported that Google ASR has achieved an accuracy rate of 95% for English, which is the threshold of human accuracy. Even though Parton (2016) found that automatically generated subtitles are not accurate enough to be used exclusively for the deaf and hard-of-hearing population, auto-caps are still helpful in the way that average viewers can understand what was presented in the text (Gernsbacher, 2015).
In addition to auto-caps, Google launched automatic caption timing, or auto-timing, to make creating manual subtitles significantly easier for video owners by cueing the uttered words automatically in the video (Díaz-Cintas, 2014;Harrenstien, 2009). With auto-timing, only a simple text file needs to be created for the transcribed text, then the captions will be created automatically by using Google's ASR technology and no special skills will be needed in the process.
By creating subtitles automatically, the barriers for video owners to adding subtitles will decrease considerably, as the time and resources for creating professional subtitle tracks will be reduced significantly. Automatic subtitling can potentially save a large amount of resources, such as time and money, for the many educational entities that produce large quantities of videos for online distance-learning programmes (Wald, 2013).

Goal of this study
The results of previous studies on CL and intralingual subtitles created by human beings showed that exposure to subtitles improved learning without increasing CL. However, there are no studies on the effects of automated subtitles in educational videos. Finding out whether automated subtitles are beneficial for educational purposes is important because such subtitles could make existing online education programmes more accessible in a cost-effective way. In this article, we investigate the impact of automated subtitles and corrected subtitles on learning by comparing the effects of both types of subtitle through measures of CL and performance.

Method
In order to address the issue of the impact of automated and corrected subtitles on learning and CL, it is essential to find out whether adding automatically generated or corrected subtitles to an educational video would improve learning; and, if it does, whether there are any differences in the impact of these subtitles on both learning and CL.
To answer these questions, this study used an experiment in an academic context in which students (both native English speakers and ESL students) were exposed to one of three versions of the same online lecture: one group saw the video with no subtitles, a second group saw the same video with automated subtitles and a third group saw the video with corrected subtitles. A pre-test and a post-test were carried out to determine the impact of the automated subtitles, corrected subtitles and unsubtitled video on learning. A CL test was conducted to determine how each type of subtitle affected CL. Based on the literature, the hypotheses of the study are that automatically generated and corrected subtitles would result in better performance when compared to unsubtitled video; that automatically generated and corrected subtitles would result in lower CL when compared to unsubtitled video; and that corrected subtitles would outperform the other two test conditions.

Sampling
Using convenience sampling, we targeted first-year students from a bridging and a preparatory diploma at a university in Australia. This diploma course is a very structured and intensive programme, with students completing two courses at a time in six-week terms. All the students in the Business and Economics programmes have to complete an introductory course on the Principles of Micro-Economics and more than 80% of those students are from a non-English background.
There were seven groups, which ranged from 13 to 26 students per group. A total of 141 students participated in the study and each group was randomly assigned to one of the three video conditions -English video without subtitles (E), English video with English automatically generated subtitles (EA), and English video with English corrected subtitles (EC). Only 92 sets of data could be used in the study owing to either incomplete data or an invalid study procedure. The final division between the groups was 21 students in condition E, 34 students in condition EA, and 37 students in condition EC.
Seventy-four students were mainly from China, India, Vietnam, South Korea, Indonesia, Thailand, Bangladesh, Nepal and Pakistan, although there were also 15 local Australian students, one from Singapore, one from Ukraine and one from Sweden. However, owing to time constraints, the sample of participants in this study was not tested for homogeneity in language proficiency.

Video and subtitles
For the experiment, a 25-minute extract was selected from a 50-minute video on the topic of Elasticity of Supply and Demand downloaded from MIT OpenCourseWare (Gruber, 2011). The decision to use a shorter video was taken for practical reasons, since the time allowed for collecting the data was very limited, owing to the structured curriculum. The style of video used in this experiment is a classroom lecture: a lecturer speaks at the front of a lecture room and the students are not shown. The video was then presented to each of the three test groups in one of the three test conditions (EA, EC or E).

Subtitle characteristics and quality
The video downloaded directly from MIT OpenCourseWare provided only the corrected version of the text transcript, which was uploaded and synced with the video. The transcript was possibly produced by volunteers because, according to the documentation, MIT provided professional transcription only after 2012 (Khesin, 2012). The automatically generated subtitles of the same video were available through YouTube.
Compared to the recommended average subtitling speed of 12 cps (Díaz-Cintas & Remael, 2007), the subtitle presentation speed of the corrected video in the current study was extremely high, with 20% of the corrected version being between 15 and 19 cps and 32% being faster than 20 cps, as shown in Table 2.1. That is significantly above the threshold for comfortable reading, with too many subtitles being at a speed that would not be possible to read fully.
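A speed distribution of the kind reported here can be derived from each subtitle's text and display time. The sketch below is illustrative only: the function names are hypothetical, the band boundaries (below 15, 15 to under 20, and 20 cps or more) are an assumption about how the reported categories are delimited, and the sample data are invented rather than taken from the study's subtitle file.

```python
from collections import Counter


def speed_band(cps):
    """Assign a presentation speed to one of three illustrative bands."""
    if cps < 15:
        return "under 15 cps"
    if cps < 20:
        return "15 to under 20 cps"
    return "20 cps or more"


def band_shares(subtitles):
    """subtitles: iterable of (text, display time in seconds) pairs.

    Returns the percentage of subtitles that falls into each speed band.
    """
    counts = Counter()
    for text, duration_s in subtitles:
        if duration_s <= 0:
            raise ValueError("display time must be positive")
        counts[speed_band(len(text) / duration_s)] += 1
    total = sum(counts.values())
    return {band: 100 * count / total for band, count in counts.items()}


# Invented data: four subtitles shown for two seconds each.
sample = [("a" * 24, 2.0), ("a" * 34, 2.0), ("a" * 44, 2.0), ("a" * 50, 2.0)]
print(band_shares(sample))
```

With these four invented subtitles (12, 17, 22 and 25 cps), a quarter fall below 15 cps, a quarter between 15 and 20 cps, and half at 20 cps or more.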

Questionnaires
The participants were required to complete several questionnaires in this experiment: a biographical survey, a pre-test, a CL measurement and a post-test. The biographical survey collected information about the participants' background and language history. A pre-test with ten multiple-choice (MC) questions on comprehension relating to the video content was employed (Appendix A). This was used as baseline information indicating the amount of prior knowledge the participants had before viewing. A CL measurement was used to determine the amount of self-perceived CL while viewing subtitled or unsubtitled video (Appendix B). A post-test with 30 MC items was used to measure the amount of content the students learned immediately after viewing (Appendix C).
All the questions were taken from existing courses. The ten-item pre-test consisted of questions relating to the 25-minute video content (e.g., Which of the following accurately characterize perfectly inelastic demand?). Five MC questions (questions 1 to 5) of the pre-test were taken from the MIT OpenCourseWare (Gruber, 2011) website under the topic Elasticity of Supply and Demand, the video used in the study. It is a topic included in the course on micro-economic principles. These five questions were constructed to test key vocabulary terms and the students' understanding of key concepts covered in the video. The other five MC questions (questions 6 to 10) of the pre-test were drawn from the sample questions on the topic "elasticity" by Frasca (2007), a professor of Economics at the University of Dayton. The elasticity sample questions consisted of a total of 130 MC questions, but not all of them are related to the concepts covered in the video, so those selected were screened against the video content to ensure item relevance. The 30 post-test items included the ten items from the pre-test with the addition of an extra 20 items from the elasticity sample questions by Frasca (2007). Those extra 20 items were also selected according to content covered in the video used in the experiment.
The CL measurement is an adaptation from Leppink et al. (2014). Leppink et al. (2014) developed a more precise instrument for measuring CL, which better differentiates three types of CL: intrinsic load, extraneous load and germane load. The instrument is a 13-item self-evaluated report, with items 1 to 4 measuring intrinsic load (e.g., The content covered in the video was very complex), items 5 to 8 measuring extraneous load (e.g., The explanations and instructions in this video were very unclear) and items 9 to 13 measuring germane load (e.g., The video really enhanced my understanding of the content that was covered).
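The item-to-subscale mapping described above can be written down directly. The sketch below assumes, purely for illustration, that each subscale is scored as the mean of its item ratings; the text does not state how the subscale scores were computed, and the function name and the example ratings are hypothetical.

```python
from statistics import mean

# Item numbers (1-based) per subscale, as described in the text.
SUBSCALES = {
    "intrinsic": range(1, 5),     # items 1 to 4
    "extraneous": range(5, 9),    # items 5 to 8
    "germane": range(9, 14),      # items 9 to 13
}


def subscale_means(ratings):
    """ratings: the 13 item ratings of one participant, item 1 first.

    Assumption: a subscale score is the mean rating of its items.
    """
    if len(ratings) != 13:
        raise ValueError("expected ratings for all 13 items")
    return {
        name: mean(ratings[item - 1] for item in items)
        for name, items in SUBSCALES.items()
    }


# Invented ratings for one participant.
print(subscale_means([2, 4, 6, 8, 1, 1, 3, 3, 5, 5, 5, 5, 10]))
```

Keeping the mapping in a single dictionary makes it easy to check that every one of the 13 items belongs to exactly one subscale.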
The 30-item MC questionnaire has an item reliability index of .986 and the 13-item CL test has an item reliability index of .840. Reliability was calculated by means of Cronbach's alpha (Table 2.2a and b).
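As an illustration of how a reliability index of this kind is obtained, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix. The function name and the small `ratings` matrix are hypothetical and are not the study's data; the sketch assumes the usual formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), with sample variances.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per questionnaire item)."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical ratings from four respondents on three items (not study data).
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [6, 5, 6],
    [8, 7, 9],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items vary together across respondents, which is how indices such as .986 and .840 are usually read.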

Design and procedure
This study follows a quantitative experimental model that uses a three-group pre-test-post-test design (two test groups, one control group) with three subtitle conditions (EA, EC and E) as independent variables and CL and performance measurement as dependent variables. Each group of students was randomly assigned to one of the following conditions: English video with no subtitles (E), English video with English subtitles generated automatically (EA) and English video with corrected English subtitles (EC). The performance and CL of the participants were compared between the three tested conditions. The research questions can be answered by showing whether the intervention of subtitling has any difference in impact on performance and CL by comparing the two subtitled conditions and the control group.
The participants were asked to perform a pre-test before and a post-test after viewing the video. The pre-test was used as a baseline but also to measure the amount of prior knowledge the participants had on the topic of elasticity. The post-test was intended to determine the impact of subtitles, both EA and EC, on performance when compared to the pre-test. The scores between EA and EC were also compared in order to determine the difference in impact, if there was any.
The CL measurement was conducted to determine the self-perceived effort in viewing with or without subtitles.
A biographical survey and a pre-test were conducted before the participants viewed the video.
The video was about 25 minutes in length and was projected on a big screen for the participants to watch in one of the three conditions. All the conditions were identical for each of the groups, including good sound and image. Two tests, in the order of (1) a CL measurement and (2) a performance post-test, were given to the participants for completion immediately after the viewing.

Data analysis
A multiple regression analysis was performed with self-reported CL and performance on pre-test, post-test and full post-test scores as dependent variables, and using the months the students spent in an English-speaking country as a predicting variable. The analysis revealed the length of residence in an English-speaking country not to be a predictor of CL and/or performance.
The data were sufficiently normally distributed to allow for the use of an ANOVA. To determine whether there is any difference between the participants in each of the three conditions, a one-way ANOVA was performed for each of the following dependent variables: full-test performance, post-test performance, difference between pre- and post-test performance, intrinsic load, extraneous load and germane load. Post hoc Tukey tests and reliability tests were also carried out. A significance level of α=.05 was adopted for all the statistical analyses reported.
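The one-way ANOVA described above can be sketched as follows. This is a minimal illustration of how the F statistic and its degrees of freedom are derived from three groups of scores; the function name and the scores are hypothetical and are not the study's data. Obtaining the p-value requires the F distribution, for which a statistics library (for example SciPy's `scipy.stats.f_oneway`) would normally be used.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1                 # k - 1
    df_within = len(all_values) - len(groups)    # N - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for the E, EA and EC conditions (not study data).
e_scores = [1, 2, 3]
ea_scores = [2, 3, 4]
ec_scores = [5, 6, 7]
f_stat, df_b, df_w = one_way_anova([e_scores, ea_scores, ec_scores])
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

With three conditions the between-groups degrees of freedom are always 2, and the within-groups degrees of freedom equal the number of observations minus 3.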
There were two significant characteristics of these two sets of subtitles that could have had an impact on the performance, namely the high presentation speed of the corrected subtitles and the low accuracy rate of the automatic subtitles. The transcripts of both versions of the subtitles were manually inspected and analysed along with their presentation speed. The details of the findings are shown in Table 2.1. It is evident from the analysis that the presentation speed of both versions is too high in many instances. Another important aspect in relation to the automated subtitles is the rate of errors as a result of incorrect speech recognition. In fact, there are 1 160 errors in the automated subtitles, including 209 omissions but excluding a few remarks or questions by students that were not subtitled at all in the automated version. These omissions were included in the corrected version. The errors are caused by misrecognition of the spoken words by the speech-recognition software. The word accuracy was 68.88%, calculated by applying the formula used by the US National Institute of Standards and Technology (Dumouchel et al., 2011).
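The reported word accuracy can be reproduced with a short calculation. The sketch below assumes that the formula reduces to (total words - errors) / total words, with omissions counted as errors, and it uses the total of 3 727 words reported in the discussion of the automated subtitles; the function name is hypothetical.

```python
def word_accuracy(total_words, errors):
    """Word accuracy as a percentage.

    Assumption: accuracy = (N - errors) / N, where errors include
    misrecognised and omitted words.
    """
    return 100 * (total_words - errors) / total_words


# Figures reported for the automated subtitles.
accuracy = word_accuracy(total_words=3727, errors=1160)
print(f"Word accuracy: {accuracy:.2f}%")
```

Under this assumption, 1 160 errors in 3 727 words correspond to the 68.88% accuracy reported above.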

Performance
A one-way ANOVA on the pre-test results showed no significant difference between the three groups in terms of prior knowledge on the topic, as shown in Table 3.1. Considering their performance on the full 30-item test, the group that saw the unsubtitled video (43.97%) did slightly better than the group which saw the automatic subtitles (42.84%) and the group that saw the corrected subtitles performed worst (38.29%; see Figure 3.1 and Table 3.2). However, these differences did not reach significance in a one-way ANOVA, as can be seen in Table 3.3.
Considering the difference between the performance on the ten-item pre- and post-test, it appears that the group that saw the video with corrected subtitles did not improve and, in fact, did slightly worse than the other two, as can be seen in Figure 3.2 and Table 3.4; but the result is not statistically significant (p=.061; see Table 3.5). The results indicate that the students performed worse in the post-test after watching the video with corrected subtitles and the students performed better in the post-test, with a similar performance level in both the unsubtitled and the automated subtitle conditions. Figure 3.3 shows that the data of the pre- and post-test difference is normally distributed. The post-test used here to compute the difference is only the subset of the total post-video questionnaire that comprises the same ten items used in the pre-test. A one-way between-subjects ANOVA was conducted to compare the effect of subtitles on performance in no subtitles, automated subtitles and corrected subtitles conditions, and also to determine whether the difference between the pre-test and the post-test is significant. The difference between the pre- and post-test in fact approached significance with F(2,89)=2.89, p=.061, as shown in Table 3.5.
Figure 3.3 The mean score for pre- and post-test difference is normally distributed
Post hoc comparisons using the Tukey HSD test indicated that the difference between automated and corrected subtitle conditions (p=.083) is closer to the significance level (α=.05) than the difference between the unsubtitled and the subtitled conditions, as shown in Table 3.6. However, the differences between unsubtitled and automated subtitle conditions (p=.999) and between unsubtitled and corrected subtitle conditions (p=.159) are not significant. The Tukey test thus suggests that any difference lies not between the unsubtitled condition and the subtitled conditions but between the two subtitled conditions.

Cognitive Load
Considering the CL induced by the different modes, there was a minor, non-significant difference (Table 3.8), as can be seen in Figure 3.4 and Table 3.7. A one-way between-subjects ANOVA was conducted to compare the impact of subtitles on the three types of CL in no subtitles, automated subtitles, and corrected subtitles conditions and also to determine whether the differences in intrinsic load, extraneous load and germane load are significant. The result shows no significant effect of subtitles on CL at the p<.05 level for the three conditions, as shown in Table 3.8.

Discussion
The main aim of the study was to determine whether the different quality of subtitles has an impact on performance and CL. The results in this study suggest that these subtitles (automated or corrected) have no significant effect on performance and CL, although the corrected subtitles seemed to have a negative impact when compared to the automatic subtitles. Several factors may explain these findings, which run counter to the body of literature showing subtitles to be beneficial to comprehension (Bird & Williams, 2002;Moreno & Mayer, 2002;Vanderplank, 1988).
It should be noted that the presentation speed of the automated and corrected subtitles is extremely fast in general, especially in the corrected subtitles, with more than one-third of them being faster than 20 cps. The data indicate that the students performed worse when watching a video with corrected subtitles, but the result is not significant (p=.220; see Table 3.3). The lower mean score of the extraneous load indicates that subtitle presentation assists in information processing when compared to intrinsic load and germane load. The fact that the students put more effort into processing the content in the no-subtitles condition gives a general indication that subtitles are still beneficial to providing a favourable learning environment, although none of these differences reached the level of significance. This makes it impossible to generalize from these findings. The results might also depend on how familiar these ESL students are with English and how they use subtitles in this context.
The results of this study do not show that adding automatically generated or corrected subtitles to an educational video improves learning. The reason why there was no significant difference could be ascribed to the fact that the accuracy of the subtitles in the EA condition falls substantially below respeaking standards. This would render the subtitles a distraction at best and probably resulted in the students' ignoring the subtitles, something that will have to be verified with eye-tracking studies. The high presentation speed in the EC condition would make it virtually impossible for students to read around one-third of the subtitles, which would have been a serious distraction and could easily have interfered with comprehension. This seems to be supported by the findings.
The results do not support the three hypotheses of the study. The first hypothesis predicted that automatically generated and corrected subtitles would result in better performance when compared to unsubtitled video. However, our results showed that students who saw unsubtitled video (43.97%) did slightly better than those who saw either automated (42.84%) or corrected (38.29%) subtitles, even though the result is not statistically significant.
The second hypothesis predicted that automatically generated and corrected subtitles would result in a lower CL when compared to unsubtitled video. However, our results showed that there is no significant effect of subtitles on CL for any of the three test conditions. Furthermore, the trend of the data did show that both EA and EC conditions result in lower intrinsic and extraneous load and, as a result, this meant that the students would theoretically have more germane load available, even though they could not benefit from this in view of the problems with the two subtitled versions. Also, the lack of statistical significance means that this is at best a trend that will have to be established with refined experiments.
Considering hypothesis three, the students who saw the corrected subtitles did not outperform those who performed under the other two test conditions; instead, the trend of the results indicates that the students performed worst when viewing the corrected subtitles, though again the result is not statistically significant.
The trend of the current results shows that the presentation of subtitles helps with processing information with less mental effort, in line with earlier results indicating that subtitles reduced the CL of students when processing learning materials. There are a number of explanations for the findings presented here. The major reason that this study does not have any significant results is that the presentation speed was so variable and so high in both automatically generated and corrected subtitles versions that it negated any possible benefits. The fact that 20 per cent of the subtitles in the corrected version are between 15 and 19 cps and 32 per cent of its subtitles are 20 cps and faster made the corrected subtitles impossible to read. Even though the corrected version is very accurate, since the subtitles are created verbatim by ASR, the number of cps displayed to the reader is far too high. As concluded in the study by Romero-Fresco (2016), fully verbatim subtitles are not desirable because high presentation speed is caused mainly by the high speech rate of the speakers in the video, and it remains far too fast to be read and comprehended, despite its high accuracy rate.
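The presentation-speed bands mentioned above can be illustrated with a minimal sketch that computes characters per second (cps) for each subtitle and reports the share of subtitles in each band. The cue texts and timings are hypothetical, the function names are illustrative, and two assumptions are made: spaces and punctuation count as characters, and values between 19 and 20 cps fall into the 15-19 cps band.

```python
def chars_per_second(text, start_s, end_s):
    """Presentation speed of one subtitle in characters per second."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("subtitle must have a positive duration")
    # Assumption: spaces and punctuation are counted as characters.
    return len(text) / duration


def speed_band(cps):
    """Assign a cps value to one of the three bands used in the text."""
    if cps >= 20:
        return "20 cps and faster"
    if cps >= 15:
        return "15-19 cps"
    return "below 15 cps"


def band_shares(cues):
    """Percentage of subtitles per band for (text, start_s, end_s) cues."""
    counts = {}
    for text, start_s, end_s in cues:
        band = speed_band(chars_per_second(text, start_s, end_s))
        counts[band] = counts.get(band, 0) + 1
    return {band: 100 * n / len(cues) for band, n in counts.items()}


# Hypothetical cues (text, start in seconds, end in seconds).
cues = [
    ("So what is elasticity of demand?", 0.0, 2.5),
    ("It measures how quantity responds to price.", 2.5, 4.5),
    ("Perfectly inelastic demand means quantity does not change at all.", 4.5, 7.0),
]
for band, share in band_shares(cues).items():
    print(f"{band}: {share:.0f}% of subtitles")
```

Applied to a full subtitle file, a tally of this kind would yield the proportions of subtitles in each speed band.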
The automated version is slightly slower than the corrected version, but the fact that it contains a high number of errors (there are 1 160 errors out of a total of 3 727 words, including omissions) negated the possible benefits of subtitle reading. The accuracy rate of the automated subtitles is 68.88%, as calculated in the data analysis section.
When the errors in the automated transcript are investigated closely, it appears that these are quite serious and were mainly caused by word misrecognition. The automated version has far too many errors to be useful. This is particularly true in an academic context, where the accuracy of information is critical. It can therefore be expected that students would be frustrated by the high error rate and would either be distracted by the errors or simply start ignoring the subtitles. The high error rate made the automated condition similar to the unsubtitled condition and the high presentation rate resulted in lower performance; this also made the corrected condition similar to the unsubtitled condition. As indicated by the post hoc Tukey test, the difference shown in this study lies between the two subtitled conditions; it does not lie between the unsubtitled condition and the two subtitled conditions. However, the corrected subtitles did result in a higher germane load, which shows promise for the mode.

Conclusion
This study was conducted in order to investigate whether automatically generated or corrected subtitles would have an impact on performance and CL. The results from previous studies show that the simultaneous presentation of both visual and auditory information actually assists learning because of the dual coding effect, as discussed in section 1.5. Further investigations have been done in order to understand how subtitle reading would influence information processing and CL.
As different methods have been used to evaluate CL, contradictory results have been found regarding whether subtitle reading would cause cognitive overload and thus decrease learning. These contradictions may be caused by different cognitive measurements and study procedures. However, despite the contradictory results, the benefits of subtitles have been consistently proven in the literature, supporting the view that subtitling is beneficial in promoting comprehension as a precursor to learning (Bird & Williams, 2002;Moreno & Mayer, 2002;Vanderplank, 1988).
The increased use of online video lectures by large numbers of educational entities has led to a huge demand for substantial quantities of subtitled video. Automatically generated subtitles could be a solution towards meeting the need to produce subtitles quickly and economically. Automated subtitles are created through speech-recognition software, but usually with a high error rate that could potentially affect subtitle reading. It is only logical to correct these errors in order to improve readability. Since there are hardly any empirical studies that compare different types and quality of subtitles, it was the goal of this study to fill the gap in this research area.

Findings and implications
This study was conducted to investigate automatic and corrected subtitles because they are among the types of subtitle that are normally available online (Parton, 2016). However, these modes of subtitle do not seem to be beneficial under conditions of high error rates or high presentation speed. It would therefore seem that for subtitles to be beneficial, they would have both to be corrected and to be presented at a reasonable reading speed. The real challenge will be to find ways of increasing the accuracy of transcripts and, even more so, of reducing the text automatically to bring down the presentation speed, and this means that the powerful mode of subtitling currently remains out of reach for the majority of institutions.
There are two possible implications of subtitles being either too fast to process and comprehend or containing too many errors to facilitate comprehension. First, in order to have a reasonable presentation speed, this mode of subtitles can be rendered useful only after being edited, a task that includes summarizing, reducing and reformulating. Professional subtitlers remain essential in editing the transcripts of original texts if subtitling standards are to be met (Díaz-Cintas & Remael, 2007). This implies that professionals would still be involved, resulting in the problem of costliness; but the need to process large quantities of subtitles in a short time remains another challenge.
The second implication is that the audiovisual recording environment of the video would have to be controlled to the extent that the misrecognition by the speech-recognition software would be decreased. However, many variables are involved in controlling such an environment. Technology may advance to the point that it may be possible to have a highly sophisticated system capable of recognizing speech automatically, with high accuracy, and also capable of reducing and reformulating original transcripts simply. Future research on technology may make possible the production of automatic subtitles which are so accurate that readers will be able to process and comprehend them.

Limitations and further research
There are several limitations to this study. The current study used an availability sample from year one students and a larger sample size is essential in future research to yield possibly significant results. Given the nature of the available participants, we had limited control over their level of English, but are satisfied that they are comparable (see section 2.1). The fact that this study used a video of short duration (25 minutes), was based on a single viewing and had a nearly significant result implies that manipulating the experimental environment differently in future studies, such as using a video of longer duration for a longer period of time in a longitudinal study, may possibly have a better research outcome. In future research, corrected subtitles at lower presentation speed and over a longer period of time, such as a full school term, will be required, as the results would probably start manifesting only after more exposure and with students becoming used to the mode (Vanderplank, 1988). The videos used in this study featured only one topic, and further research should include a variety of different topics to ensure the generalizability of the results. Furthermore, eye-tracking studies should be conducted to gather more quantitative data and to determine to what extent the students actually try to read the subtitles.