Comparing the impact of automatically generated and corrected subtitles on cognitive load and learning in a first- and second-language educational context

Wing Shan Chan

Macquarie University, Australia

wing.chan@mq.edu.au
https://orcid.org/0000-0002-9104-2470

Jan-Louis Kruger

Macquarie University, Australia

Janlouis.kruger@mq.edu.au
https://orcid.org/0000-0002-4817-5390

Stephen Doherty

University of New South Wales, Australia
s.doherty@unsw.edu.au
https://orcid.org/0000-0003-0887-1049

Abstract

The addition of subtitles to videos has the potential to benefit students across the globe in a context where online video lectures have become a major channel for learning, particularly because, for many, language poses a barrier to learning. Automated subtitling, created with the use of speech-recognition software, may be a powerful way to make this a scalable and affordable solution. However, in the absence of thorough post-editing by human subtitlers, this mode of subtitling often results in serious errors that arise from problems with speech recognition, accuracy, segmentation and presentation speed. This study therefore aims to investigate the impact of automated subtitling on student learning in a sample of English first- and second-language speakers. Our results show that high error rates and high presentation speeds reduce the potential benefit of subtitles. These findings provide an important foundation for future studies on the use of subtitles in education.

Keywords: educational subtitling, subtitle, subtitling, automatically generated subtitles, automated subtitling, cognitive load, language barrier in learning, English as Second Language, ESL


1. Background

1.1 Introduction

Subtitles are used extensively in movies and television programmes as well as in online media such as YouTube, Netflix and TED. Intralingual subtitles were initially added to television programmes targeted at the hearing-impaired population (Gernsbacher, 2015; Taylor, 2005), but it was later observed that hearing viewers also benefit from subtitles, for example in the context of language learning (Garza, 1991). Previous research (Bird & Williams, 2002; Danan, 2004; Garza, 1991; Gernsbacher, 2015; Markham, 1999; Moreno & Mayer, 2002; Perego, Del Missier, Porta, & Mosconi, 2010; Vanderplank, 1988) has shown differences in areas such as vocabulary acquisition, comprehension and task performance between hearing viewers who watched a video with subtitles and viewers who saw the same video without subtitles – in favour of the group that saw the subtitled video. However, not all studies confirm this benefit (Kruger, 2016; Kruger, Hefer, & Matthew, 2014; Kruger & Steyn, 2013) and more research is needed to establish how subtitles influence learning and under what conditions their influence is optimal.

As many educational entities use videos in their online learning programmes – for instance, MIT OpenCourseWare, Coursera and Khan Academy – a large amount of subtitling is required to ensure access for potential users. However, creating subtitles manually is both costly and time-consuming. Conventional subtitles are created by subtitlers who transcribe and edit the audio dialogue into written text and add the text to the screen using subtitling software. The Code of Good Subtitling Practice compiled by Ivarsson and Carroll (1998) suggests some general guidelines to ensure high-quality, accurate and readable subtitles. By convention, a minimum display time of one second is recommended for short subtitles and a maximum display time of six seconds for full two-liners on screen, with an average reading speed of 12 characters per second (cps) (Díaz-Cintas & Remael, 2007; Ivarsson & Carroll, 1998). Szarkowska and Bogucka (2019) provide an overview of the origins of what is generally referred to as the six-second rule. They point out that the standard for TED Talks is that subtitles should be presented at a rate below 22 cps, whereas Netflix uses 17 cps across the globe. Because of space constraints, there should be no more than two lines per subtitle, and the maximum number of characters on each line, including spaces, should be between 33 and 35, although some guidelines and software allow limits ranging from 30 to 41 characters. Ivarsson and Carroll (1998) advise that subtitlers may need to reduce the original transcript by as much as 30 per cent because of space and time constraints; this can be done by reformulating, condensing and adapting the original text.
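By way of illustration, these conventions can be expressed as a simple automated check. The following is a minimal sketch, not part of any published standard or tool; the Subtitle structure and the exact threshold values are assumptions based on the guideline figures cited above.

from dataclasses import dataclass

# Guideline values cited above; different guidelines and software use different limits.
MAX_LINES = 2            # no more than two lines per subtitle
MAX_CHARS_PER_LINE = 35  # 33-35 characters per line is a common maximum
MAX_CPS = 12             # recommended average reading speed in characters per second
MIN_DURATION = 1.0       # minimum display time in seconds for a short subtitle
MAX_DURATION = 6.0       # maximum display time for a full two-liner

@dataclass
class Subtitle:
    lines: list[str]     # one string per on-screen line
    duration: float      # seconds on screen

def check_subtitle(sub: Subtitle) -> list[str]:
    """Return a list of guideline violations for a single subtitle."""
    issues = []
    if len(sub.lines) > MAX_LINES:
        issues.append(f"too many lines: {len(sub.lines)}")
    for line in sub.lines:
        if len(line) > MAX_CHARS_PER_LINE:
            issues.append(f"line too long ({len(line)} characters): {line!r}")
    cps = sum(len(line) for line in sub.lines) / sub.duration
    if cps > MAX_CPS:
        issues.append(f"presentation speed too high: {cps:.1f} cps")
    if not MIN_DURATION <= sub.duration <= MAX_DURATION:
        issues.append(f"display time out of range: {sub.duration:.1f} s")
    return issues

print(check_subtitle(Subtitle(["We talked about what happens",
                               "when a supply curve shifts,"], 2.0)))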

For the purposes of this article, automatic subtitles are defined as text generated automatically with the help of speech-recognition tools and then synchronized with the video. This type of subtitling could be a solution to the cost and time demands of manual subtitling. Corrected subtitles are defined as text generated automatically in the same way as automatic subtitles, but then edited by a subtitler for grammar, spelling and segmentation while otherwise remaining verbatim. Automated subtitles could be a very powerful tool for improving learning in education if the problems with their accuracy, the chunking of text and presentation speed could be resolved when they are created (Doherty & Kruger, 2018).

Table 1 Examples of automatically generated subtitles vs corrected subtitles (Gruber, 2011)

Automatically generated subtitles        | Corrected subtitles
arm we talked about last time            | What we talked about last time
was to sort a qualitative affects        | was the sort of qualitative effects,
the qualitative version                  | the qualitative version
supply demand model                      | of the supply and demand model.
we talked about what happens             | We talked about what happens
when a supply curve ship                 | when a supply curve shifts,
what happens the main kar ships          | what happens when a demand curve shifts.

Automatic subtitles are advantageous because their creation is faster and more cost-effective. However, because they are not created by a human translator or subtitler, this mode of subtitling can lead to serious errors, such as problems in speech recognition, accuracy, segmentation of a two-line subtitle and reading speed. Automated subtitles are nevertheless potentially powerful in enabling wider accessibility, if only the problems with their accuracy and readability can be resolved. From a methodological standpoint, the impact of subtitle quality in an educational context could also be investigated more comprehensively (cf. Kruger & Doherty, 2016).

1.2 Automatically generated subtitles and related issues

It is logical to seek a way of producing large quantities of subtitles economically within a shorter time frame, and automatically generated subtitles could be instrumental in making this possible. Wald and Bain (2008) explain that automated subtitles are created by means of automatic speech-recognition (ASR) software in order to provide a verbatim transcript of the spoken dialogue. The text transcription is then synchronized with the video by means of the timing information so that subtitles can be generated automatically. There are three main problems with automated subtitles: accuracy (Parton, 2016), reading speed (Romero-Fresco, 2016) and the chunking of text (Rajendran, Duchowski, Orero, Martínez, & Romero-Fresco, 2013). Typically, ASR has been observed to achieve an average accuracy rate of between 60% and 90%, depending on the environment and the method of evaluation used (Anantaram, Kopparapu, Patel, & Mittal, 2016). The accuracy rate can, however, improve to at least 98% with editing, pre-recorded transcripts and training the computer to recognize the speaker’s voice (Wald & Bain, 2008). Automatically generated subtitles can be created with high accuracy only if certain criteria are adhered to (Jurafsky & Martin, 2009): a slow, clear and consistent speech style in which the speaker speaks a standard dialect, no background noise or sound effects, and a limited vocabulary to be recognized at any one time. However, it is hardly possible to fulfil these criteria in real-life settings.

Presentation speed (also referred to by some researchers as reading speed) is another problem that affects viewers’ comprehension of video content: automated subtitles are usually presented at a faster speed since they are fully verbatim. This problem is caused mainly by the high speech rate of the speakers in the video (Romero-Fresco, 2016). Romero-Fresco (2016) concluded that fully verbatim subtitles are not desirable, even with the introduction of the hybrid mode, which combines pre-recorded transcripts and live subtitles, because, despite their high accuracy rate, they are far too fast for any viewer to read and comprehend. In this respect, it is necessary to reduce the text to make readable and comprehensible subtitles possible, and professionals inevitably have to be involved in this process. In a recent eye-tracking study, however, Szarkowska and Gerber-Morón (2018) suggest that viewers are capable of processing faster subtitles. In particular, they found that most viewers in their study could read the subtitles and follow the images at a presentation rate of 20 cps.

Finally, research shows that text chunking increases the effectiveness of subtitle reading by reducing the amount of time spent on subtitles (Rajendran et al., 2013). Rajendran et al. (2013) investigated the impact of subtitle segmentation using eye-movement metrics such as fixation durations. Their eye-tracking data shows that chunking the text by phrase or by sentence reduces the amount of time viewers spend on reading subtitles, and the processing of subtitles is rendered easier through this presentation of the text.

These findings suggest that automated subtitles can be used effectively only if the associated problems with accuracy, reading speed and text chunking can be solved.

The US National Institute of Standards and Technology uses the following formula to calculate word accuracy (Dumouchel, Boulianne, & Brousseau, 2011):
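Word accuracy (%) = (N − D − S − I) / N × 100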

According to Romero-Fresco and Martínez (2015), this model draws on the basic principles of WER (word error rate), which has traditionally been applied in analysing the accuracy of speech recognition. In this model, N is the total number of words spoken by the speaker. The errors include D for words deleted, S for words substituted and I for words inserted incorrectly by the speech-recognition software. Romero-Fresco and Martínez (2015) noted that this model of assessment, used for speech recognition, is not suited to European countries in which subtitles are mostly created by respeakers: the model does not account for editing errors and penalizes edits that are not errors. Romero-Fresco later introduced the NER model to measure the quality of intralingual live subtitling by respeaking; it takes into account the number of words in the subtitles, edition errors arising from the strategies used by the subtitler, and recognition errors caused by mispronunciation or mishearing introduced by speech recognition or stenography (Romero-Fresco & Martínez, 2015). Romero-Fresco and Martínez (2015) indicated that, in the NER model, live subtitles have to reach an accuracy rate of at least 98% in order to be considered acceptable.
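Expressed as a formula, the NER accuracy rate is calculated analogously as NER accuracy (%) = (N − E − R) / N × 100, where N is the number of words in the subtitles, E the number of edition errors and R the number of recognition errors.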

The majority of studies on intralingual subtitles created by humans in an educational context have investigated the impact of subtitles on learning, usually by comparing subtitled and unsubtitled conditions. However, there are hardly any empirical studies that compare the impact of the type or the quality of subtitles in terms of accuracy, segmentation and reading speed. Owing to the problems with the accuracy and presentation speed of automated subtitles (which tend to be verbatim transcripts, including spoken features that compromise coherence during reading), it is still uncertain whether automated subtitles have a positive impact on learning in comparison to video without subtitles. It therefore appears logical that these automatically generated subtitles have to be corrected; but, in the interests of efficiency, it is important to determine to what extent the correction of errors and grammar, as well as the reduction of presentation speed, has an impact on learning and cognitive load (CL). The aim of this study is to address this question in order to ascertain whether there is any difference in impact on learning between unsubtitled video, video with automatically generated subtitles and video with corrected subtitles.

The following sections provide information on CL theory, the role of subtitles in education, the benefits of subtitles in learning, how subtitles affect CL during the learning process and how subtitles are created.

1.3 Cognitive load theory

The major concern about having subtitles in videos is the amount of CL they could generate. CL is a multidimensional construct that represents the load imposed on the learner’s cognitive system while performing a particular task (Paas, Tuovinen, Tabbers, & van Gerven, 2003; Paas & van Merriënboer, 1994). Cognitive load theory (CLT) is a cognitive architecture based on the concept of a limited working memory, with processing units for visual and auditory information that interact with a comparatively unlimited long-term memory (Paas et al., 2003; Sweller, 2003; 2004; 2011; Sweller & Chandler, 1991; Sweller & Sweller, 2006). CLT differentiates between three types of CL: intrinsic load, extraneous load and germane load (de Jong, 2010; Debue & van de Leemput, 2014; Leppink, Paas, van der Vleuten, van Gog, & van Merriënboer, 2013; Leppink, Paas, van Gog, van der Vleuten, & van Merriënboer, 2014; Paas & Sweller, 2014; Paas et al., 2003; Sweller, 1988; 2010; Sweller, van Merriënboer, & Paas, 1998).

Intrinsic load is an interaction between the nature of the material that needs to be learned and the learner’s level of expertise (Paas et al., 2003). Extraneous load relates to instructional presentation that could reduce learning by increasing the CL on learners (Debue & van de Leemput, 2014). Germane load is the remaining capacity that enables learning to be effective – learning improves if more working memory resources are available when processing information (Sweller, 2010).

1.4 Subtitles in education

Education has become significantly more global in the past decade, with international students becoming a consistent feature in most developed countries (Australian Government, Department of Education and Training, 2016). With this trend a new problem has emerged: that of students not studying in their native language. Intralingual English subtitles can therefore be a useful tool through which to make educational content more accessible through the textual confirmation of auditory speech.

The use of video in modern education, both online and in class, can be an important aid to learning and to establishing engagement. Compared to conventional textbooks, information in audiovisual presentations improves comprehension and recall, and research has shown that the use of video in an educational context has a positive impact on learning (Armstrong, Idriss, & Kim, 2011; Merkt, Weigand, Heier, & Schwan, 2011; Wilson et al., 2010). Adding subtitles to video could assist students in the learning process. Studies have also shown that subtitle reading is an automatic behaviour (d'Ydewalle & De Bruycker, 2007; d'Ydewalle, Praet, Verfaillie, & Van Rensbergen, 1991), and that this behaviour helps students to engage in the learning process because they look at the text on the video naturally and automatically. As a large number of educational entities, such as Coursera,[1] Khan Academy,[2] Academic Earth,[3] and other open courses organized by universities such as Stanford, the Massachusetts Institute of Technology (MIT), Yale and Harvard, use online video lectures as part of their educational programmes, adding subtitles is essential to ensure that people with different needs, including those with hearing loss and learning needs, are able to gain access to the content of the video lectures. These online education entities provide subtitles in different ways, mostly through communities of volunteers who create the subtitles. In 2014, Coursera announced that their courses would be subtitled and translated by the Global Translator Community (GTC), a community of volunteers and partner organizations that work together to make educational content accessible to learners around the world (Coursera, 2014; n.d.). MIT OpenCourseWare also depends on volunteers to create and translate subtitles (Amara, 2010); Khan Academy (n.d.) creates its own English subtitles professionally. Other open courses by universities have not stated clearly how they create their subtitles, but most of their online lectures can be viewed on YouTube with subtitles that are quite accurate, but mostly verbatim.

1.5 The benefits of subtitles

The benefits of intralingual subtitles created by humans in education and language learning have been demonstrated in various studies (Bird & Williams, 2002; Danan, 2004; Garza, 1991; Markham, 1999; Moreno & Mayer, 2002; Perego et al., 2010; Vanderplank, 1988). A study by Garza (1991) found that the use of subtitles could bridge the gap between reading and listening comprehension, which facilitated language use in context. The findings by Garza (1991) provided evidence of the benefits of implementing subtitles in learning materials for L2 students during their early years of overseas study. Zhang and Mi (2010) found that, because speaking and listening skills are particularly problematic for L2 students, subtitles could indeed be beneficial in addressing some of these language issues. Likewise, Danan (2004) found that subtitles enhance the listening comprehension of non-native language learners and that they facilitate language learning by deepening cognitive processing.

Bird and Williams (2002) showed that exposure to subtitles increased word learning and word recognition, resulting in better comprehension. Similarly, Vanderplank (1988) indicated that the use of subtitles improved comprehension. Markham (1999) also demonstrated that the availability of subtitles improved word recognition in university-level English as second language (ESL) students. He further noted that students with an advanced level of second-language reading ability can use subtitles to develop their listening skills. Other studies, such as those by Moreno and Mayer (2002) and Vanderplank (1988), illustrated the benefits of subtitles in language learning.

The benefit of the presence of intralingual subtitles is explained by Dual Coding Theory: when information is repeated through different channels, readers and learners retain it better (Paivio, 1991). However, this has not been tested with automated subtitles.

1.6 Subtitles and cognitive load

The impact of subtitles on CL has been investigated in a few experiments (Diao, Chandler, & Sweller, 2007; Kalyuga, Chandler, & Sweller, 1999; Kruger, Hefer, & Matthew, 2013; Mayer, Heiser, & Lonn, 2001). These studies sought to determine whether reading subtitles in educational videos (Kruger et al., 2013; Mayer et al., 2001) and instructional design (Diao et al., 2007; Kalyuga et al., 1999) lead to an increase in CL.

Based on CLT, Diao et al. (2007) and Mayer et al. (2001) suggest that intralingual subtitles created by humans might cause cognitive overload in ESL students and thus decrease performance. Diao et al. (2007) stated that the redundancy effect occurs when learners have to coordinate mentally the same information presented simultaneously in different channels. This causes learners to divide their attention between materials, which increases extraneous CL and interferes with the learning of the information being presented (Kalyuga et al., 1999). Kalyuga and Sweller (2014) suggested that redundant information that is not needed for learning should be omitted to avoid negative learning outcomes, because limited working memory is otherwise allocated to coordinating unnecessary information, thus decreasing the cognitive capacity available for learning.

In contrast, Kruger et al. (2013) found no significant impact on cognitive load with or without subtitle exposure. In their study, students’ performance was not affected by the double exposure of auditory and written form of content and the redundancy effect did not occur. Kruger (2013) explained that the presence of the dual coding effect may free the simultaneous presentation of both auditory and visual information from the redundancy effect under certain conditions. He further argued that subtitles have a positive impact on language performance as an overall outcome if no overly complicated multimedia materials are presented at the same time (Kruger, 2013; 2016).

However, the findings of Kruger et al. (2013) showed a significant difference in CL between groups, with the no-subtitles condition producing a higher CL and higher frustration levels. They concluded that ESL students who learn through the medium of English could experience lower CL if they were to use video with same-language subtitles rather than unsubtitled video. In other words, their study showed that the presence of intralingual subtitles provided students with support that helps their processing and understanding of the learning content.

Based on these previous studies, CL in intralingual subtitles for educational videos depends on redundancy and the dual coding effect. However, the effects of automated subtitles in the context of educational videos have not been studied.

1.7 The availability of subtitles

There are barriers to making subtitles available in audiovisual programmes in spite of their benefits. The conventional way of subtitling is expensive and time-consuming: professionals are needed to create quality subtitles, which is not always practical or financially viable. Even though professionally created subtitles are more accurate and of better quality, their cost has become a major barrier to adding subtitles to every video. Automated subtitles could be the solution in this respect and could be widely used to reduce both cost and effort.

Automated subtitles are created by means of ASR software to provide a verbatim transcript of the spoken dialogue (Wald & Bain, 2008). The text transcription is synchronized with the video by means of the timing information so that online videos can be subtitled automatically (Díaz-Cintas, 2014; Wald, 2006; 2013; Wald & Bain, 2008). Automated subtitles are easy to access and can be added to any educational video by following a few simple steps, without the need for expert skills.
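As a rough illustration of this synchronization step, the following sketch (a toy example; the input format, character limit and grouping rule are assumptions made for illustration and do not describe any particular ASR product) groups word-level timestamps from a recognizer into SRT-style subtitle cues.

def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words: list[tuple[str, float, float]], max_chars: int = 42) -> str:
    """Group (word, start, end) tuples into SRT cues of at most max_chars characters."""
    cues, current, cue_start, prev_end = [], [], None, 0.0
    for word, start, end in words:
        if cue_start is None:
            cue_start = start
        if current and len(" ".join(current + [word])) > max_chars:
            cues.append((cue_start, prev_end, " ".join(current)))
            current, cue_start = [], start
        current.append(word)
        prev_end = end
    if current:
        cues.append((cue_start, prev_end, " ".join(current)))
    return "\n".join(f"{i}\n{to_timestamp(s)} --> {to_timestamp(e)}\n{text}\n"
                     for i, (s, e, text) in enumerate(cues, start=1))

print(words_to_srt([("we", 0.0, 0.2), ("talked", 0.2, 0.5), ("about", 0.5, 0.7),
                    ("what", 0.7, 0.9), ("happens", 0.9, 1.3)]))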

In addressing the challenge of subtitling (“captioning” is the original term used on the Google site) for users who upload their videos online, Ken Harrenstien, a deaf software engineer who led the captioning project for Google, combined Google ASR technology with the YouTube caption system to offer automatic captions, or auto-caps for short (Díaz-Cintas, 2014; Harrenstien, 2009). As Harrenstien (2009) explained in the official Google blog, auto-caps automatically generate subtitles for video by using the same voice-recognition algorithms used in Google Voice. In 2013, Google announced a 23% error rate on word recognition, and by May 2015 the system had been improved to reduce the error rate to 8% (Shokouhi, Ozertem, & Craswell, 2016). In the 2017 Internet Trends Report, Meeker (2017, slide 48) reported that Google ASR had achieved an accuracy rate of 95% for English, which is the threshold of human accuracy. Even though Parton (2016) found that automatically generated subtitles are not accurate enough to be used exclusively by the deaf and hard-of-hearing population, auto-caps are still helpful in that average viewers can understand what is presented in the text (Gernsbacher, 2015).

In addition to auto-caps, Google launched automatic caption timing, or auto-timing, to make creating manual subtitles significantly easier for video owners by cueing the uttered words automatically in the video (Díaz-Cintas, 2014; Harrenstien, 2009). With auto-timing, only a simple text file of the transcribed text needs to be created; the captions are then generated automatically using Google’s ASR technology, and no special skills are needed in the process.

By creating subtitles automatically, the barriers for video owners to adding subtitles decrease considerably, as the time and resources needed to create professional subtitle tracks are reduced significantly. Automatic subtitling can potentially save considerable resources, such as time and money, for the many educational entities that produce large quantities of videos for online distance-learning programmes (Wald, 2013).

1.8 Goal of this study

The results of previous studies on CL and intralingual subtitles created by humans showed that exposure to subtitles improved learning without increasing CL. However, there are no studies on the effects of automated subtitles in educational videos. Finding out whether automated subtitles are beneficial for educational purposes is important because it could increase the accessibility of existing online education programmes in a cost-effective way. In this article, we investigate the impact of automated subtitles and corrected subtitles on learning by comparing the effects of both types of subtitles on measures of CL and performance.

2. Method

In order to address the issue of the impact of automated and corrected subtitles on learning and CL, it is essential to find out whether adding automatically generated or corrected subtitles to an educational video would improve learning; and, if it does, whether there are any differences in the impact of these subtitles on both learning and CL.

To answer these questions, this study used an experiment in an academic context where students (both English-speakers and ESL) were exposed to three versions of the same online lecture: one group saw a video with no subtitles, a second group saw the same video with automated subtitles and the third group saw the video with corrected subtitles. A pre-test and a post-test were carried out to determine the impact of the automated subtitles, corrected subtitles and unsubtitled video on learning. A CL test was conducted to determine how each type of subtitle affected CL.

Based on the literature, the hypotheses of the study are that automatically generated and corrected subtitles would result in better performance when compared to unsubtitled video; that automatically generated and corrected subtitles would result in lower CL when compared to unsubtitled video; and that corrected subtitles would outperform the other two test conditions.

2.1 Sampling

Using convenience sampling, we targeted first-year students from a bridging and a preparatory diploma at a university in Australia. This diploma course is a very structured and intensive programme,[4] with students completing two courses at a time in six-week terms. All the students in the Business and Economics programmes have to complete an introductory course on the Principles of Micro-Economics and more than 80%[5] of those students are from a non-English background.

There were seven groups, which ranged from 13 to 26 students per group. A total of 141 students participated in the study and each group was randomly assigned to one of the three video conditions – English video without subtitles (E), English video with English automatically generated subtitles (EA), and English video with English corrected subtitles (EC). Only 92 sets of data could be used in the study owing to either incomplete data or an invalid study procedure. The final division between the groups was 21 students in condition E, 34 students in condition EA, and 37 students in condition EC.

Seventy-four students were from China, India, Vietnam, South Korea, Indonesia, Thailand, Bangladesh, Nepal and Pakistan; there were also 15 local Australian students, one student from Singapore, one from Ukraine and one from Sweden. However, owing to time constraints, the sample of participants in this study was not tested for homogeneity of language proficiency.

2.2 Material

2.2.1 Video and subtitles

For the experiment, a 25-minute extract was selected from a 50-minute video on the topic of Elasticity of Supply and Demand downloaded from MIT OpenCourseWare (Gruber, 2011). The decision to use a shorter video was taken for practical reasons, since the time available for collecting the data was very limited owing to the structured curriculum. The style of video used in this experiment is a classroom lecture: a lecturer talking in front of a lecture room, without the students being shown. The video was presented to each of the three test groups in one of the three test conditions (EA, EC or E).

2.2.2 Subtitle characteristics and quality

The video downloaded directly from MIT OpenCourseWare provided only the corrected version of the text transcript, which was uploaded and synced with the video. The transcript was possibly produced by volunteers because, according to the documentation, MIT provided professional transcription only after 2012 (Khesin, 2012). The automatically generated subtitles of the same video were available through YouTube.

Compared to the recommended average subtitling speed of 12 cps (Díaz-Cintas & Remael, 2007), the presentation speed of the corrected subtitles in the current study was extremely high, with 20% of the corrected version being between 15 and 19 cps and 32% being 20 cps or faster, as shown in Table 2.1. This is significantly above the threshold for comfortable reading, with many subtitles presented at a speed at which they could not be read in full.

Table 2.1 The analysis of the automated and corrected subtitle transcripts

                                                 | Automated subtitles | Corrected subtitles
Total subtitles                                  | 422                 | 532
Total words                                      | 3 727               | 3 776
Total characters (with spaces)                   | 19 960              | 21 194
Duration (seconds)                               | 1 500               | 1 489
Average presentation speed (characters/duration) | 13.3                | 14.2
Subtitles between 15 and 19 cps                  | 98 (23%)            | 108 (20%)
Subtitles 20 cps and faster                      | 44 (10%)            | 169 (32%)

2.2.3 Questionnaires

The participants were required to complete a few questionnaires in this experiment, including a biographical survey, a pre-test, a CL measurement and a post-test. The biographical survey collected information about the participants’ background and language history. A pre-test with ten multiple-choice (MC) questions on comprehension of the video content was employed (Appendix A); this provided baseline information indicating the amount of prior knowledge the participants had before viewing. A CL measurement was used to determine the amount of self-perceived CL while viewing subtitled or unsubtitled video (Appendix B). A post-test with 30 MC items was used to measure the amount of content the students had learned immediately after viewing (Appendix C).

All the questions were taken from existing courses. The ten-item pre-test consisted of questions relating to the 25-minute video content (e.g., Which of the following accurately characterize perfectly inelastic demand?). Five MC questions (questions 1 to 5) of the pre-test were taken from the MIT OpenCourseWare (Gruber, 2011) website under the topic Elasticity of Supply and Demand, the video used in the study; this topic forms part of the course on micro-economic principles. These five questions were constructed to test key vocabulary terms and the students’ understanding of key concepts covered in the video. The other five MC questions (questions 6 to 10) of the pre-test were drawn from the sample questions on the topic “elasticity” by Frasca (2007), a professor of Economics at the University of Dayton. The elasticity sample questions consist of a total of 130 MC questions, but not all of them relate to the concepts covered in the video, so the selected items were screened against the video content to ensure their relevance. The 30 post-test items included the ten items from the pre-test with an additional 20 items from the elasticity sample questions by Frasca (2007). These extra 20 items were also selected according to the content covered in the video used in the experiment.

The CL measurement is an adaptation from Leppink et al. (2014). Leppink et al. (2014) developed a more precise instrument for measuring CL, which better differentiates three types of CL: intrinsic load, extraneous load and germane load. The instrument is a 13-item self-evaluated report, with items 1 to 4 measuring intrinsic load (e.g., The content covered in the video was very complex), items 5 to 8 measuring extraneous load (e.g., The explanations and instructions in this video were very unclear) and items 9 to 13 measuring germane load (e.g., The video really enhanced my understanding of the content that was covered).

The 30-item MC questionnaire has an item reliability index of .986 and the 13-item CL test has an item reliability index of .840. Reliability was calculated by means of Cronbach’s alpha (Tables 2.2a and 2.2b).

Table 2.2a The reliability of the 30-item performance test

Reliability statistics on performance test

Cronbach’s alpha | Cronbach’s alpha based on standardized items | N of items
.986             | .990                                         | 30

Table 2.2b The reliability of the 13-item effort scale

Reliability statistics on effort scale

Cronbach’s alpha | Cronbach’s alpha based on standardized items | N of items
.840             | .835                                         | 13
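The reliability indices reported above can, in principle, be reproduced from the raw item responses. The following is a minimal sketch, assuming the responses are stored as a NumPy array with one row per participant and one column per item; the array in the example contains made-up data, not the study's responses.

import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants, n_items) array of item scores."""
    k = responses.shape[1]                               # number of items
    item_variances = responses.var(axis=0, ddof=1)       # variance of each item
    total_variance = responses.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative 13-item responses for five participants (random made-up data).
rng = np.random.default_rng(0)
fake_responses = rng.integers(0, 11, size=(5, 13)).astype(float)
print(round(cronbach_alpha(fake_responses), 3))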

2.3 Design and procedure

This study is a quantitative experiment that uses a three-group pre-test–post-test design (two test groups, one control group) with the three subtitle conditions (EA, EC and E) as the independent variable and CL and performance measurements as dependent variables. Each group of students was randomly assigned to one of the following conditions: English video with no subtitles (E), English video with English subtitles generated automatically (EA) and English video with corrected English subtitles (EC). The performance and CL of the participants were compared across the three conditions. The research questions can be answered by comparing the two subtitled conditions and the control group to determine whether the subtitling intervention makes any difference to performance and CL.

The participants were asked to perform a pre-test before and a post-test after viewing the video. The pre-test was used as a baseline but also to measure the amount of prior knowledge the participants had on the topic of elasticity. The post-test was intended to determine the impact of subtitles, both EA and EC, on performance when compared to the pre-test. The scores between EA and EC were also compared in order to determine the difference in impact, if there was any. The CL measurement was conducted to determine the self-perceived effort in viewing with or without subtitles.

A biographical survey and a pre-test were conducted before the participants viewed the video. The video was about 25 minutes in length and was projected on a big screen for the participants to watch in one of the three conditions. All the conditions were identical for each of the groups, including good sound and image. Two tests, in the order of (1) a CL measurement and (2) a performance post-test, were given to the participants for completion immediately after the viewing.

2.4 Data analysis

A multiple regression analysis was performed with self-reported CL and performance on the pre-test, post-test and full post-test as dependent variables, and the number of months the students had spent in an English-speaking country as the predictor variable. The analysis revealed that length of residence in an English-speaking country was not a predictor of CL or performance.

The data were sufficiently normally distributed to allow for the use of an ANOVA. To determine whether there were any differences between the participants in the three conditions, one-way ANOVAs were performed with the following dependent variables: full-test performance, post-test performance, the difference between pre- and post-test performance, intrinsic load, extraneous load and germane load. Post hoc Tukey tests and reliability tests were also carried out. A significance level of α=.05 was adopted for all the statistical analyses reported.
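By way of illustration, this kind of analysis can be reproduced with standard statistical libraries. The sketch below is a minimal example with made-up data; the DataFrame layout and column names are assumptions, not the study's data file. It runs a one-way ANOVA with SciPy and the post hoc Tukey HSD comparisons with statsmodels.

import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative data: one row per participant, with condition and a performance score.
df = pd.DataFrame({
    "condition": ["E"] * 4 + ["EA"] * 4 + ["EC"] * 4,
    "score":     [44, 40, 52, 39, 45, 38, 47, 41, 36, 40, 33, 42],
})

# One-way ANOVA across the three subtitle conditions (alpha = .05).
groups = [g["score"].values for _, g in df.groupby("condition")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# Post hoc Tukey HSD pairwise comparisons between the conditions.
print(pairwise_tukeyhsd(endog=df["score"], groups=df["condition"], alpha=0.05))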

Two notable characteristics of these two sets of subtitles could have had an impact on performance, namely the high presentation speed of the corrected subtitles and the low accuracy rate of the automatic subtitles. The transcripts of both versions of the subtitles were manually inspected and analysed along with their presentation speed; the details are shown in Table 2.1. It is evident from the analysis that the presentation speed of both versions is too high in many instances. Another important aspect of the automated subtitles is the rate of errors resulting from incorrect speech recognition. There are 1 160 errors in the automated subtitles, including 209 omissions but excluding a few remarks or questions by students that were not subtitled at all in the automated version (these were included in the corrected version). The errors are caused by misrecognition of the spoken words by the speech-recognition software. The word accuracy was 68.88%, calculated by applying the formula used by the US National Institute of Standards and Technology (Dumouchel et al., 2011): (3 727 − 1 160) / 3 727 × 100 ≈ 68.88%.
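The kind of transcript analysis described here can be sketched as follows. The SubtitleCue structure and the bucket boundaries are illustrative assumptions (the study's own inspection of the transcripts was done manually); the word-accuracy call at the end simply reproduces the arithmetic behind the 68.88% figure.

from dataclasses import dataclass

@dataclass
class SubtitleCue:
    text: str        # subtitle text, including spaces
    duration: float  # seconds on screen

def presentation_speed_buckets(cues: list[SubtitleCue]) -> dict[str, int]:
    """Count subtitles in the speed bands used in Table 2.1."""
    buckets = {"under 15 cps": 0, "15-19 cps": 0, "20 cps and faster": 0}
    for cue in cues:
        cps = len(cue.text) / cue.duration
        if cps >= 20:
            buckets["20 cps and faster"] += 1
        elif cps >= 15:
            buckets["15-19 cps"] += 1
        else:
            buckets["under 15 cps"] += 1
    return buckets

def word_accuracy(total_words: int, total_errors: int) -> float:
    """Word accuracy as (N - (D + S + I)) / N * 100, with all error types summed."""
    return (total_words - total_errors) / total_words * 100

print(presentation_speed_buckets([SubtitleCue("We talked about what happens", 1.2),
                                  SubtitleCue("when a supply curve shifts,", 2.5)]))
print(round(word_accuracy(3727, 1160), 2))  # 68.88, matching the figure reported above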

3. Results

3.1 Performance

A one-way ANOVA on the pre-test results showed no significant difference between the three groups in terms of prior knowledge on the topic, as shown in Table 3.1.

Table 3.1 One-way ANOVA on the pre-test results

ANOVA
Pre-test percentage

               | Sum of squares | df | Mean square | F    | Sig.
Between groups | 493.648        | 2  | 246.824     | .798 | .453
Within groups  | 27527.004      | 89 | 309.292     |      |
Total          | 28020.652      | 91 |             |      |

Figure 3.1 Full-test performance

Considering their performance on the full 30-item test, the group that saw the unsubtitled video (43.97%) did slightly better than the group that saw the video with automatic subtitles (42.84%), and the group that saw the corrected subtitles performed worst (38.29%; see Figure 3.1 and Table 3.2). However, these differences did not reach significance in a one-way ANOVA, as can be seen in Table 3.3.

Table 3.2 Distribution of the full-test scores

Full test per cent

                    | N  | Mean        | Std deviation | Std error
Unsubtitled         | 21 | 43.96809524 | 9.34690944    | 2.03966286
Automatic subtitles | 34 | 42.84323529 | 13.68585771   | 2.34710523
Corrected subtitles | 37 | 38.28891892 | 15.30797483   | 2.51661555
Total               | 92 | 41.26836957 | 13.63985364   | 1.42205304

Table 3.3 One-way ANOVA on the full-test results

ANOVA of performance
Full-test percentage

               | Sum of squares | df | Mean square | F     | Sig.
Between groups | 565.839        | 2  | 282.920     | 1.539 | .220
Within groups  | 16364.311      | 89 | 183.869     |       |
Total          | 16930.150      | 91 |             |       |

Within groups  | 31551.234      | 89 | 354.508     |       |
Total          | 32347.826      | 91 |             |       |

Considering the difference between the performance on the ten-item pre- and post-test, it appears that the group that saw the video with corrected subtitles did not improve and, in fact, did slightly worse than the other two, as can be seen in Figure 3.2 and Table 3.4; but the result is not statistically significant (p=.061; see Table 3.5).

Figure 3.2 Comparison of performance on the 10-item pre- and post-test scores

Table 3.4 Distribution of the difference between pre- and post-test scores

Difference between pre- and post-test

                    | N  | Mean  | Std deviation | Std error
Unsubtitled         | 21 | 9.52  | 21.325        | 4.654
Automatic subtitles | 34 | 9.74  | 18.987        | 3.256
Corrected subtitles | 37 | –1.08 | 22.458        | 3.692
Total               | 92 | 5.33  | 21.404        | 2.232

Table 3.5 ANOVA results on difference between pre- and post-test

ANOVA of performance
Difference between pre- and post-test

               | Sum of squares | df | Mean square | F     | Sig.
Between groups | 2541.164       | 2  | 1270.582    | 2.888 | .061
Within groups  | 39149.054      | 89 | 439.877     |       |
Total          | 41690.217      | 91 |             |       |

The results indicate that the students performed worse on the post-test after watching the video with corrected subtitles, whereas the students in the unsubtitled and automated subtitle conditions improved, with similar performance levels. Figure 3.3 shows that the pre- and post-test difference scores are normally distributed. The post-test scores used here to calculate the difference are based only on the subset of the post-viewing questionnaire consisting of the same ten items used in the pre-test. A one-way between-subjects ANOVA was conducted to compare the effect of subtitles on performance in the no subtitles, automated subtitles and corrected subtitles conditions, and to determine whether the difference between the pre-test and the post-test is significant. The difference between the pre- and post-test in fact approached significance, with F(2,89)=2.89, p=.061, as shown in Table 3.5.

Figure 3.3 The mean score for pre- and post-test difference is normally distributed

Post hoc comparisons using the Tukey HSD test indicated that the difference between the automated and corrected subtitle conditions (p=.083) is closer to the significance level (p<.05) than the differences between the unsubtitled and the subtitled conditions, as shown in Table 3.6. The differences between the unsubtitled and automated subtitle conditions (p=.999) and between the unsubtitled and corrected subtitle conditions (p=.159) are not significant. The Tukey test thus suggests that the difference lies not between the unsubtitled condition and the subtitled conditions but between the two subtitled conditions.


 

Table 3.6 Tukey HSD test multiple comparisons

Dependent variable: Difference between pre- and post-test

(I) Group           | (J) Group           | Mean difference (I-J) | Std error | Sig. | 95% CI lower bound | 95% CI upper bound
Unsubtitled         | Automatic subtitles | –.182                 | .821      | .999 | –14.06             | 13.69
Unsubtitled         | Corrected subtitles | 10.605                | .730      | .159 | –3.05              | 24.26
Automatic subtitles | Unsubtitled         | .182                  | .821      | .999 | –13.69             | 14.06
Automatic subtitles | Corrected subtitles | 10.787                | .983      | .083 | –1.09              | 22.66
Corrected subtitles | Unsubtitled         | –10.605               | .730      | .159 | –24.26             | 3.05
Corrected subtitles | Automatic subtitles | –10.787               | .983      | .083 | –22.66             | 1.09

3.2 Cognitive Load

Considering the CL induced by the different modes, there were only minor, non-significant differences (Table 3.8), as can be seen in Figure 3.4 and Table 3.7.

Figure 3.4 Comparison of the three types of CL across the three conditions


 

Table 3.7 The mean score of the CL measurement

                | Condition           | N  | Mean  | Std deviation | Std error
Intrinsic load  | Unsubtitled         | 21 | 21.43 | 4.434         | .967
                | Automatic subtitles | 34 | 20.26 | 5.089         | .873
                | Corrected subtitles | 37 | 21.27 | 5.881         | .967
                | Total               | 92 | 20.93 | 5.260         | .548
Extraneous load | Unsubtitled         | 21 | 16.24 | 6.252         | 1.364
                | Automatic subtitles | 34 | 15.41 | 6.150         | 1.055
                | Corrected subtitles | 37 | 15.70 | 6.004         | .987
                | Total               | 92 | 15.72 | 6.055         | .631
Germane load    | Unsubtitled         | 21 | 24.71 | 6.958         | 1.518
                | Automatic subtitles | 34 | 25.26 | 4.857         | .833
                | Corrected subtitles | 37 | 27.22 | 5.940         | .977
                | Total               | 92 | 25.92 | 5.860         | .611

 

Table 3.8 A one-way ANOVA shows no significant effect of subtitles on CL, p>.05

ANOVA of CL

                |                | Sum of squares | df | Mean square | F     | Sig.
Intrinsic load  | Between groups | 24.551         | 2  | 12.275      | .438  | .647
                | Within groups  | 2493.058       | 89 | 28.012      |       |
                | Total          | 2517.609       | 91 |             |       |
Extraneous load | Between groups | 8.878          | 2  | 4.439       | .119  | .888
                | Within groups  | 3327.775       | 89 | 37.391      |       |
                | Total          | 3336.653       | 91 |             |       |
Germane load    | Between groups | 107.294        | 2  | 53.647      | 1.582 | .211
                | Within groups  | 3017.174       | 89 | 33.901      |       |
                | Total          | 3124.467       | 91 |             |       |

A one-way between-subjects ANOVA was conducted to compare the impact of subtitles on the three types of CL in no subtitles, automated subtitles, and corrected subtitles conditions and also to determine whether the differences in intrinsic load, extraneous load and germane load are significant. The result shows no significant effect of subtitles on CL at the p<.05 level for the three conditions, as shown in Table 3.8.

4. Discussion

The main aim of the study was to determine whether the quality of subtitles has an impact on performance and CL. The results of this study suggest that these subtitles (automated or corrected) have no significant effect on performance and CL, although the corrected subtitles seemed to have a negative impact when compared to the automatic subtitles. Several factors may explain these findings, which run counter to the body of literature showing subtitles to be beneficial to comprehension (Bird & Williams, 2002; Moreno & Mayer, 2002; Vanderplank, 1988). It should be noted that the presentation speed of the automated and corrected subtitles is extremely fast in general, especially in the corrected subtitles, with almost one-third of them being 20 cps or faster. The data indicate that the students performed worse when watching the video with corrected subtitles, but the result is not significant (p=.220; see Table 3.3). The lower mean score for extraneous load, compared to intrinsic and germane load, indicates that the presentation of subtitles assists in information processing. The fact that the students put more effort into processing the content in the no-subtitles condition gives a general indication that subtitles are still beneficial in providing a favourable learning environment, although none of these differences reached the level of significance, which makes it impossible to generalize from these findings. The results might also depend on how familiar these ESL students are with English and how they use subtitles in this context.

The results of this study do not show that adding automatically generated or corrected subtitles to an educational video improves learning. The absence of a significant difference could be ascribed to the fact that the accuracy rate in the EA condition is substantially below respeaking standards. This would render the subtitles a distraction at best and probably resulted in the students ignoring the subtitles, something that will have to be verified with eye-tracking studies. The high presentation speed in the EC condition would make it virtually impossible for students to read around one-third of the subtitles, which would have been a serious distraction and could easily have interfered with comprehension. This seems to be supported by the findings.

The results do not support the three hypotheses of the study. The first hypothesis predicted that automatically generated and corrected subtitles would result in better performance when compared to unsubtitled video. However, our results showed that students who saw the unsubtitled video (43.97%) did slightly better than those who saw either the automated (42.84%) or the corrected (38.29%) subtitles, even though the result is not statistically significant.

The second hypothesis predicted that automatically generated and corrected subtitles would result in a lower CL when compared to unsubtitled video. However, our results showed no significant effect of subtitles on CL in any of the three test conditions. The trend in the data did show that both the EA and the EC conditions result in lower intrinsic and extraneous load, which means that the students would theoretically have had more capacity available for germane processing, even though they could not benefit from this in view of the problems with the two subtitled versions. Also, the lack of statistical significance means that this is at best a trend that will have to be established with refined experiments.

Considering hypothesis three, the students who saw the corrected subtitles did not outperform those in the other two test conditions; instead, the trend in the results indicates that the students performed worst when viewing the corrected subtitles, though again the result is not statistically significant.

The trend in the current results shows that the presentation of subtitles helps with processing information with less mental effort, in line with the results of Kruger et al. (2013), who found that subtitles reduced the CL of students when processing learning materials. There are a number of explanations for the findings presented here. The major reason that this study does not have any significant results is that the presentation speed was so variable and so high in both the automatically generated and the corrected subtitles that it negated any possible benefits. The fact that 20 per cent of the subtitles in the corrected version are between 15 and 19 cps and 32 per cent are 20 cps or faster made many of the corrected subtitles impossible to read in full. Even though the corrected version is very accurate, it remains a verbatim transcript, and the number of cps displayed to the reader is therefore far too high. As concluded in the study by Romero-Fresco (2016), fully verbatim subtitles are not desirable because their high presentation speed, caused mainly by the high speech rate of the speakers in the video, remains far too fast to be read and comprehended, despite the high accuracy rate.

The automated version is slightly slower than the corrected version, but the fact that it contains a high number of errors (1 160 errors out of a total of 3 727 words, including omissions) negated the possible benefits of subtitle reading. The accuracy rate of the automated subtitles is 68.88%, as calculated in section 2.4.

A closer investigation of the errors in the automated transcript shows that they are quite serious and were caused mainly by word misrecognition. The automated version has far too many errors to be useful, particularly in an academic context, where the accuracy of information is critical. It can therefore be expected that students would be frustrated by the high error rate and would either be distracted by the errors or simply start ignoring the subtitles. The high error rate effectively made the automated condition similar to the unsubtitled condition, and the high presentation speed of the corrected subtitles resulted in lower performance, which meant that the corrected condition also did not differ significantly from the unsubtitled condition. As indicated by the post hoc Tukey test, the difference observed in this study lies between the two subtitled conditions rather than between the unsubtitled condition and the two subtitled conditions. However, the corrected subtitles did result in a higher germane load, which shows promise for the mode.


 

5. Conclusion

This study was conducted in order to investigate whether automatically generated or corrected subtitles would have an impact on performance and CL. The results from previous studies show that the simultaneous presentation of both visual and auditory information actually assists learning because of the dual coding effect, as discussed in section 1.5. Further investigations have been done in order to understand how subtitle reading would influence information processing and CL.

As different methods have been used to evaluate CL, contradictory results have been found regarding whether subtitle reading would cause cognitive overload and thus decrease learning. These contradictions may be caused by different cognitive measurements and study procedures. However, despite the contradictory results, the benefits of subtitles have been consistently proven in the literature, supporting the view that subtitling is beneficial in promoting comprehension as a precursor to learning (Bird & Williams, 2002; Moreno & Mayer, 2002; Vanderplank, 1988).

The increased use of online video lectures by large numbers of educational entities has led to a huge demand for substantial quantities of subtitled video. Automatically generated subtitles could be a solution towards meeting the need to produce subtitles quickly and economically. Automated subtitles are created through speech-recognition software, but usually with a high error rate that could potentially affect subtitle reading. It is only logical to correct these errors in order to improve readability. Since there are hardly any empirical studies that compare different types and quality of subtitles, it was the goal of this study to fill the gap in this research area.

5.1 Findings and implications

This study investigated automatic and corrected subtitles because these are among the types of subtitles normally available online (Parton, 2016). However, these modes of subtitling do not appear to be beneficial under conditions of high error rates or high presentation speeds. It would therefore seem that, for subtitles to be beneficial, they would have to be corrected and presented at a reasonable reading speed. The real challenge will be to find ways of increasing the accuracy of transcripts and, even more so, of reducing the text automatically to bring down the presentation speed; until then, this potentially powerful mode of subtitling remains out of reach for the majority of institutions.

There are two implications of subtitles being either too fast to process and comprehend or containing too many errors to facilitate comprehension. First, in order to achieve a reasonable presentation speed, this mode of subtitling can be rendered useful only after being edited, a task that includes summarizing, reducing and reformulating the text. Professional subtitlers remain essential in editing the original transcripts if subtitling standards are to be met (Díaz-Cintas & Remael, 2007). This implies that professionals would still be involved, with the attendant costs, while the need to process large quantities of subtitles in a short time remains a further challenge.

The second implication is that the audiovisual recording environment of the video would have to be controlled to the extent that misrecognition by the speech-recognition software is decreased. However, many variables are involved in controlling such an environment. Technology may advance to the point where a highly sophisticated system is capable of recognizing speech automatically with high accuracy, and also of reducing and reformulating the original transcripts. Future research on such technology may make it possible to produce automatic subtitles that are accurate enough for readers to process and comprehend.

5.2 Limitations and further research

There are several limitations to this study. The current study used a convenience sample of first-year students, and a larger sample size is essential in future research to yield potentially significant results. Given the nature of the available participants, we had limited control over their level of English, but we are satisfied that the groups are comparable (see section 2.1). The fact that this study used a video of short duration (25 minutes), was based on a single viewing and produced a nearly significant result implies that manipulating the experimental environment differently in future studies, for example by using a longer video over a longer period of time in a longitudinal design, may yield clearer outcomes. In future research, corrected subtitles at a lower presentation speed should be studied over a longer period, such as a full school term, as the effects would probably start manifesting only after more exposure and once students have become used to the mode (Vanderplank, 1988). The videos used in this study featured only one topic, and further research should include a variety of topics to ensure the generalizability of the results. Furthermore, eye-tracking studies should be conducted to gather more quantitative data and to determine to what extent the students actually try to read the subtitles.

References

Anantaram, C., Kopparapu, S. K., Patel, C., & Mittal, A. (2016). Repairing general-purpose ASR output to improve accuracy of spoken sentences in specific domains using artificial development approach. In S. Kambhampati (Ed.), Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence: Human-Aware Artificial Intelligence: Vol. 5.  (pp. 4234–4235). Palo Alto, CA: AAAI.

Amara. (2010, November 22). Help MIT subtitle OpenCourseWare videos [Blog post]. Retrieved from https://about.amara.org/2010/11/22/help-mit-subtitle-opencourseware-videos/

Armstrong, A. W., Idriss, N. Z., & Kim, R. H. (2011). Effects of video-based, online education on behavioral and knowledge outcomes in sunscreen use: A randomized controlled trial. Patient Education and Counseling, 83(2), 273–277. doi:10.1016/j.pec.2010.04.033

Australian Government, Department of Education and Training. (2016). End of year summary international students enrolment data – Australia – 2015. Retrieved from https://internationaleducation.gov.au/research/International-Student-Data/Documents/Monthly%20summaries%20of%20international%20student%20enrolment%20data%202015/12_December_2015_MonthlySummary.pdf

Bird, S. A., & Williams, J. N. (2002). The effect of bimodal input on implicit and explicit memory: An investigation into the benefits of within-language subtitling. Applied Psycholinguistics, 23(4), 509–533. doi:10.1017/S0142716402004022

Coursera. (2014, April 27). Introducing Coursera’s new global translator community [Blog post]. Retrieved from http://coursera.tumblr.com/post/84088014661/introducing-courseras-new-global-translator

Coursera. (n.d.). Video translation [Blog post]. Retrieved from https://learner.coursera.help/hc/en-us/articles/208279836-Video-translations

Danan, M. (2004). Captioning and subtitling: Undervalued language learning strategies. Meta, 49(1), 67–77. doi:10.7202/009021ar

de Jong, T. (2010). Cognitive load theory, educational research, and instructional design: Some food for thought. Instructional Science, 38(2), 105–134. doi:10.1007/s11251-009-9110-0

Debue, N., & van de Leemput, C. (2014). What does germane load mean? An empirical contribution to the cognitive load theory. Frontiers in Psychology, 5(1099), 1–12. doi:10.3389/fpsyg.2014.01099

Diao, Y., Chandler, P., & Sweller, J. (2007). The effect of written text on comprehension of spoken English as a foreign language. The American Journal of Psychology, 120(2), 237–261. doi:10.2307/20445397

Díaz-Cintas, J. (2014). Technological strides in subtitling. In S.-W. Chan (Ed.), The Routledge encyclopedia of translation technology (pp. 632–643). London: Routledge.

Díaz-Cintas, J., & Remael, A. (2007). Audiovisual translation: Subtitling. Kinderhook, NY: St. Jerome.

Doherty, S., & Kruger, J.-L. (2018). The development of eye tracking in empirical research on subtitling and captioning. In T. Dwyer, C. Perkins, S. Redmond, & J. Sita (Eds.), Seeing into screens: Eye tracking and the moving image (pp. 46–64). New York, NY: Bloomsbury Academic.

Dumouchel, P., Boulianne, G., & Brousseau, J. (2011). Measures for quality of closed captioning. In A. Şerban, A. Matamala, & J.-M. Lavaur (Eds.), Audiovisual translation in close-up: Practical and theoretical approaches (pp. 161–172). Bern: Peter Lang.

d'Ydewalle, G., & De Bruycker, W. (2007). Eye movements of children and adults while reading television subtitles. European Psychologist, 12(3), 196–205. doi:10.1027/1016-9040.12.3.196

d'Ydewalle, G., Praet, C., Verfaillie, K., & Van Rensbergen, J. (1991). Watching subtitled television: Automatic reading behavior. Communication Research, 18(5), 650–666. doi:10.1177/009365091018005005

Frasca, R. R. (2007). Chapter 4 – Elasticity – Sample Questions. Retrieved from http://academic.udayton.edu/PMIC/Instructors%20manual.htm

Garza, T. J. (1991). Evaluating the use of captioned video materials in advanced foreign language learning. Foreign Language Annals, 24(3), 239–258. doi:10.1111/j.1944-9720.1991.tb00469.x

Gernsbacher, M. A. (2015). Video captions benefit everyone. Policy Insights from the Behavioral and Brain Sciences, 2(1), 195–202. doi:10.1177/2372732215602130

Gruber, J. (2011). Principles of microeconomics: Elasticity of supply and demand [Video file]. Retrieved from http://ocw.mit.edu/courses/economics/14-01sc-principles-of-microeconomics-fall-2011/unit-1-supply-and-demand/elasticity/

Harrenstien, K. (2009, November 19). Automatic captions in YouTube [Blog post]. Retrieved from https://googleblog.blogspot.com.au/2009/11/automatic-captions-in-youtube.html

Ivarsson, J., & Carroll, M. (1998). Subtitling. Simrishamn: TransEdit.

Jurafsky, D., & Martin, J. H. (2008). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, NJ: Prentice Hall.

Kalyuga, S., Chandler, P., & Sweller, J. (1999). Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology, 13(4), 351–371. doi:10.1002/(SICI)1099-0720(199908)13:4<351::AID-ACP589>3.0.CO;2-6

Kalyuga, S., & Sweller, J. (2014). The redundancy principle in multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (2nd ed., pp. 247–262). doi:10.1017/CBO9781139547369.013

Khan Academy. (n.d.). How are the English captions created? [Blog post]. Retrieved October 2, 2016, from https://khanacademy.zendesk.com/hc/en-us/articles/226430308-How-are-the-English-captions-created-

Khesin, T. (2012, October 2). MIT OpenCourseWare launches interactive transcripts and video search [Blog post]. Retrieved from http://www.3playmedia.com/2012/10/02/mit-opencourseware-launches-interactive-transcripts-video-search/

Kruger, J.-L. (2013). Subtitles in the classroom: Balancing the benefits of dual coding with the cost of increased cognitive load. Journal for Language Teaching, 47(1), 29–53. doi:10.4314/jlt.v47i1.2

Kruger, J.-L. (2016). Psycholinguistics and audiovisual translation. Target, 28(2), 276–287. doi:10.1075/target.28.2.08kru

Kruger, J.-L., & Doherty, S. (2016). Measuring cognitive load in the presence of educational video: Towards a multimodal methodology. Australasian Journal of Educational Technology, 32(6), 19–31. doi:10.14742/ajet.3084

Kruger, J.-L., & Steyn, F. (2013). Subtitles and eye tracking: Reading and performance. Reading Research Quarterly, 49(1), 105–120. doi:10.1002/rrq.59

Kruger, J.-L., Hefer, E., & Matthew, G. (2013). Measuring the impact of subtitles on cognitive load: Eye tracking and dynamic audiovisual texts. Proceedings of the 2013 Conference on Eye Tracking South Africa, ETSA 2013, 62–66. doi:10.1145/2509315.2509331

Kruger, J.-L., Hefer, E., & Matthew, G. (2014). Attention distribution and cognitive load in a subtitled academic lecture: L1 vs. L2. Journal of Eye Movement Research, 7(5), 1–15. doi:10.16910/jemr.7.5.4

Leppink, J., Paas, F., van der Vleuten, C. P. M., van Gog, T., & van Merriënboer, J. J. G. (2013). Development of an instrument for measuring different types of cognitive load. Behavior Research Methods, 45(4), 1058–1072. doi:10.3758/s13428-013-0334-1

Leppink, J., Paas, F., van Gog, T., van der Vleuten, C. P. M., & van Merriënboer, J. J. G. (2014). Effects of pairs of problems and examples on task performance and different types of cognitive load. Learning and Instruction, 30(2), 32–42. doi:10.1016/j.learninstruc.2013.12.001

Markham, P. (1999). Captioned videotapes and second-language listening word recognition. Foreign Language Annals, 32(3), 321–328. doi:10.1111/j.1944-9720.1999.tb01344.x

Mayer, R. E., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning: When presenting more material results in less understanding. Journal of Educational Psychology, 93(1), 187–198. doi:10.1037/0022-0663.93.1.187

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38(1), 43–52. doi:10.1207/S15326985EP3801_6

Meeker, M. (2017). Internet trends 2017. Code Conference [PowerPoint slides]. Retrieved from https://www.kleinerperkins.com/perspectives/internet-trends-report-2017

Merkt, M., Weigand, S., Heier, A., & Schwan, S. (2011). Learning with videos vs. learning with print: The role of interactive features. Learning and Instruction, 21(6), 687–704. doi:10.1016/j.learninstruc.2011.03.004

Moreno, R., & Mayer, R. E. (2002). Verbal redundancy in multimedia learning: When reading helps listening. Journal of Educational Psychology, 94(1), 156–163. doi:10.1037/0022-0663.94.1.156

Paas, F., & Sweller, J. (2014). Implications of cognitive load theory for multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (2nd ed., pp. 27–42). New York, NY: Cambridge University Press. doi:10.1017/CBO9781139547369.004

Paas, F., Tuovinen, J. E., Tabbers, H., & van Gerven, P. W. M. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38(1), 63–71. doi:10.1207/s15326985ep3801_8

Paas, F., & van Merriënboer, J. J. G. (1994). Instructional control of cognitive load in the training of complex cognitive tasks. Educational Psychology Review, 6(4), 351–371. doi:10.1007/BF02213420

Paivio, A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of Psychology, 45(3), 255–287. doi:10.1037/h0084295

Parton, B. S. (2016). Video captions for online courses: Do YouTube’s auto-generated captions meet deaf students’ needs? Journal of Open, Flexible and Distance Learning, 20(1), 8–18.

Perego, E., Del Missier, F., Porta, M., & Mosconi, M. (2010). The cognitive effectiveness of subtitle processing. Media Psychology, 13(3), 243–272. doi:10.1080/15213269.2010.502873

Rajendran, D. J., Duchowski, A. T., Orero, P., Martínez, J., & Romero-Fresco, P. (2013). Effects of text chunking on subtitling: A quantitative and qualitative examination. Perspectives: Studies in Translatology, 21(1), 5–21. doi:10.1080/0907676X.2012.722651

Romero-Fresco, P. (2016). Accessing communication: The quality of live subtitles in the UK. Language & Communication, 49(4), 56–69. doi:10.1016/j.langcom.2016.06.001

Romero-Fresco, P., & Martínez Pérez, J. (2015). Accuracy rate in live subtitling: The NER model. In R. Baños Piñero & J. Díaz-Cintas (Series Eds.), Palgrave Studies in Translating and Interpreting: Vol. 5. Audiovisual translation in a global context: Mapping an ever-changing landscape (pp. 28–50). London: Palgrave Macmillan.

Shokouhi, M., Ozertem, U., & Craswell, N. (2016). Did you say U2 or YouTube? Inferring implicit transcripts from voice search logs. In J. Bourdeau, J. A. Hendler, R. Nkambou, I. Horrocks, & B. Y. Zhao (Eds.), Proceedings of the 25th International Conference on World Wide Web, WWW 2016 (pp. 1215–1224). Geneva, Switzerland: International World Wide Web Conferences Steering Committee. doi:10.1145/2872427.2882994

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. doi:10.1016/0364-0213(88)90023-7

Sweller, J. (2003). Evolution of human cognitive architecture. Psychology of Learning and Motivation, 43(2), 215–266. doi:10.1016/S0079-7421(03)01015-6

Sweller, J. (2004). Instructional design consequences of an analogy between evolution by natural selection and human cognitive architecture. Instructional Science, 32, 9–31. doi:10.1023/B:TRUC.0000021808.72598.4d

Sweller, J. (2010). Element interactivity and intrinsic, extraneous, and germane cognitive load. Educational Psychology Review, 22(2), 123–138. doi:10.1007/s10648-010-9128-5

Sweller, J. (2011). Cognitive load theory. In J. P. Mestre & B. H. Ross (Series Eds.), The Psychology of Learning and Motivation: Vol. 55. Cognition in education (pp. 37–76). San Diego, CA: Academic Press. doi:10.1016/B978-0-12-387691-1.00002-8

Sweller, J., & Chandler, P. (1991). Evidence for cognitive load theory. Cognition and Instruction, 8(4), 351–362. doi:10.1207/s1532690xci0804_5

Sweller, J., & Sweller, S. (2006). Natural information processing systems. Evolutionary Psychology, 4(1), 434–458. doi:10.1177/147470490600400135

Sweller, J., van Merriënboer, J. J. G., & Paas, F. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251–296. doi:10.1023/A:1022193728205

Szarkowska, A., & Bogucka, L. (2019). Six-second rule revisited: An eye-tracking study on the impact of speech rate and language proficiency on subtitle reading. Translation, Cognition & Behavior, 2(1), 101–124. doi:10.1075/tcb.00022.sza

Szarkowska, A., & Gerber-Morón, O. (2018). Viewers can keep up with fast subtitles: Evidence from eye movements. PLOS ONE, 13(6), 1–30. doi:10.1371/journal.pone.0199331

Taylor, G. (2005). Perceived processing strategies of students watching captioned video. Foreign Language Annals, 38(3), 422–427. doi:10.1111/j.1944-9720.2005.tb02228.x

Vanderplank, R. (1988). The value of teletext sub-titles in language learning. English Language Teaching Journal, 42(4), 272–281. doi:10.1093/elt/42.4.272

Wald, M. (2006). Captioning for deaf and hard of hearing people by editing automatic speech recognition in real time. In K. Miesenberger, J. Klaus, W. L. Zagler, & A. I. Karshmer (Series Eds.), Lecture Notes in Computer Science: Vol. 4061. Computers helping people with special needs, 10th International Conference on Computers for Handicapped Persons, ICCHP 2006 (pp. 683–690). Berlin: Springer.

Wald, M. (2013). Concurrent collaborative captioning. Paper presented at the 2013 International Conference on Software Engineering Research and Practice. San Francisco, CA.

Wald, M., & Bain, K. (2008). Universal access to communication and learning: The role of automatic speech recognition. Universal Access in the Information Society, 6(4), 435–447. doi:10.1007/s10209-007-0093-9

Wilson, E. A. H., Park, D. C., Curtis, L. M., Cameron, K. A., Clayman, M. L., Makoul, G., vom Eigen, K., & Wolf, M. S. (2010). Media and memory: The efficacy of video and print materials for promoting patient education about asthma. Patient Education and Counseling, 80(3), 393–398. doi:10.1016/j.pec.2010.07.011

Zhang, Y., & Mi, Y. (2010). Another look at the language difficulties of international students. Journal of Studies in International Education, 14(4), 371–388. doi:10.1177/1028315309336031


 

Appendix A

 

10-item multiple-choice pre-test (Frasca, 2007; Gruber, 2011)

 

Elasticity Quiz

 

1) Which of the following accurately characterize perfectly inelastic demand?

a. The demand curve is vertical but does not change regardless of what happens to price.

b. The demand curve is vertical.

c. The demand curve is horizontal.

d. Demand does not change regardless of what happens to price.

 

2) When do we expect to see a perfectly elastic demand curve?

a. When a good has many complements.

b. When a good has a perfect substitute.

c. When a good has no substitutes.

d. When there is limited supply of a good.

 

3) Let’s say a researcher makes a study of patients in hospitals and finds they are much sicker than the average person in the population. Then he concludes that hospitals make patients sick. The researcher is mixing up two concepts; what are they?

a. Demand and supply.

b. Correlation and causation.

c. Theory and empirics.

d. Shifts along the demand curve and shifts in the demand curve.

 

4) If the elasticity of demand for a good is sufficiently negative, firms may actually lose revenues when they raise the price of the good. Why is this?

a. The elasticity of demand changes.

b. Consumers substitute from other goods to buy this firm’s good.

c. The supply curve shifts in.

d. Fewer people buy the good at the higher price, and so overall revenues are lower.

 

5) What is an example of a supply shock in the orange market that would enable us to estimate demand elasticity?

a. All of these.

b. A plant disease that hits orange crops.

c. A new government tax on orange growers.

d. An early frost in Florida that destroys crops.

 

6) When the quantity of coal supplied is measured in kilograms instead of pounds, the demand for coal becomes

a. more elastic.

b. neither more nor less elastic.

c. less elastic.

d. undefined.

7) An increase in subway fares in New York City will boost your expenditures on subway rides if

a. the supply of subway rides is elastic.

b. the supply of subway rides is inelastic.

c. your demand for subway rides is inelastic.

d. your demand for subway rides is elastic.

 

8) The demand for Honda Accords is

a. probably inelastic and less elastic than the demand for automobiles.

b. probably elastic but less elastic than the demand for automobiles.

c. probably elastic and more elastic than the demand for automobiles.

d. probably inelastic but more elastic than the demand for automobiles.

 

9) Which of the following is likely to have the smallest price elasticity of demand?

a. a new Ford automobile

b. a new automobile

c. a new Ford Mustang

d. an automobile

 

10) Demand is inelastic if

a. a leftward shift of the supply curve raises the total revenue.

b. the good in question has close substitutes.

c. the smaller angle between the vertical axis and the demand curve is less than 45 degrees.

d. large shifts of the supply curve lead to only small changes in price.

 

Appendix B

 

Cognitive load test adapted from Leppink et al. (2014)

 

Please respond to each of the questions by circling the most applicable number on the following scale 0–10 (0 = not at all the case and 10 = completely the case).

 

1) The content covered in the video was very complex.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

2) The video covered formulas that I perceived as very complex.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 


3) The video covered concepts and definitions that I perceived as very complex.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

4) I invested a very high mental effort in the complexity of this video.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

5) The explanations in the video were very unclear.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

6) The explanations were full of unclear language.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

7) The explanations were, in terms of learning, very ineffective.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

8) I invested a very high mental effort in unclear and ineffective explanations in this video.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

9) The video really enhanced my understanding of the content that was covered.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

 

10) The video really enhanced my understanding of the formulas that were covered.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

11) The video really enhanced my knowledge of concepts and definitions that were covered.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

12) The video really enhanced my knowledge and understanding of the elasticity of supply and demand.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

 13) I invested a very high mental effort during this video in enhancing my knowledge and understanding.

 

0     1     2     3     4     5     6     7     8     9     10
Not at all the case                                                                  Completely the case

 

 


 

Appendix C

 

30-item multiple-choice post-test (Frasca, 2007; Gruber, 2011)

 

Elasticity Quiz after viewing

 

1.      The price elasticity of demand depends on

a. the units used to measure price but not the units used to measure quantity.

b. the units used to measure price and the units used to measure quantity.

c. the units used to measure quantity but not the units used to measure price.

d. neither the units used to measure price nor the units used to measure quantity.

 

2.      The demand for food is most elastic in countries

a. with low income levels.

b. that are highly urbanized.

c. with intermediate income levels.

d. with high income levels.

 

3.      Demand is perfectly inelastic when

a. the good in question has perfect substitutes.

b. shifts in the supply curve result in no change in price.

c. shifts of the supply curve result in no change in quantity demanded.

d. shifts of the supply curve result in no change in the total revenue from sales.

 

4.      What is an example of a supply shock in the orange market that would enable us to estimate demand elasticity?

a. All of these.

b. A plant disease that hits orange crops.

c. A new government tax on orange growers.

d. An early frost in Florida that destroys crops.

 

5.      Which of the following accurately characterize perfectly inelastic demand?

a. The demand curve is vertical but does not change regardless of what happens to price.

b. The demand curve is vertical.

c. The demand curve is horizontal.

d. Demand does not change regardless of what happens to price.

 

6.      The demand for Honda Accords is

a. probably inelastic and less elastic than the demand for automobiles.

b. probably elastic but less elastic than the demand for automobiles.

c. probably elastic and more elastic than the demand for automobiles.

d. probably inelastic but more elastic than the demand for automobiles.

 

7.      If the elasticity of demand for a good is sufficiently negative, firms may actually lose revenues when they raise the price of the good. Why is this?

a. The elasticity of demand changes.

b. Consumers substitute from other goods to buy this firm’s good.

c. The supply curve shifts in.

d. Fewer people buy the good at the higher price, and so overall revenues are lower.

 

8.      The slope of a demand curve depends on

a. the units used to measure quantity but not the units used to measure price.

b. the units used to measure price and the units used to measure quantity.

c. the units used to measure price but not the units used to measure quantity.

d. neither the units used to measure price nor the units used to measure quantity.

 

9.      Producers' total revenue will decrease if

a. the price rises and demand is inelastic.

b. income increases and the good is a normal good.

c. the price rises and demand is elastic.

d. income falls and the good is an inferior good.

 

10.   A good with a vertical demand curve has a demand with

a. infinite elasticity.

b. unit elasticity.

c. zero elasticity.

d. varying elasticity.

 

11.   Which of the following is likely to have the smallest price elasticity of demand?

a. a new Ford automobile

b. a new automobile

c. a new Ford Mustang

d. an automobile

 

12.   A good with a horizontal demand curve has a demand

a. with an income elasticity of demand of 0.

b. with a price elasticity of demand of infinity.

c. for which there are no substitutes.

d. with a price elasticity of demand of 0.

 

Refer to the following diagram to answer 13–15.

 

13.   The figure above illustrates a linear demand curve. If the price falls from $8 to $6,

a. the quantity demanded will increase by less than 20 percent.

b. total revenue will remain unchanged.

c. total revenue will increase.

d. total revenue will decrease.

 

14.   The figure above illustrates a linear demand curve. In the range from $8 to $6,

a. the demand is unit elastic.

b. the demand is price inelastic.

c. the demand is price elastic.

d. more information is needed to determine if the demand is price elastic, unit elastic, or inelastic.

 

15.   The figure above illustrates a linear demand curve. If the price falls from $6 to $4,

a. total revenue will decrease.

b. total revenue will increase.

c. quantity demanded will increase by more than 100 percent.

d. total revenue will remain unchanged.

 

16.   Let’s say a researcher makes a study of patients in hospitals and finds they are much sicker than the average person in the population. Then he concludes that hospitals make patients sick. The researcher is mixing up two concepts; what are they?

a. Demand and supply.

b. Correlation and causation.

c. Theory and empirics.

d. Shifts along the demand curve and shifts in the demand curve.

 

17.   The more substitutes available for a product,

a. the larger is its income elasticity of demand.

b. the smaller is its income elasticity of demand.

c. the smaller is its price elasticity of demand.

d. the larger is its price elasticity of demand.

 

18.   Of the following, demand is likely to be the least elastic for

a. Toyota automobiles.

b. compact disc players.

c. Ford automobiles.

d. toothpicks.

 

19.   Of the following, demand is likely to be the least elastic for

a. pink grapefruit.

b. iceberg lettuce.

c. insulin for diabetics.

d. diamonds.

 

20.   If a rise in the price of good B increases the quantity demanded of good A,

a. B is a substitute for A, but A is a complement to B.

b. A is a substitute for B, but B is a complement to A.

c. A and B are complements.

d. A and B are substitutes.

 

21.   Supply is elastic if

a. a 1 percent change in price causes a larger percentage change in quantity supplied.

b. the good in question is a normal good.

c. the slope of the supply curve is positive.

d. a 1 percent change in price causes a smaller percentage change in quantity supplied.

 

22.   When do we expect to see a perfectly elastic demand curve?

a. When a good has many complements.

b. When a good has a perfect substitute.

c. When a good has no substitutes.

d. When there is limited supply of a good.

 

23.   If at a given moment, no matter what the price, producers cannot change the quantity supplied, the momentary supply

a. has infinite elasticity.

b. has unit elasticity.

c. does not exist.

d. has zero elasticity.

 

24.   The elasticity of supply measures the sensitivity of

a. supply to changes in costs.

b. quantity supplied to a change in price.

c. price to changes in supply.

d. quantity supplied to quantity demanded.

 

25.   An increase in subway fares in New York City will boost your expenditures on subway rides if

a. the supply of subway rides is elastic.

b. the supply of subway rides is inelastic.

c. your demand for subway rides is inelastic.

d. your demand for subway rides is elastic.

 

26.   The demand for a good is elastic if

a. a decrease in its price results in a decrease in total revenue.

b. the good is a necessity.

c. an increase in its price results in an increase in total revenue.

d. an increase in its price results in a decrease in total revenue.

 

27.   If a price decrease results in your expenditure on a good decreasing, your demand must be

a. unit.

b. inelastic.

c. linear.

d. elastic.

 

28.   The route from Dallas to Mexico City is served by more than one airline. The demand for tickets from American Airlines for that route is probably

a. elastic and more elastic than the demand for all tickets for that route.

b. inelastic and less elastic than the demand for all tickets for that route.

c. elastic but less elastic than the demand for all tickets for that route.

d. inelastic but more elastic than the demand for all tickets for that route.

 

29.   When the quantity of coal supplied is measured in kilograms instead of pounds, the demand for coal becomes

a. more elastic.

b. neither more nor less elastic.

c. less elastic.

d. undefined.

 

30.   Demand is inelastic if

a. a leftward shift of the supply curve raises the total revenue.

b. the good in question has close substitutes.

c. the smaller angle between the vertical axis and the demand curve is less than 45 degrees.

d. large shifts of the supply curve lead to only small changes in price.



[1]     Website can be accessed at https://www.coursera.org/

[2]     Website can be accessed at https://www.khanacademy.org/

[3]     Website can be accessed at https://academicearth.org/

[4]     A diploma programme at Macquarie University International College (MUIC), where students are required to complete two courses in six-week terms.

[5]     This is based on the student ratio in this particular sampling population.