Applying machine translation to Chinese–English subtitling: Constraints and challenges

Lu Tian

Guangdong University of Foreign Studies

ivytianlu@gdufs.edu.cn

https://orcid.org/0000-0002-2180-1961

Abstract

Focusing on the application of machine translation (MT) to Chinese–English subtitling, this article first reviews the current literature and introduces the general constraints in Chinese–English subtitling from three perspectives: technical, cultural, and textual. Technical constraints are imposed by the limited time and space available for each subtitle; cultural constraints relate to disparities in the beliefs, values, customs, behaviours, and artefacts of different cultural groups; and textual constraints manifest predominantly in the differences between the source and target languages and in the segmented nature of subtitles. To respond to these constraints and to achieve conciseness, comprehensibility, and coherence in translated subtitles, this study highlights condensation, context, and coordination as the key strategies to adopt. These strategies, however, pose considerable challenges for MT in Chinese–English subtitling. First, the common practice of full translation by most MT tools works against making subtitles concise. Second, the lack of relevant contextual knowledge limits the ability of MT to generate appropriate translations. Third, the segmented display of subtitles makes it difficult for MT both to capture and reflect the logic of the source text and to produce coherent output in self-contained segments. To illustrate these constraints and challenges, this article provides examples of bilingual subtitles from American Factory (Bognar & Reichert, 2019), an Oscar winner for Best Documentary Feature, and compares the official subtitles with those generated by three popular MT tools for Chinese–English translation. The study investigates the efficacy of MT subtitling and its potential to produce quality subtitles, and proposes possible solutions and suggestions for improving the quality of MT in subtitling.

Keywords: machine translation (MT), Chinese–English subtitling, translation quality assessment, audiovisual translation (AVT), transediting

1.    Introduction

Globalization and the rapid advancement of mass media have accelerated the production and transmission of video content tailored to the needs of a diverse international audience. This has led to an exponential increase in demand for subtitling services, a topic which has garnered wide attention from both industry and academia (Pérez-González, 2014). Given such overwhelming demand for audiovisual translation (AVT), introducing machine translation (MT) into the industry has been regarded as a way of increasing the capacity to meet surging market needs (Chan, 2017; Díaz Cintas & Massidda, 2020). In fact, the application of MT to AVT has aroused growing interest across academia, with foci on the development and application of specific tools (Bogucki & Díaz Cintas, 2020; Georgakopoulou, 2019a), pre-processing and post-editing practices (Bouillon et al., 2018), the acceptability of MT output (Koglin et al., 2022), and many other related topics (Bogucki, 2016; Deng & Gambier, 2019; Petukhova et al., 2012; Turcato et al., 2000).

Subtitles are characterized by their tendency to condense information and, in many instances, to employ colloquialisms. The way subtitles are displayed on screen, specifically their segmented nature, also results in physically fragmented text. Moreover, the correct comprehension of subtitles relies heavily on context, necessitating the use of visual, auditory, and background cues. All of these features pose significant difficulties when MT is applied to subtitling. Such challenges are especially noteworthy when dealing with language pairs from different language families, such as Chinese and English, precisely the language pair analysed in this study.

With increasing economic and cultural exchanges between China and the rest of the world, the demand for Chinese–English subtitling continues to grow (Gambier & Jin, 2018). Apart from the general obstacles to using MT in AVT, the integration of MT into Chinese–English subtitling faces further complexities. These can be attributed to the considerable differences between the two languages in information structure, textual features, and writing systems, in addition to the disparities between the two cultures. This study sought to investigate the constraints and challenges associated with employing MT in Chinese–English subtitling. The aim was to identify effective methods, such as pre-editing and post-editing, for generating high-quality subtitles with MT. To illustrate these difficulties and solutions, the article cites examples of bilingual subtitles from American Factory (Bognar & Reichert, 2019), the 2020 Oscar winner for Best Documentary Feature, and evaluates the translations generated by three translation tools through comparative analyses. The article also considers the potential of applying ChatGPT, a generative large language model, to subtitling, along with suggestions for enhancing its efficacy.


 

2.    Translation quality assessment

Translation quality assessment (TQA) has been a consistently critical area of concern in the field of translation studies. Traditionally, translation quality has mainly been tied to the relationship between a source text (ST) and a target text (TT) or, rather, to the degree of equivalence between the two. This view, however, has been challenged in the light of the evolving concepts of text and authorship in modern translation studies. The rapid advancement and widespread application of MT have sparked many theoretical and empirical discussions in both academic and industrial circles regarding the evaluation of MT quality. This trend is similarly evident in the assessment of subtitling quality, particularly in the light of the recent surge in AVT products worldwide.

2.1 MT quality assessment

A prevailing method for assessing the quality of MT involves the use of metrics. These metrics or models, such as the Bilingual Evaluation Understudy (BLEU) (Papineni et al., 2002) and Multidimensional Quality Metrics (MQM) (Lommel et al., 2014), are typically applied by identifying errors in translated texts. More specifically, errors are “counted, classified and weighted according to their severity” (Castilho et al., 2018, p. 14) either manually, automatically, or through a combination of both methods.

In MT quality assessment, error classification has received significant attention and generated considerable discussion. The renowned MQM hierarchy contains eight primary dimensions at the top level, each of which is specified with further parameters. These eight top-level branches are accuracy, design, fluency, internationalization, local convention, style, terminology, and verity (Lommel, 2018, pp. 117–118). Although MQM contributes to the standardization of otherwise arbitrary metrics, it is also criticized for failing to provide “guidance on the interpretation of the results” (Lommel et al., 2014, p. 461).

The categorization of errors varies from one model to another. Some derive primarily from linguistic considerations (Comelles et al., 2017; Costa et al., 2015; Farrús et al., 2010), while others stem from translation techniques (Federico et al., 2014; Koponen, 2012). In practice, a reference translation is usually employed to enable evaluation (Popović, 2018, p. 130). There are additional methods of assessing MT quality, including the measurement of post-editing effort, which focuses specifically on “the human revision process” over the translation output (Rivera-Trigueros, 2022, p. 596).
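To make the mechanics of reference-based scoring concrete, the following is a minimal sketch using the sacrebleu Python package (assumed to be installed); the hypothesis and reference sentences are borrowed from Example 1 later in this article purely for illustration, and the resulting score carries no evaluative weight here.

```python
# A minimal sketch of reference-based MT scoring with BLEU, using the
# sacrebleu package (assumed installed: pip install sacrebleu). The sentence
# pair is borrowed from Example 1 below purely to show the mechanics.
import sacrebleu

hypotheses = ["I joined Fuyao right after turning 18."]   # a machine-produced version
references = [["Fuyao offered me my first job at 18."]]   # the official subtitle (RT)

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
# Condensed or restructured subtitles share little surface overlap with a
# reference, so such scores require careful interpretation in subtitling.
```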

2.2 Subtitling quality assessment

Pedersen’s (2017) FAR model is one of the most influential quality assessment measures for subtitling. The model proposes three primary dimensions: functional equivalence, acceptability, and readability, the first letters of which form the name of the model. Among the three parameters, “functional equivalence” weighs more heavily than the other two, as it concerns the semantic and stylistic errors which may “arguably [affect] the viewers’ comprehension and ability to follow the plot the most” (Pedersen, 2017, p. 224). Under “acceptability”, errors of grammar, spelling, and idiomaticity are evaluated; for “readability”, the considerations include segmentation and spotting, punctuation, reading speed, and line length.

Doherty and Kruger (2018) have pointed out that “[u]nlike traditional TQA models and metrics, assessment in AVT is largely based on prescriptive industry guidelines” (p. 182). Summarizing the commonalities found in the various reviews, reports, and guidelines, they discovered that the general principles that underpin good quality in AVT are accuracy, presentation, and timing. Although labelled with distinct terms, Doherty and Kruger’s trichotomy and Pedersen’s model share significant similarities. Both models underscore the functional and pragmatic equivalence between the source and the target subtitles, and they both place importance on the form of subtitles in achieving optimal readability.

Applying Sperber and Wilson’s Relevance Theory to subtitling, Bogucki (2022) has proposed that the rationale behind the process of decision-making in the practice of AVT can be explained by the interplay between relevance, communicative intention, and processing effort. Empirical research in this field focuses on the audience’s visual attention and cognitive load and also their reception of subtitles (Doherty & Kruger, 2018). The research methods employed include surveys, questionnaires, eye-tracking, and electroencephalography (EEG). Despite the differences in their specific research focuses and methodologies, all these studies consider cognitive effort to be a crucial factor when evaluating the quality of subtitles. They agree that subtitles which demand excessive cognitive effort on the part of viewers should be avoided unless they are considered necessary in the context and are therefore justified.

In recent years, there has been a growing interest in studies that centre on the performance and evaluation of MT in subtitling (Bogucki & Díaz Cintas, 2020; Brendel & Vela, 2022; Gambier, 2023). Koglin et al. (2022) applied Pedersen’s FAR model to assess the quality of subtitles that were machine-translated and then post-edited by human beings using a triangulation research approach. Specifically, they correlated the results of their analysis with empirical data collected from think-aloud protocols and open-ended questionnaires. Their findings revealed that the machine-translated and post-edited subtitles demonstrated a reasonable level of aptness regarding meaning and adherence to target language norms; however, the technical parameters of the subtitles hindered readability and displayed relatively lower quality.

3.    Constraints in Chinese–English subtitling

Before delving into the challenges posed by implementing MT in subtitling, it is worth scrutinizing the norms and impediments that are typically encountered in subtitle translation, particularly in the case of Chinese–English subtitling.

Guardini (1998) classified subtitling constraints into three categories: technical, textual, and linguistic. The linguistic constraints were further divided into intra-linguistic and extra-linguistic constraints; whereas the former concerns mainly syntactical and grammatical differences between languages, the latter is related to cultural and contextual variations. In Guardini’s definition, textual constraints seem to be vague in scope and involve a wide range of elements, from the presence of the ST to the availability of visual components, from the reduction of the original to the change of medium (Guardini, 1998, pp. 99–101). In their study of Chinese subtitles, Du et al. (2013) examined the constraints in Chinese–English subtitling according to physical, linguistic, and cultural dimensions, which roughly correspond to the technical, intra-linguistic and extra-linguistic constraints in Guardini’s classification.

For the purposes of clearer delineation and discussion, this study preserves Guardini’s terminology of “technical” and “textual” constraints, yet it specifies “textual constraints” as those restrictions associated with textuality, emphasizing the textual distinctions between Chinese and English texts.[i] As cultural disparities exert a significant influence on subtitling, we also include “cultural” constraints as a category parallel to technical and textual constraints. Accordingly, this study considers the constraints on Chinese–English subtitling from the perspectives of technical, cultural, and textual constraints.

3.1 Technical constraints

Subtitling is technically constrained due to the transient nature of subtitles and the limited space available for them on a screen (Chen, 2019; Díaz Cintas & Remael, 2014, 2021). On the one hand, the display of subtitles must be synchronized with the utterances of the characters. In other words, subtitles should appear on the screen when a character starts talking and disappear when the person stops speaking. Since the synchronization for subtitling is not as strict as it is for dubbing, in some cases subtitles may stay on the screen a little longer after an utterance ends (Pedersen, 2010). On the other hand, the space available on the screen for subtitles is limited. Therefore, the norm is that no more than two subtitle lines should be presented at a time and the words included in each line are strictly limited (de Linde, 1995; Li & Bo, 2005). This is to ensure that the subtitles are reasonably readable and that the images are not blotted out.

In China, a common practice is to provide bilingual subtitles, with each line corresponding to one language. Following this format, each language is allotted only one line for subtitles within a given timeline. The widely accepted “six-second rule” suggests that “an average viewer can comfortably read in six seconds the text written on two full subtitle lines” (Díaz Cintas & Remael, 2021, p. 109), and the recommended maximum number of characters per line, including spaces and punctuation marks, is 42 for single-byte languages such as English and 16 for double-byte languages such as Chinese (Díaz Cintas & Remael, 2021, p. 99). These technical limitations are particularly obvious when translating Chinese into English. This is because Chinese uses a logographic writing system in which each character represents a word or a morpheme and occupies a fixed amount of space, typically the width of two Latin letters. In contrast, English primarily employs a phonographic writing system in which the majority of words consist of more than two letters. Consequently, English words tend to require more characters than their Chinese equivalents, which consumes more space and therefore creates a greater need for condensation. The obligatory inflections in English (e.g., plural endings and verb tense forms) also lengthen words, whereas Chinese has no such requirement (Lian, 1993). Moreover, it is not uncommon to encounter Chinese sentences in which the subject is not explicitly stated and where the predicate may lack a visible verb (He, 2002). All these characteristics, combined with the absence of spaces between Chinese characters, result in relatively shorter Chinese sentences compared to their English equivalents.
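To make this counting convention concrete, the following is a minimal sketch using Python’s standard unicodedata module; the helper function is ours, for illustration only, and simply reproduces the convention behind the bracketed character counts shown in the examples in Section 5 (wide characters such as Chinese count as two, all others as one).

```python
# A sketch of the character counting used in the examples below: East Asian
# wide (double-byte) characters count as 2, all others as 1. The per-line
# maxima cited above are 42 characters for English and 16 characters for
# Chinese. This helper is illustrative, not part of any subtitling tool.
import unicodedata

def display_width(line: str) -> int:
    """Width of a subtitle line, counting East Asian wide characters as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in line)

zh = "我从18岁成年之后 第一个跨入的行业就是福耀"   # the ST of Example 1
en = "Fuyao offered me my first job at 18."          # the official subtitle (RT)

print(display_width(zh))  # 41, matching the count reported in Example 1
print(display_width(en))  # 36, within the 42-character limit for English
```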

3.2 Cultural constraints

Cultural constraints in translation have gained widespread recognition, especially with the introduction of “culture-specific items” (CSIs) in AVT. The translation of CSIs and the strategies employed in handling them have generated heated discussions in the field.

Considering it a form of intercultural manipulation, Franco Aixelá (1996) categorized the translation strategies of CSIs into two major groups: conservation and substitution. Similarly, Valdeón (2008) presented the translation strategies of CSIs as a dichotomy of preservation and substitution. In his investigation, Pedersen (2011) examined 14 different taxonomies and identified six strategies for translating extralinguistic cultural references in subtitling. These strategies were categorized on a scale that ranged from source-oriented to target-oriented approaches.

Regardless of the chosen strategy, subtitles are expected to have “high readability without requiring too much mental juggling” (Han, 2019, p. 18). A prime example is the translation of idioms, which are commonly employed in both spoken and written Chinese. In order to understand idioms fully, one must possess the relevant background knowledge, which often includes myths, stories, or historical events (Han, 2019, p. 19). Consequently, it is a significant challenge to represent the rich connotations of Chinese idioms in a comprehensible yet concise manner in translated English subtitles.

Another example is the translation of taboos. In spoken communication, people sometimes express their emotions using a profanity. However, the usage and acceptability of such language can vary between different cultures. For this reason, translators must assess the intended purpose of these expressions and employ effective strategies to prevent any unwarranted misunderstandings by audiences from different cultural backgrounds.

3.3 Textual constraints

Textually, subtitling is constrained by language differences, the segmented nature of subtitles, and the mixed mode of presenting information.

As with translation in other domains, language differences pose the most fundamental obstacles to subtitling. These differences can be found at various linguistic levels, including lexical, semantic, syntactical, and textual (Du et al., 2013; Guardini, 1998). For example, while English employs inflections, Chinese lacks such a grammatical requirement. Consequently, when translating from Chinese to English, one must always consider questions of number, tense, and aspect, as Chinese nouns do not inherently indicate singularity or plurality, nor do verbs indicate tense or aspect. Although these grammatical differences may seem trivial, they can prove to be formidable challenges for translators (see Zhu, 1999, 2022). A syntactical characteristic of Chinese, particularly in spoken Chinese, is the frequent omission of sentence subjects (Yip & Rimmington, 2016, p. 586): listeners can usually rely on the context to deduce the speaker’s intended referent. In contrast, sentence subjects are mandatory in English. Therefore, in these cases, translators must add the appropriate subjects to the English TT.

Restricted by technical limitations, subtitles are commonly divided into segments, which has an impact on the integrity and coherence of the text. Each segment must be self-contained, whether it be a clause or a well-defined part of a sentence (Han, 2019, p. 17). Ideally, the content conveyed by the translated segment should match that of the source segment; however, at times, adjusting the sentence structure appears to be necessary because of the different norms between Chinese and English.

Textual constraints also arise due to the mixed mode of information presentation in subtitling. Subtitles represent written forms of oral utterances, which means that the conventions of both oral and written modes need to be taken into account and adjustments to the text may therefore be necessary (Díaz Cintas, 2013, p. 278). In addition, audiovisual products consist of multiple modalities, and information that can be understood or inferred from visuals may not be conveyed explicitly in subtitles (Chen & Wang, 2019). In such cases, subtitlers must rely on contextual cues to ensure an accurate translation.

4.    Methodology

This study is not geared towards evaluating the overall efficacy of specific MT tools, nor is it intended to generate a comprehensive report on the use of MT in the field of subtitling. Rather, the study focuses on a conceptual exploration of the limitations and difficulties associated with implementing MT in Chinese–English subtitling. Drawing on the existing literature on TQA in both MT and subtitling, we take conciseness, comprehensibility, and coherence as the criteria for evaluating machine-generated subtitles in this study. With specific regard to Chinese–English subtitling, the study investigates the challenges of applying MT in this domain, particularly with respect to condensation, context, and coordination, which correspond to the three criteria of subtitle quality mentioned above.

Our investigation is illustrated with in-depth analyses of examples from the Oscar-winning documentary American Factory (Bognar & Reichert, 2019). The Netflix documentary chronicles the experiences of a Chinese glass manufacturer as it establishes a factory in Ohio and recruits local workers. Both Americans and Chinese feature in the film, so English and Chinese are used interchangeably throughout. Thanks to its intralingual and interlingual subtitles, the documentary has reached a global audience and has even won a couple of international awards. The bilingual subtitle file in .srt format was downloaded in June 2022 from zimuku (Chinese pinyin for “subtitle inventory”), a crowdsourced subtitle-sharing platform for personal study and research (http://zmk.pw/). Of the 1,383 subtitles in total, 489 are originally in Chinese. After excluding isolated occurrences interspersed within English conversations, we included 481 Chinese subtitles as the subject of our study.
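The selection just described can be sketched as follows; this is a simplified illustration of the filtering logic rather than the exact script used in the study, and the file name is a placeholder.

```python
# A simplified sketch of extracting Chinese-language cues from an .srt file,
# in the spirit of the selection described above. The file name is a
# placeholder; edge cases (e.g., cue text consisting only of digits) are ignored.
import re

CJK = re.compile(r"[\u4e00-\u9fff]")  # basic range of CJK ideographs

def read_srt_cues(path: str):
    """Yield the text of each subtitle cue, dropping index and timecode lines."""
    with open(path, encoding="utf-8-sig") as f:
        for block in f.read().split("\n\n"):
            lines = [ln for ln in block.strip().splitlines()
                     if ln and "-->" not in ln and not ln.strip().isdigit()]
            if lines:
                yield " ".join(lines)

cues = list(read_srt_cues("american_factory.srt"))
chinese_cues = [cue for cue in cues if CJK.search(cue)]
print(len(cues), len(chinese_cues))  # total cues vs. cues containing Chinese
```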

To test and compare the effectiveness of MT in Chinese–English subtitling, this study employed three popular MT tools: DeepL Translator (DL), Youdao Translation (YD), and ChatGPT. Developed by a German company, DL claims to be “the world’s most accurate and nuanced machine translation”, with its performance in Chinese–English translation claimed to be more than five times more accurate than that of its competitors (https://www.deepl.com/en/whydeepl). YD, notably, was developed in China, which might suggest a particular strength in Chinese–English translation (https://ai.youdao.com/product-fanyi-text.s). Although ChatGPT is not specifically designed for translation, as a generative pre-trained language model it has aroused intense interest in its potential application to subtitling (https://openai.com/blog/chatgpt). The MT-generated translations included in this article were obtained from the free versions of DL and YD between July and October 2022. The translations provided by ChatGPT were generated with its free research preview based on GPT-3.5 in May 2023. For the comparative analyses, the official subtitles (labelled ‘RT’) are taken as the reference translation.

In the examples illustrated in the next section, the back translations of the original Chinese subtitles are given in italics. DL and YD denote the translations generated by DeepL and Youdao, respectively. ChatGPT 1 is the initial translation produced by ChatGPT in response to the prompt: “Translate the following subtitles from a documentary into English. Make them concise, comprehensible, and coherent.” ChatGPT 2 is the subsequent revised version, generated after further requirements were stipulated. RT is the official subtitle, taken as the reference. The number in brackets at the end of each line indicates the character count.
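The prompts in this study were entered in ChatGPT’s free web interface. Purely to illustrate how the same prompt could be issued programmatically, the sketch below uses OpenAI’s Python client; the model name and the sample subtitle line are placeholders we have assumed, not the study’s actual setup.

```python
# An illustrative sketch of sending the prompt quoted above through OpenAI's
# Python client (openai>=1.0). The study itself used the free web interface;
# the model name and the sample line below are assumed placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = (
    "Translate the following subtitles from a documentary into English. "
    "Make them concise, comprehensible, and coherent.\n\n"
    "我从18岁成年之后 第一个跨入的行业就是福耀"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed stand-in for the free research preview
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```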

5.    Challenges of Chinese–English subtitling with MT

In this section, we investigate the performance of the three MT tools in scenarios that require strategies and considerations related to condensation, context, and coordination. These factors play a crucial role in generating concise, comprehensible, and coherent subtitles, which makes the task a challenging one for MT.

5.1 Condensation for conciseness

Owing to technical constraints, text reduction or condensation is considered an essential strategy in subtitling (Han, 2019; Suratno & Wijaya, 2018). In order to provide accurate, comprehensible, and viewer-friendly subtitles within the character limit, translators must decide on the most pertinent information to include in the target subtitles and leave out the relatively superfluous, less relevant, or excessively cognition-demanding information. Such decisions are not easy for machines to make. In fact, subtitling is a form of transediting, where editing plays a significant role in the translation process. As MT always provides full translations, ensuring conciseness in subtitles can be challenging.

Example 1

[00:23:03,630 --> 00:23:09,420]

ST: 我从18岁成年之后 第一个跨入的行业就是福耀 (41)

After I turned 18 years old and became an adult, the first industry I entered was Fuyao.

DL: Since I became an adult at the age of 18, the first industry I entered was Fuyao. (81)

YD: The first industry I entered when I was 18 years old was Fuyao (62)

ChatGPT 1: The first industry I entered after turning 18 was Fuyao. (56)

ChatGPT 2: I joined Fuyao right after turning 18. (38)

RT: Fuyao offered me my first job at 18. (36)

Currently, most MT tools face difficulties in making correct decisions regarding redundancy and errors in the ST when generating subtitles. Following the information flow of the ST, they tend to translate everything literally. For instance, the translation produced by DL in Example 1 follows the information structure of the ST strictly and appears to be unnecessarily wordy. YD successfully identified that the expressions 18岁 (18 years old) and 成年 (become an adult) were repetitive in meaning. It therefore retained only the former, which led to a shorter version. In comparison, the RT employed an even briefer phrase, “at 18”. By restructuring the information, it presented a much more concise rendition. ChatGPT 1 resembles YD in word choice and sentence structure. After receiving the additional prompt, “make it more concise”, ChatGPT 2 was generated accordingly, and the resulting text proved to be a significantly more natural and succinct version.

Example 2

[01:39:18,200 --> 01:39:24,750]

ST: 是不是把环境给破坏了呢 这个原本很安宁的地方 变得不安宁了 (56)

Has the environment been destroyed? This once peaceful place has become unsettled.

DL: Is the environment being destroyed? This once peaceful place has become unsettled. (82)

YD: Is it destroying the environment? This place, which used to be peaceful, has become unstable (92)

ChatGPT 1: Has the environment been destroyed? This once peaceful place has become restless. (81)

ChatGPT 2: Environment destroyed? Peaceful place now restless. (51)

RT: Have I taken the peace away and destroyed the environment? (58)

In Example 2, the speaker, the chairman of the factory, expressed uncertainty about whether, by building the factory, he had harmed the environment and deprived the place of its peace. The ST is a rhetorical question followed by further explanation. To make the English subtitle more concise, the RT combined the information conveyed by the three Chinese segments into one question. All three MT tools replicated the structure of the ST, resulting in translations that each span two sentences and exceed the maximum character limit of 74 for six seconds. When instructed to produce a more concise translation, ChatGPT did offer a shorter version, but at the cost of a significant loss of comprehensibility. Compared with the introspective tone conveyed by the original text, the MT versions in general come across as more objective observations of the issue.

5.2 Context for comprehensibility

In the realm of audiovisual production, both the audio and the visual elements are vital to constructing a coherent and integrated multimodal text (Perego, 2009; Zabalbeascoa, 2008). Chaume (2004) points out that the various codes of a film synergize to create its overall meaning, and that codes such as the musical, sound-arrangement, iconographic, photographic, and special-effects codes may vary across cultures. In addition, a comprehensive understanding of the history, culture, social background, genre, theme, characters, and other relevant factors is often necessary for comprehending and interpreting subtitles fully (Lv, 2016). However, these contextual details are often not immediately accessible from the subtitle itself, which makes it challenging for MT to incorporate them into its output. As a result, the accuracy and comprehensibility of MT might be compromised.

Example 3

[00:48:51,210 --> 00:48:52,380]

ST: 有成家 (6)

(I) have formed a family.

DL: Have a family (13)

YD: Have a family (13)

ChatGPT: Started a family. (17)

RT: I’m married. (12)

 

[00:48:52,670 --> 00:48:54,050]

ST: ‎孩子也在老家 (12)

Child/children is/are in the hometown.

DL: Children are also in the hometown (33)

YD: The children are at home, too (29)

ChatGPT: Children also in hometown. (26)

RT: My child is in my hometown. (27)

Example 3 is an excerpt from an interview with a female Chinese migrant worker who disclosed that she was married and had left her child in her hometown. The example shows that the omission of subjects in Chinese sentences makes it difficult for MT tools to restore them accurately in the English output. Unsurprisingly, none of the MT versions provides the agent of the action, leaving the utterance vague in meaning. In contrast, the RT not only supplements the sentence with the explicit subject “I”, but also effectively captures the connotative meaning of the Chinese phrase 成家 (form a family) as the state of “being married”. The RT is therefore clearer, more concise, and more comprehensible.

Chinese nouns do not indicate number (singular or plural). For instance, in the second subtitle in Example 3, 孩子 can refer to either one child or more children. Therefore, without sufficient background knowledge about the speaker or the sociocultural context, it is challenging to determine the number of children – which is grammatically necessary in English. Another linguistic difference exemplified in this example pertains to the modification of nouns. In Chinese discourse, possessive relations are not always explicitly stated and listeners are expected to deduce them contextually. In Example 3, it is not difficult to infer that the speaker is referring to her own story. The RT apparently sorted out this relationship and added “my” before “child” and “hometown”, while all the MT versions retained it in its vagueness.

Example 4

[01:13:56,720 --> 01:14:00,260]

ST: ‎实际上中国很多话都是对的 人都是顺毛驴 (37)

In fact, a lot of Chinese sayings are correct. People are all smooth-hair donkeys.

DL: In fact, many Chinese sayings are true, people are obedient donkeys (67)

YD: In fact, many people in China are right (39)

ChatGPT: In fact, many things in China are correct. People are like obedient donkeys. (76)

RT: Donkeys like being touched in the direction their hair grows. (61)

In Example 4, the speaker cited a Chinese saying, 人都是顺毛驴 (People are all smooth-hair donkeys), to demonstrate that some Chinese wisdom also applies to the American context. In this Chinese idiom, people are compared to donkeys that enjoy being touched in the direction of their hair growth, so it implies that people naturally prefer to be encouraged or flattered rather than discouraged or scolded. Both DL and ChatGPT translated 顺毛驴 as “obedient donkeys”, which distorts the intended analogy. Furthermore, following the practice of full translation, both versions appear wordy. It is interesting to note that YD translated only the first half of the utterance and left the part with CSI untranslated. In contrast, the RT omitted the comparatively subordinate first half of the sentence to avoid a verbose subtitle. Following the “linguistic (non-cultural) translation” approach, the CSI-loaded latter half was rendered by “offering a target language version which can still be recognized as belonging to the cultural system of the source text” (Franco Aixelá, 1996, pp. 61–62). As a result, the RT effectively retains the metaphorical image of the original expression while increasing its comprehensibility through a more explanatory interpretation.

5.3 Coordination for coherence

Long sentences in subtitles are usually displayed in segments, which presents a major challenge to MT in producing coherent text. Moreover, colloquial language often contains repetitions, slips of the tongue, and illogical utterances, which further complicates the task. To overcome these challenges, such linguistic elements must be coordinated. For example, during the captioning phase, the intralingual translation of oral utterances into clear and coherent written subtitles can facilitate subsequent interlingual subtitling with MT, contributing to smoother and more accurate subtitle generation.

Example 5

ST: [01:32:55,230 --> 01:32:58,440]

我们今年下半 我想再来两次就行了 (33)

We the second half of this year, I think two more (visits) will do.

[01:32:58,530 --> 01:33:01,700]

你们 你们跟上进度计划 我就不要 (30)

You, you keep up with the schedule, I don’t have to.

[01:33:02,240 --> 01:33:05,450]

不要让我辛苦了 跑过来干嘛 你说是不是 (36)

Don’t let me take pains. Ran over here for what?‎ Don’t you think so?

DL: We’ll be back later this year, and I’d like to come back twice. (63)

You guys, if you keep up with the schedule, I don’t want to (59)

Don’t make me work so hard to come over here, don’t you think so? (65)

YD: We’re ‎in the second half of this year I want to come back twice (64)

I don’t want you to keep up with the schedule (45)

‎don’t let I ran hard why ‎wouldn’t you say so (45)

ChatGPT: I only want to come two more times in the latter half of this year. (67)

I don’t need you to keep up with the progress plan. (51)

Don’t make me work hard. Why did I come all the way here? Isn’t that right? (75)

RT: I think I need to make only two more trips this year, (53)

if you achieve what you planned. (32)

Why do I need to come here? It’s not convenient for me. Right? (62)

In Example 5, the chairman of the factory intended to convey the idea that he would need to visit the factory site only twice more if everything went as planned. Unfortunately, the original utterance contained several corrections and ambiguities that affected the performance of MT. Both DL and YD mistakenly interpreted it as the speaker’s desire to visit the factory. In addition, their switch of subjects from “we” to “I” in the first line resulted in incoherent texts. Problems with accuracy and coherence are also found in other lines. In comparison, ChatGPT captured the tone of the speaker in the first sentence but fell victim to the same errors as the other two MT tools in the second.

In order to overcome these obstacles, effective pre-translation coordination may help to eliminate vagueness in the ST regarding its form, meaning, and logical coherence. Specific methods include avoiding repetition, clarifying meaning, and sorting out logic. Accordingly, we revised the ST and retrieved new MT versions, as demonstrated in Example 5’.


 

Example 5’

ST’: 今年下半 我想 再来两次就行了 (30)

The second half of this year, I think, two more (visits) will do.

你们跟上进度计划 (16)

You keep up with the schedule,

我就不用辛苦跑过来 你说是不是 (29)

I don’t have to ‎run all the way here. Don’t you think so?

DL: Later this year, I think two more visits will do. (49)

You guys keep up with the schedule (34)

I don’t have to come all the way here, don’t you think so? (58)

YD: I’d like to do two more later this year (39)

You keep up with the schedule (29)

I wouldn’t have to run all the way over here, would I (53)

ChatGPT: I only want to come two more times in the latter half of this year. (67)

If you keep up with the progress plan, (38)

I won’t have to make the exhausting trip here. Don’t you agree? (63)

After removing the barriers in the ST, all three MT tools produced much clearer and more logical and coherent translations. Notably, DL excelled in all three criteria — conciseness, comprehensibility, and coherence. Despite being the most concise, YD’s output is vague in the first sentence owing to the ambiguity of the phrase “to do two more”. ChatGPT’s translation of the first sentence also contains errors, as it mistakenly interprets it as the speaker’s reluctance to visit.

6.    Discussion

Subtitle translation is an activity that largely involves transediting, because it often requires the condensation and adjustment of the text to meet the requirements for subtitles and suit the needs of the intended readership. For this reason, subtitling requires the translator to possess a high level of subjectivity to produce concise, comprehensible, and coherent subtitles that are properly segmented into self-contained blocks. In this connection, the feasibility of applying MT to subtitling depends on the extent to which MT can achieve such subjectivity.

Our study shows that in Chinese–English subtitling the current MT tools are unable to produce satisfactory output, particularly with regard to the requirement of conciseness. In addition to the cultural and textual constraints common to all types of translation, the segmented presentation of subtitles poses additional challenges for MT in ensuring accuracy and coherence. On the one hand, machines must accurately comprehend the ST and identify its core messages in the first place. Subtitles feature colloquialisms, that is, non-standard utterances that can include fragmentation, repetition, redundancy, faulty grammar, poor logic, and even slips of the tongue. Moreover, Chinese subtitles conventionally use spaces rather than punctuation marks, which only intensifies the difficulty of accurate comprehension for MT. On the other hand, machines must produce the TT in a concise yet clear manner, sometimes even in self-contained and coherent segments. Since Chinese and English differ in their textual features and information structure, the order of sentence elements in Chinese–English subtitling may need to be revised, which makes it difficult to match the ST and the TT segments. On the whole, the performance of MT in subtitling is contingent upon overall improvements in the quality of MT output. Subtitling, in particular, requires subjectivity on the part of the translator and is therefore a complex task that MT has not yet fully mastered.

Despite these constraints and challenges, MT has demonstrated significant potential in enhancing the productivity and efficiency of subtitling. However, the involvement of human beings in pre-editing and post-editing remains indispensable. During the pre-editing phase, latent language barriers that can hinder comprehension, such as ambiguous and redundant expressions and grammatically unclear sentences, must be restructured or eliminated (Bouillon et al., 2018). Considering the conciseness criterion for subtitles, rewriting can sometimes be essential so as to ensure that the gist of a lengthy ST will be retained in the TT (Du et al., 2013, p. 255). Furthermore, cultural constraints can also be addressed during this process by integrating pertinent contextual information into the ST to enhance machine comprehension. In practice, pre-editing work can be combined with captioning, since a condensed transcription can help enhance the performance of MT. In addition, post-editing, as in its application in other domains, offers an opportunity to correct errors in machine-generated output and to ensure that the output meets the stipulated level of quality (Koglin et al., 2022, p. 3). At the same time, it is important to note that neural machine translation (NMT) models, such as Youdao Translation, may sometimes unjustifiably neglect seemingly intractable yet important information, and that ChatGPT has been criticized for providing inaccurate or even fabricated content, especially when it lacks the relevant information required for processing.
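As a small illustration of the mechanical side of such pre-editing, the toy function below collapses false starts of the kind seen in Example 5 (你们 你们…); it is our own sketch, and the substantive clarification and restructuring demonstrated in Example 5’ still require human judgement.

```python
# A toy sketch of one mechanical pre-editing step: collapsing false starts,
# where a space-separated chunk is repeated at the beginning of the next chunk
# (as in Example 5). Substantive rewording, as in Example 5', still needs a human.
def collapse_false_starts(subtitle: str) -> str:
    """Drop a chunk when the following chunk starts with it (a false start)."""
    chunks = subtitle.split()
    cleaned = []
    for i, chunk in enumerate(chunks):
        if i + 1 < len(chunks) and chunks[i + 1].startswith(chunk):
            continue  # false start: the next chunk repeats and extends this one
        cleaned.append(chunk)
    return " ".join(cleaned)

print(collapse_false_starts("你们 你们跟上进度计划 我就不要"))
# -> 你们跟上进度计划 我就不要
```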

Nevertheless, ChatGPT has demonstrated considerable potential in subtitling, functioning as it does by providing responses to prompts that give instructions. Ideally, as long as the instructions are precise and effective, ChatGPT has the capacity to produce customized translations that meet specific requirements. These requirements could be specified in detail, including the character limit for each line, the genre and style of the output, and the preferred translation strategies – either source- or target-oriented. In addition, a prompt may include annotations that provide contextual information, relevant cultural knowledge, a glossary and terminology.

In fact, the idea of providing detailed instructions and annotations is not entirely new to translators and subtitlers. It can easily be related to the concepts of “translation brief” and “template files”, both of which have received significant attention and been the subject of extensive discussion. Strongly advocated by the functionalists of the German School, a translation brief is expected to contain information on the intended functions of the TT, the TT addressees, details of the time and place of text reception, the medium via which the text will be transmitted, and the motive behind the production or reception of the text (Nord, 2018, p. 57). These details are considered essential if translators are to employ appropriate translation strategies in order to produce satisfactory translations. In the sector of subtitling, a “template” is “a subtitle file containing a time-coded transcription of the dialogue, onscreen text, and sometimes also annotations for translators” (Oziemblewska & Szarkowska, 2022, p. 432). A template is therefore related to the production, standardization, and quality assurance of subtitles across multiple languages (Georgakopoulou, 2019b). For ChatGPT to be meaningfully and usefully integrated into subtitling, it is pivotal to provide adequate instructions and annotations in the prompt provided to it. While some instructions may pertain to the general criteria for producing high-quality subtitles, others may be tailored to specific instances. This highlights the utmost importance of proficient prompt-writing skills.
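By way of illustration, a brief-style prompt of the kind described above might be assembled as in the sketch below; every specific element (the character limit, the glossary entries, the contextual note) is a hypothetical example of such an instruction, not a tested template from this study.

```python
# A hypothetical sketch of a brief-style prompt for subtitling with ChatGPT,
# combining general quality criteria with instance-specific annotations.
# All specifics (limit, glossary, context note) are illustrative assumptions.
def build_subtitle_prompt(source_lines, glossary, context_note, max_chars=42):
    glossary_text = "\n".join(f"- {zh}: {en}" for zh, en in glossary.items())
    return (
        "You are subtitling a documentary from Chinese into English.\n"
        f"Keep each subtitle within {max_chars} characters; make it concise, "
        "comprehensible, and coherent, and keep each segment self-contained.\n"
        f"Context: {context_note}\n"
        f"Glossary:\n{glossary_text}\n\n"
        "Subtitles to translate:\n" + "\n".join(source_lines)
    )

prompt = build_subtitle_prompt(
    source_lines=["实际上中国很多话都是对的 人都是顺毛驴"],
    glossary={"福耀": "Fuyao",
              "顺毛驴": "a donkey that likes being stroked the way its hair grows"},
    context_note=("The speaker uses a Chinese folk saying to argue that people "
                  "respond better to encouragement than to scolding."),
)
print(prompt)
```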

7.    Conclusion

MT has become an integral part of a translator’s work mode and its application in subtitling promises a bright future. As MT technology continues to advance, it is anticipated that it will assume a more prominent role in subtitling and take on a greater share of the workload. However, despite the advancements being made in MT, human involvement in subtitling will probably never become dispensable, given the technical, cultural, and textual constraints of subtitling. In many cases, human translators or subtitlers, with their extensive cognitive processing skills, have to transedit and occasionally produce creative translations. Therefore, an ideal work mode is the cooperation between machines and human subtitlers via pre-editing and post-editing. The underlying principle of pre-editing is to eliminate ambiguity, clarify structure, and condense text to facilitate the processing of an ST. Post-editing should correct errors in machine-generated translations and ensure that the output meets the required quality standards.

Considering the enormous potential of ChatGPT’s integration into subtitling, special attention needs to be given to prompt-related matters, such as the components of prompts for subtitling tasks, prompt language and writing, and even templates for subtitling prompts. This article reports on a study of the application of MT to Chinese–English subtitling and is primarily a conceptual discussion of MT in this specific context, illustrated by qualitative analyses of a limited number of examples of Chinese–English MT. Future research is expected to conduct quantitative analyses of larger datasets and more comprehensive investigations of the acceptability and readability of machine-facilitated subtitles.

Acknowledgements

The research presented in this article was supported by the Humanities and Social Sciences Project entitled “An Eye Tracking Study on Reception of Poetic Elements in Practical Translation” (No. 23YJC740007) funded by the Ministry of Education, P. R. China. It was also supported by the Center for Translation Studies of Guangdong University of Foreign Studies. The author would like to express her gratitude to the editors and the anonymous reviewers for their constructive comments. Special thanks are also extended to the proofreader for their meticulous work on the article.

References

Bognar, S., & Reichert, J. (Directors). (2019). American Factory [Film]. Netflix.

Bogucki, Ł. (2016). Areas and methods of audiovisual translation research (2nd ed.). Peter Lang. https://doi.org/10.3726/b15723

Bogucki, Ł. (2022). Subtitling quality assessment from a relevance-theoretic perspective. Lodz Papers in Pragmatics, 18(1), 113–129. https://doi.org/10.1515/lpp-2022-0005

Bogucki, Ł., & Díaz Cintas, J. (2020). An excursus on audiovisual translation. In Ł. Bogucki & M. Deckert (Eds.), The Palgrave handbook of audiovisual translation and media accessibility (pp. 11–32). Palgrave Macmillan. https://doi.org/10.1007/978-3-030-42105-2_2

Bouillon, P., Gerlach, J., Gulati, A., Porro, V., & Seretan, V. (2018). The ACCEPT academic portal: A pre-editing and post-editing teaching platform. In G. C. Pastor & I. Durán-Muñoz (Eds.), Trends in e-tools and resources for translators and interpreters (pp. 177–202). Brill Rodopi. https://doi.org/10.1163/9789004351790_010

Brendel, J., & Vela, M. (2022). Quality assessment of subtitles: Challenges and strategies. In P. Sojka, A. Horák, I. Kopeček, & K. Pala (Eds.), Text, speech, and dialogue (pp. 52–63). Springer. https://doi.org/10.1007/978-3-031-16270-1_5

Castilho, S., Doherty, S., Gaspari, F., & Moorkens, J. (2018). Approaches to human and machine translation quality assessment. In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment: From principles to practice (pp. 9–38). Springer. https://doi.org/10.1007/978-3-319-91241-7_2

Chan, S. (2017). The future of translation technology: Towards a world without Babel. Routledge. https://doi.org/10.4324/9781315731865

Chaume, F. (2004). Film studies and translation studies: Two disciplines at stake in audiovisual translation. Meta, 49(1), 12–24. https://doi.org/10.7202/009016ar

Chen, Y. (2019). Translating film subtitles into Chinese: A multimodal study. Springer. https://doi.org/10.1007/978-981-13-6108-1

Chen, Y., & Wang, W. (2019). Semiotic analysis of viewers’ reception of Chinese subtitles: A relevance theory perspective. The Journal of Specialised Translation, 32, 194–216. https://jostrans.org/issue32/art_chen.pdf

Comelles, E., Arranz, V., & Castellón, I. (2017). Guiding automatic MT evaluation by means of linguistic features. Digital Scholarship in the Humanities, 32(4), 761–778. https://doi.org/10.1093/llc/fqw042

Costa, A., Ling, W., Luís, T., Correia, R., & Coheur, L. (2015). A linguistically motivated taxonomy for machine translation error analysis. Machine Translation, 29(2), 127–161. https://doi.org/10.1007/s10590-015-9169-0

de Beaugrande, R., & Dressler, W. (1981). Introduction to text linguistics. Longman. https://doi.org/10.4324/9781315835839

de Linde, Z. (1995). “Read My Lips” subtitling principles, practices, and problems. Perspectives: Studies in Translatology, 3(1), 9–20. https://doi.org/10.1080/0907676X.1995.9961245

Deng, P., & Gambier, Y. (2019). Audiovisual translation studies: Development and challenges—An interview with Professor Yves Gambier. Translation Horizons, 2, 6–13.

Díaz Cintas, J. (2013). Subtitling: Theory, practice and research. In C. Millán & F. Bartrina (Eds.), The Routledge handbook of translation studies (pp. 273–287). Routledge. https://doi.org/10.4324/9780203102893.ch20

Díaz Cintas, J., & Remael, A. (2014). Audiovisual translation: Subtitling. Routledge. https://doi.org/10.4324/9781315759678

Díaz Cintas, J., & Remael, A. (2021). Subtitling: Concepts and practices. Routledge. https://doi.org/10.4324/9781315674278

Díaz Cintas, J., & Massidda, S. (2020). Technological advances in audiovisual translation. In M. O’Hagan (Ed.), The Routledge handbook of translation and technology (pp. 255–270). Routledge. https://doi.org/10.4324/9781315311258-15

Doherty, S., & Kruger, J. (2018). Assessing quality in human and machine-generated subtitles and captions. In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment: From principles to practice (pp. 179–197). Springer. https://doi.org/10.1007/978-3-319-91241-7_9

Du, Z., Li, Y., & Chen, G. (2013). Basic literacy in AV translation and research. Zhejiang University Press.

Farrús, M., Costa-Jussà, M. R., Mariño, J. B., & Fonollosa, J. A. (2010). Linguistic-based evaluation criteria to identify statistical machine translation errors. In Proceedings of the 14th annual conference of the European Association for Machine Translation (EAMT 2010) (pp. 167–173). https://aclanthology.org/2010.eamt-1.12.pdf

Federico, M., Negri, M., Bentivogli, L., & Turchi, M. (2014). Assessing the impact of translation errors on machine translation quality with mixed-effects models. In Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing (EMNLP 2014) (pp. 1643–1653). https://doi.org/10.3115/v1/D14-1172

Franco Aixelá, J. F. (1996). Culture-specific items in translation. In R. Alvarez & M. Vidal (Eds.), Translation, power, subversion (pp. 52–78). Multilingual Matters. https://doi.org/10.21832/9781800417915-005

Gambier, Y. (2023). Audiovisual translation and multimodality: What future? Media and Intercultural Communication: A Multidisciplinary Journal, 1(1), 1–16. https://doi.org/10.22034/mic.2023.167451

Georgakopoulou, P. (2019a). Technologization of audiovisual translation. In L. Pérez-González (Ed.), The Routledge handbook of audiovisual translation (pp. 516–539). Routledge. https://doi.org/10.4324/9781315717166-32

Georgakopoulou, P. (2019b). Template files: The holy grail of subtitling. Journal of Audiovisual Translation, 2(2), 137–160. https://doi.org/10.47476/jat.v2i2.84

Guardini, P. (1998). Decision-making in subtitling. Perspectives: Studies in Translatology, 6(1), 91–112. https://doi.org/10.1080/0907676X.1998.9961326

Han, J. (2019). Constraints and challenges in subtitling Chinese films into English. Translation Horizons, 2, 14–27.

He, S. (2002). Contrastive studies of English and Chinese languages. Shanghai Foreign Language Education Press.

Gambier, Y., & Jin, H. (2018). Audiovisual translation in China: A dialogue between Yves Gambier and Haina Jin. Journal of Audiovisual Translation, 1(1), 26–39. https://doi.org/10.47476/jat.v1i1.42

Koglin, A., Pereira da Silveira, J. G., de Matos, M. A., Costa Silva, V. T., & Cândido Moura, W. H. (2022). Quality of post-edited interlingual subtitling: FAR model, translator’s assessment and audience reception. Cadernos de Tradução, 42(1), 1–26. https://doi.org/10.5007/2175-7968.2022.e82143

Koponen, M. (2012). Comparing human perceptions of post-editing effort with post-editing operations. In Proceedings of the seventh workshop on Statistical Machine Translation (pp. 181–190). https://aclanthology.org/W12-3123.pdf

Li, H., & Bo, Z. (2005). Norms in subtitle translation. Chinese Science & Technology Translators Journal, 18(2), 44–46. https://doi.org/10.3969/j.issn.1002-0489.2005.02.012

Lian, S. (1993). Contrastive studies of English and Chinese. Higher Education Press.

Lommel, A. (2018). Metrics for translation quality assessment: A case for standardising error typologies. In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment: From principles to practice (pp. 109–127). Springer. https://doi.org/10.1007/978-3-319-91241-7_6

Lommel, A., Uszkoreit, H., & Burchardt, A. (2014). Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Revista Tradumàtica, 12, 455–463. https://doi.org/10.5565/rev/tradumatica.77

Lv, J. (2016). The meaning-generating mechanism of subtitle translation under MCPT: The case analysis of Blood and Bone. Foreign Language and Literature, 32(6), 128–135. https://doi.org/10.3969/j.issn.1674-6414.2016.06.020

Neubert, A., & Shreve, G. (1992). Translation as text. The Kent State University Press.

Nord, C. (2018). Translating as a purposeful activity: Functionalist approaches explained (2nd ed.). Routledge. https://doi.org/10.4324/9781351189354

Oziemblewska, M., & Szarkowska, A. (2022). The quality of templates in subtitling: A survey on current market practices and changing subtitler competences. Perspectives, 30(3), 432–453. https://doi.org/10.1080/0907676X.2020.1791919

Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, Philadelphia (pp. 311–318). https://doi.org/10.3115/1073083.1073135

Pedersen, J. (2010). Audiovisual translation: In general and in Scandinavia. Perspectives: Studies in Translatology, 18(1), 1–22. https://doi.org/10.1080/09076760903442423

Pedersen, J. (2011). Subtitling norms for television. John Benjamins. https://doi.org/10.1075/btl.98

Pedersen, J. (2017). The FAR model: Assessing quality in interlingual subtitling. The Journal of Specialised Translation, 28, 210–229. https://www.jostrans.org/issue28/art_pedersen.pdf

Perego, E. (2009). The codification of nonverbal information in subtitled texts. In J. Díaz Cintas (Ed.), New trends in audiovisual translation (pp. 58–69). Multilingual Matters. https://doi.org/10.21832/9781847691552-006

Pérez-González, L. (2014). Audiovisual translation: Theories, methods and issues. Routledge. https://doi.org/10.4324/9781315762975

Petukhova, V., Agerri, R., Fishel, M., Georgakopoulou, Y., Penkale, S., del Pozo, A., Sepesy Maučec, M., Volk, M., & Way, A. (2012). SUMAT: Data collection and parallel corpus compilation for machine translation of subtitles. LREC 2012 Conference Proceedings, 21–28. https://doi.org/10.13140/2.1.1172.0961

Popović, M. (2018). Error classification and analysis for machine translation quality assessment. In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment: From principles to practice (pp. 129–158). Springer. https://doi.org/10.1007/978-3-319-91241-7_7

Rivera-Trigueros, I. (2022). Machine translation systems and quality assessment: A systematic review. Lang Resources & Evaluation, 56, 593–619. https://doi.org/10.1007/s10579-021-09537-5

Suratno, A., & Wijaya, D. C. (2018). Text reduction: Strategies adopted in audio visual subtitle translation. Proceedings of the International Conference on Language Phenomena in Multimodal Communication (KLUA 2018). Advances in Social Science, Education and Humanities Research (ASSEHR), 228, 205–213. https://doi.org/10.2991/klua-18.2018.30

Turcato, D., Popowich, F., McFetridge, P., Nicholson, D., & Toole, J. (2000). Pre-processing closed captions for machine translation. Proceedings of the 2000 NAACL-ANLP workshop on embedded machine translation systems, 5, 38–45. https://doi.org/10.3115/1117586.1117592

Valdeón, R. A. (2008). Alienation techniques in screen translation: The role of culture specifics in the reconstruction of target-culture discourse. Language in Contrast, 8(2), 208–234. https://doi.org/10.1075/lic.8.2.05val

Yip, P., & Rimmington, D. (2016). Chinese: A comprehensive grammar (2nd ed.). Routledge. https://doi.org/10.4324/9781315732930

Zabalbeascoa, P. (2008). The nature of the audiovisual text and its parameters. In J. Díaz Cintas (Ed.), The didactics of audiovisual translation (pp. 21–37). John Benjamins. https://doi.org/10.1075/btl.77.05zab

Zhu, C. (1999). Integration of form and content for communication through translation: With reference to pronouns in Chinese discourse. Multilingua: Journal of Cross-Cultural and Interlanguage Communication, 18(1), 69–88. https://doi.org/10.1515/mult.1999.18.1.69

Zhu, C. (2022). Fathoming translation as discursive experience: Theorization and application. Routledge. https://doi.org/10.4324/9780429443497

 



[i]      De Beaugrande and Dressler (1981, p. 3) viewed texts as “communicative occurrences” that meet the seven standards of textuality: intentionality, acceptability, situationality, informativity, coherence, cohesion, and intertextuality. Neubert and Shreve (1992) introduced this concept to the field of translation studies.