The LTA project: Bridging the gap between training and the profession in real-time intralingual subtitling

Carlo Eugeni

SSML – Pisa, Italy

Rocío Bernabé Caro

SDI München – University of Applied Languages, Germany


Real-time intralingual subtitles enable access to live audiovisual products. However, the provision and the quality of such services across Europe is uneven and sometimes insufficient because live subtitlers are untrained or only partially trained and without recognized professional status. To bridge this gap, the EU-funded project Live Text Access (LTA) aims to create ad-hoc training materials and propose the recognition of certified professionals. This article first concentrates on the multifaceted and heterogeneous terminology adopted in the field. Then it provides an overview of the current situation in which live subtitlers are trained in Europe, focusing on the LTA rationale for creating open-source training materials based on certification, subtitling standards and a user-oriented approach. Finally, it reports on the progress the project has made in defining both the professional profile and the skills and competences of the intralingual real-time subtitler.

Keywords: real-time intralingual subtitling, live captioning, respeaking, velotyping, media accessibility

1. Introduction

In this article, we deal with text, accessible to an audience on a screen at a live event, as the more or less faithful transcription of what is said by speakers and produced in real time in the same language as the speech, though with an inevitable slight delay. We are aware of shorter and more straightforward expressions (e.g., live subtitling, respeaking, CART,[2] speech-to-text interpreting) but, unfortunately, there is no terminological univocity in this field around the globe. Moreover, this lack of univocal terms has led to restricted views of the profession and consequently of the training, which concentrates on some aspects only. Finally, every expression seems to limit the profession to a technique, a context, a target audience or some other scope, whereas we think that all activities aimed at producing a real-time transcript of a speech share many similarities, at least in the process, and should therefore fall into the same category. That is why we have chosen here to use the expression “real-time intralingual subtitling” as an umbrella term to encompass all forms of written text activity involved in reproducing spoken discourse, either word for word or meaning for meaning. Although we know that real-time intralingual subtitling also provides a partial view of the profession and that a sectorial or academic expression would be ideal (e.g., real-time speech capturing[3], live diamesic translation[4], trans-pretation[5]), to us this expression has two advantages at present. First, it is already widely known and self-evident. Second, it limits the scope of the profession less than other expressions, especially if subtitling is considered in its broadest sense of text translating speech (forgetting about its traditional position at the bottom of the TV screen and as opposed to both translation, or text translating text, and interpreting, or speech translating speech) and as a synonym for “transcript” (without the limitation of word-for-word accuracy).

Concerning the profession, every day the number of real-time subtitlers increases, especially among freelancers who wish to expand their portfolio of services, but also among professionals employed by broadcasters and companies. However, because training is still lagging, many of them have received only partially-specialized training or no training at all (LTA, 2019). Moreover, the provision and quality of such services across Europe is uneven and sometimes insufficient (Utray, de Castro, Moreno, & Ruíz-Mezcua, 2012; EFHOH, 2015), and the professional status of real-time intralingual subtitlers remains unrecognized (EU Regulated Professions Database, 2019). To bridge these gaps, LTA is gathering together higher-education institutions, service providers, broadcasters, end-users and certifiers in order to create ad-hoc training materials for formal and informal education and to propose the recognition of certified professionals.

This article first presents an overview of the current state of the sometimes confusing, redundant and even misleading terminology used in the field of real-time intralingual subtitling. This will enable an understanding of the different perspectives adopted on this profession worldwide. Then, training endeavours in Europe are reviewed to show how the various perspectives are covered. Finally, the first results of the European project LTA are presented: an online survey about the necessary skills and competences of the professional we call a real-time intralingual subtitler (IO1)[6]. These skills will then be converted into a curriculum (IO2). After that, training materials will be created (IO3) and tested (IO4), before they are certified (IO5) and made available online.

2. Terminology

In Translation Studies, the importance of names is stressed, especially when it comes to translating them, because a close natural equivalent is often hard to find (Nida, 1964). This means that names are much more than just a reference or a meaning. Moreover, terminological confusion may arise in specialized discourse, especially if a field is not consolidated and the related speech community is scattered. When this occurs, the same term may refer to different concepts or different terms may refer to the same concept. In the field of audiovisual translation (AVT), terminological confusion has always been evident. As Gambier (2006) shows, the names referring to it – “language transfer”, “multimedia translation”, “screen translation”, “multimodal translation”, “film translation”, “versioning”, “transadaptation”, etc. – reflect different perspectives and do not have the same meaning because they do not refer to the same thing.

Similarly, in the case of real-time intralingual subtitling, various terms seem to refer to the same profession. By expanding on what Romero-Fresco (2011) enumerates, several expressions denoting real-time intralingual subtitling have been identified: some are synonyms such as “live captioning”, a term used especially in the United States of America, or “live closed-captioning”, the expression used by the International Telecommunication Union; some others have a hierarchical relationship and refer to a more specific or general area of the profession (the process, the product or the profession), such as “respeaking” or “verbatim court reporting”. Terminology also varies from country to country, and even within the same country. If we exclude “live closed-captioning” – further subdivided into “direct speech recognition”, “direct typing method” and “re-speak speech recognition” – these variations depend on several variables which either partially or completely influence the perspective on the subject and consequently on the training. In this respect, five main variables have been identified: context, target text, production system, technique and other. These five variables are explained below.

2.1 Context

Depending on the context in which real-time intralingual subtitles are produced, the terminology used varies, which has direct consequences on the training itself. The most notable examples are the contexts of TV, conferences, courts and parliaments.

TV live subtitling, or TV live captioning, is a common example of context influencing terminology in the field of real-time intralingual subtitling. The initialism “TV” used in the expression reflects the reason why most of the training in real-time intralingual subtitling is done either by TV service providers or universities where TV subtitling is taught, namely Translation or Film-making faculties.

Conferences are another example of context influencing terminology in this field, which explains why Interpreting faculties often train real-time intralingual subtitlers. The terminology reflects such a perspective, using expressions such as “conference live access” or “speech-to-text interpreting”, where “interpreting” reflects the setting in which real-time intralingual subtitles – again, in the broadest sense of text translating speech and not of text appearing at the bottom of a screen – are mostly used: conferences.

Similarly, “live court/parliamentary reporting”, which implies the transcription of a source text produced in a parliament, a courtroom or a similar setting, mirrors the training that is done either in-house by service providers or in trade schools that specialize in this type of real-time intralingual subtitling.

2.2 Target text

The target text also influences the terminology and training. For instance, parliamentary or court reporting is traditionally not accessed simultaneously with an event. However, it is produced simultaneously with listening to the source text and the terminology reflects this change in expressions such as “live reporting”, “real-time reporting” or STTR (speech-to-text reporting, which, interestingly, is used as an alternative to STTI, or speech-to-text-interpreting). This phenomenon is mirrored in training, with reporting and subtitling usually being taught separately because they are considered to be two different professions. In particular, reporting is still a discipline with little or no theoretical framework and therefore it is hardly taught at college or university level. However, freelance live subtitlers and live subtitling agencies tend to accept assignments as live reporters or live producers of minutes (LTA, 2019), because the process is the same.

2.3 Production system

As De Seriis (2006) indicates, there are three systems with which to produce TV programmes: pre-recorded, live and semi-live. The first encompasses traditional programmes (films, series, cartoons and documentaries); the second focuses on breaking news, sports events, political debates; the third refers to programmes whose text is pre-prepared and read or recited live (e.g., the Academy Awards). As for subtitles, they may be either automatically or manually cued or produced in real time. For this reason, it is common to read about “live subtitles” instead of “real-time subtitles of live programmes” or “semi-live subtitles” instead of “real-time/manually-cued subtitles of semi-live programmes”. Such shortcuts match the production system of the product to be subtitled with the production system of the subtitles, generating confusion in the case of pre-recorded videos to be subtitled in real time, for example. Related to this is the production mode of the subtitles themselves, which can be produced physically in the same room where the event takes place, by telephone or in streaming mode. This explains why expressions such as “face-to-face live captioning”, “relay captioning” and “online subtitling” – also used as an antonym of “offline subtitling” and meaning pre-recorded subtitling – are also commonly used. In our understanding, this factor does not have a direct link to training but it keeps multiplying the number of expressions used in the field of real-time intralingual subtitling, thus contributing to confusion and restricted views on the matter.

2.4 Technique

Another important aspect influencing the description of real-time intralingual subtitles is the technique employed. As Romero-Fresco (2011) has signalled, “one of the consequences of the very little research carried out so far in respeaking is the lack of established terminology to refer not only to the professionals engaged in this discipline but also to the discipline itself” (p. 2).

Capitalizing on Romero-Fresco’s list and updating it, a long list of labels focusing on the technique can be found which also limit teaching to that specific technique:

·     speech-related terminology: for example, “respeaking”, “speech-based live subtitling”, “speech recognition-based subtitling”, “real-time subtitling via speech recognition”, “shadow speaking”, “speech captioning”, and even “speech capturing” or “re-speak speech recognition”;

·     voice-related terminology: for example, voice recognition (used as a synecdoche), (real-time) voice writing, revoicing, which is quite well established as an umbrella term to include audiovisual practices where a new voice replaces that of the original product (e.g., dubbing) or is added to the whole product (e.g., voice-over);

·     technology-related terminology: for example, ASR (automatic speech-recognition used as a synecdoche), CART (computer-assisted-real-time-transcription/translation), stenotyping, palantyping, velotyping and “direct speech recognition” or “direct typing method”.

In the cases mentioned above, the relation to teaching is all the more evident, as many vocational training institutes usually focus on the command of the technique. This is so influential that they call themselves “stenotyping school”, “academy of voice-writing” or “Velotype academy”, to name just a few examples.

2.5 Other

The terminology prevalent in the field is influenced by other factors, too. One of these is the editing, when a form of reduction of or adaptation to linguistic guidelines is expected, as in the case of “sensatim live subtitling”, “verbatim live subtitling” or the extreme “live editing”.

Similarly, the way subtitles appear on screen has led to the emergence of expressions such as “live closed-captioning” or “CC” (for both pre-recorded and real-time subtitles) in those countries where subtitles appear on teletext.

Language combination is another very interesting factor influencing the use of terminology in the field. If in American English “subtitling” applies only to the production of interlingual subtitles, with “captioning” referring to the production of intralingual subtitles, in British English, “subtitling” is used for both, given their similar form. In Europe, it is not uncommon to read about “real-time intralingual subtitling”, as in this article, or “real-time interlingual subtitling” to mean subtitles in a language different from that of the speaker.

Finally, also worth a mention is the focus on end-users. When intralingual subtitles started to be the focus of academic studies, authors talked of SDH (subtitling for the deaf and the hard-of-hearing). In what are conventionally called “dubbing countries”, this expression also means that subtitles are intralingual. Therefore, live or real-time SDH is used to mean intralingual real-time subtitles. Another expression which is based on end-users is “special-needs subtitling”, where “special needs” also covers products such as audio description.

All of these definitions mirror ways of considering the products which are reflected in teaching. Common, indeed, is the teaching of real-time intralingual subtitles in AVT summer schools, publicly-funded courses specifically designed for operators in the field of disability, and faculties of humanities, where the focus is more on linguistic manipulation than on the technique, as happens in trade schools.

3. Current teaching practices

The previous section has shown that terminology in the field of real-time intralingual subtitling is multifaceted and can be categorized based on several criteria, each affecting training. This leads to partial views of what can easily be considered as a wider profession, with direct consequences for training. With the labour market requiring more and more flexibility, ignoring the bigger picture in training could be limiting, as has always happened.

Real-time intralingual subtitles were first produced on TV using standard QWERTY keyboards (Lambourne, 2006), but QWERTY typists were then replaced by more speed-efficient stenographers (den Boer, 2001). Owing to a lack of professionals, many broadcasters have more recently opted to train their own professionals internally in respeaking, as is still the case today (Romero-Fresco, 2018).

The formal training of real-time intralingual subtitling started only in 2005, at the then SSLMIT (Scuola Superiore di Lingue Moderne per Interpreti e Traduttori) of the University of Bologna (Eugeni, 2008). Since then, other universities have tried to organize courses on live subtitling, especially through respeaking, but they did so only for a limited period of time. Currently, only a few European universities regularly offer training in respeaking: for instance (though not limited to), the University of Antwerp, which was the first to offer regular formal training in respeaking; the University of Leeds, which has provided introductory sessions on respeaking as part of their courses on AVT; the Universitat Autònoma de Barcelona, which offers a three-month online module and a one-month face-to-face module in Spanish as part of an MA in Audiovisual Translation; the University of Roehampton, which provides a three-month face-to-face module in English, Spanish, French, Italian and German; and the Universidade de Vigo, which offers a three-month online module on Intralingual Respeaking in English, Spanish and Galician and a three-month online module on Interlingual Respeaking in the same languages (Romero-Fresco, 2018). Worth a mention are also the School of Applied Linguistics of the Zurich University of Applied Sciences (Dutka & Szarkowska, 2016), the three-week online module on respeaking as part of the online Master of Audiovisual Translation (MTAV) of the University of Parma; the course on Audiovisual Translation in general, including respeaking, at the University of Mons, and the one-week face-to-face module on respeaking during the summer school in AVT of the University of Salento, in Lecce. In Germany, the SDI München offers a nine-month course which trains students in both respeaking and QWERTY typing. The course is practice-oriented and combines formal learning with short internships with partners in the industry.

All of these courses train students mainly to fulfil some of the criteria mentioned above, thus limiting the scope of the training. For example, most of the above-mentioned higher-education institutes focus on specific contexts such as TV or conferences. Moreover, they mainly concentrate on respeaking, which limits training to a technique and to the languages for which an ASR technology is available. In addition, the training materials that they use are language- and culture-specific, and they are not open source. Finally, the training is mainly limited to students who can afford the cost and time involved in attending a training course, since they might have to move to another city or country to be trained. It is also worth mentioning that the students in these faculties are trained without much contact with the real world. Indeed, they come to know the real world of work only once they decide to opt for a traineeship in real-time intralingual subtitling, are employed by a service provider or find clients as freelancers.

To conclude, training today is either too exclusive in terms of time, money or place; or too focused on a particular technique, language, application or context; or too generic. Furthermore, training materials and the way they are structured, even when they are part of well-established courses, normally depend on the knowledge and perspective of a single trainer and not on an international reference framework that could more easily bridge the many gaps that have been identified. Among these is certification. Although university students do obtain a diploma, this is not a certification of their real-time intralingual subtitling competence. And this affects the status of the profession, which is becoming increasingly widespread despite not yet enjoying international recognition.

4. LTA online survey on skills and competences

The LTA project approaches the above-mentioned mismatch between trained and required skills in the marketplace through collaboration between educational and non-educational partners who represent the whole spectrum (trainers, employers, service providers, end-users and certifiers). The first step towards identifying the necessary skills was to launch an online survey. The results would provide the basis for the design of a competence-based modular curriculum as a second step.

The following sections describe the identified competence areas and skills based on the data from the online survey. A total of 121 stakeholders from Europe, Asia-Pacific and the United States of America provided input: 57 professional subtitlers, 29 end-users, 13 trainers, 13 prospective trainees and 20 persons with the profile “Other”. The last category included, for instance, real-time subtitlers at museums, galleries or literary festivals, retired professionals, experts in the provision of technical assistance, and remote services. Because of space constraints, the data focus on the results concerning the six identified competence areas illustrated below. The presentation groups them into two different categories: (a) four general competence areas that pertain to the training of all professionals (Understanding Accessibility, Linguistic Competence, Entrepreneurial and Service Competence, and IT Competence) and (b) two technique-specific competence areas, namely respeaking and velotyping. The full survey is available via the project webpage.[7]

The questionnaire used two types of question: scale and free-text. Scale questions presented a set of skills drafted by LTA partners and organized in modules as presented at the 6th International Symposium on Live Subtitling and Accessibility, which took place in Milan, Italy, in 2018. The respondents evaluated their importance by delivering individual scores per skill and competence area. The highest possible score was 3 points, the lowest 0. All the competence areas obtained an average of at least 2 points. Then the skills were assessed individually. Out of 48 skills, 47 obtained an average of at least 1.5 points. Free-text answers made it possible to collect qualitative data. The results confirmed that all the competence areas are necessary and exhaustively cover the necessary skills and knowledge for the job. The following table provides an overview of the overall scores.


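Table 1 Reported scores per competence area

Competence area | Mean value (all surveys)
Knowledge about accessibility | 2.1
Linguistic competence | 2.6
Entrepreneurship & service | 2.3
IT competence | 2.4
Respeaking | 2.3
Velotype | 2.3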
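
The scoring procedure described above (ratings between 0 and 3 points, averaged per skill or competence area) can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_score` and the sample ratings are hypothetical, and the article does not specify how the LTA partners actually aggregated the responses.

```python
from statistics import mean

# Scale bounds as stated in the article: lowest score 0, highest score 3.
MIN_SCORE, MAX_SCORE = 0, 3


def mean_score(ratings):
    """Return the mean of a list of 0-3 ratings, rounded to one decimal place.

    Hypothetical helper for illustration; not the project's own tooling.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not MIN_SCORE <= rating <= MAX_SCORE:
            raise ValueError(f"rating {rating} is outside the 0-3 scale")
    return round(mean(ratings), 1)


# Invented ratings from five respondents for one competence area
# (these numbers are NOT taken from the LTA survey data).
example_ratings = [3, 2, 3, 2, 3]
print(mean_score(example_ratings))  # 2.6
```

Rounding to one decimal place mirrors the precision at which the mean values are reported in this article.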







4.1 Knowledge about accessibility

This competence area goes beyond factual information and academic knowledge, and focuses on the development of attitudes and knowledge regarding accessibility and on raising awareness of the professional role as one of mediation. This competence area received a mean score of 2.1 points. Scale responses disclosed the need for acquiring knowledge in (a) basic concepts about accessibility, multimodality and universal design; (b) target groups and their needs, and (c) how to embed accessibility in the working environment.

Free-text responses disclosed two main issues. First, the need to define what the job and role mean and how the role of the real-time intralingual subtitler differs from other profiles, such as that of accessibility advisors at universities who are in charge of supporting and helping disabled students to overcome barriers and challenges in academic environments. Second, the need to define performance indicators to describe what acceptable quality is for each working context.

4.2 Linguistic competence

The respondents scored this competence area with an average of 2.6 points and, therefore, considered it to be the most important. The area encompasses three main categories: functionality, speech-related challenges and specific knowledge. Functionality refers to the ability to deliver the necessary quality in each setting and to improve the readability and legibility of the transcribed texts. The second focuses on the ability to apply exit strategies to cope with speech-related challenges in order to support understandability by, for instance, syntax simplification, reformulation or even by producing easy-to-understand subtitles. The third refers to strategies for acquiring and using the necessary specific terminology.

The skills with the highest score (2.8 points) were: to deliver the required readability (e.g., indicating the name of the speaker or a switch of speakers, specifying when someone speaks unclearly or too fast) and to deliver the accuracy in each setting (e.g., correct grammar, spelling of basic and difficult words and of names; correct use of job-specific terminology; correct description of sounds).

Free-text answers raised two issues concerning verbatim (word-for-word) and sensatim (meaning-for-meaning) subtitles. First, there is the question of whether they are two different modalities which require their own training and are performed by two different professionals. Second, several respondents stressed that delivering verbatim subtitles does not automatically amount to delivering quality subtitles. Furthermore, some respondents identified the need for establishing performance indicators beyond speed rates, spelling accuracy or omissions. Overall, quality was defined in terms of minimum delay and the ability to transfer meaning by using higher-level strategies compared to word-for-word subtitling. Such strategies are similar to those applied in interpreting and include, for instance, generalizations, omissions or compression.

4.3 Entrepreneurship and service competence

This competence area concentrates on the entrepreneurial role of the professional as a service provider. The focus is on management skills to launch, run and develop a business, and on raising awareness about the role of personal and interpersonal skills.

In the survey, this competence area obtained an average of 2.3 points. In the scale responses, the interpersonal skill “Manage customers’ accessibility needs” obtained the highest score (2.8), followed by two skills with 2.7 points each: “Respond to a customer’s inquiry or problem in a timely and effective manner” and “Follow up on customer requests to ensure that accessibility service needs are met”.

Free-text responses revealed other aspects concerning personal and social skills. For instance, some respondents expressed the need to abide by a code of conduct, being appropriately dressed, avoiding patronizing behaviour, remaining objective and being able to cope constructively with criticism. Some respondents also outlined the importance of a confident demeanour. Data showed that acquiring competences in this area is especially relevant to freelancers. The fact that 75 per cent of the respondents (39 out of 57 professionals) currently work as freelancers indicates the need for training in these skills.

4.4 IT competence

The IT competence area comprises three main categories: setting up the working environment, input tools and output tools. In particular, the focus is on setting up hardware and software tools in the different working settings; identifying potential risks in real-time situations and reacting in a solution-oriented way. Moreover, these competences raise awareness about the need for carefully planning technical aspects, informing customers about technical requirements, scheduling enough time for setup and having ready-to-implement solutions.

In the online survey, this area obtained an average of 2.4 points. The skills with the highest scores, 2.7 and 2.6, belonged to the first category, that is, setup of the working environment. As for the free-text responses, some participants considered these skills as more relevant to freelancers since permanent employees have access to the software and can reach out to the company’s IT department for assistance. One participant stated the importance of this competence area as follows: “IT competence or better IT professionalism determines the higher value of the text interpreter.”

Free-text responses also revealed some related issues. For instance, some respondents mentioned the difficulty of keeping up with new developments, the difference in training-hours among some educational systems and the limitations of automatic interpreting. Finally, as stated by one respondent: “The ASR software must be accessible itself. This is the first step towards accessibility.”

4.5 Respeaking

This competence area pertains to the technique as a facet of a multi-sided profession as envisaged in LTA and sustained by the collected data. In these terms, to be able to subtitle in real-time is not synonymous with mastering a technique or software (e.g., ASR). Quite the contrary: it means knowing where, when, how and for whom to subtitle and possessing the necessary linguistic skills.

The 15 skills and four categories identified in this competence area mirror this holistic understanding of the profession: psycho-cognitive, metalinguistic, prosodic and interface interaction. The detailed descriptions of the categories and skills provided below should help with understanding the interrelations and interconnections between the general and the technique-specific competence areas outlined for the profession.

Psycho-cognitive skills include the ability to listen and speak simultaneously; to reformulate, edit, and correct the respoken text while listening; to remember full sentences while lagging; to activate linguistic exit strategies while respeaking, and to deal with slides, videos and other material used by speakers to produce coherent text.

Metalinguistic skills relate to turning non-verbal elements into verbal elements in general and for each LTA-trained working context; and to apply different techniques such as changing colours or font-size, or inserting labels. This set of skills also includes the ability to dictate punctuation while respeaking.

Prosodic skills such as speaking fluently, quickly and unambiguously are part and parcel of the respeaking competence. In particular, the focus is on techniques with which to command voice projection, articulation, pacing, breathing and modulation while dictating, and to improve endurance.

Finally, interface interaction comprises procedural skills divided into three categories:

·     pre-editing: how to train the software;

·     peri-editing (while respeaking): how to select terminology that the software can best process;

·     post-editing: how to spot a mistake, decide how relevant it is and correct it, if necessary.

The “reaching MARS” (most accurate and rapid speech-to-text rate) approach has been introduced to match the different skills. Trainees will be asked to push their articulatory skills to the most rapid speech rate possible while keeping a high level of accuracy, that is, a rate at which the machine continues to recognize the respeaker correctly.
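
One way to picture the trade-off behind MARS is to pick, from a trainee's practice sessions, the fastest rate at which accuracy stays above a chosen threshold. The sketch below is purely illustrative: the article gives no formula for MARS, so the function name `reach_mars`, the 0.98 accuracy threshold and the practice data are all assumptions made for the example.

```python
def reach_mars(trials, min_accuracy=0.98):
    """Return the fastest (rate, accuracy) trial that meets the accuracy threshold.

    `trials` is a list of (words_per_minute, accuracy) pairs, with accuracy
    expressed as a fraction between 0 and 1. Returns None when no trial is
    accurate enough. The threshold is a hypothetical value, not one set by LTA.
    """
    acceptable = [trial for trial in trials if trial[1] >= min_accuracy]
    # Among the sufficiently accurate trials, keep the one with the highest rate.
    return max(acceptable, key=lambda trial: trial[0], default=None)


# Invented practice log: (words per minute, share of words recognized correctly).
practice_log = [(120, 0.995), (150, 0.99), (180, 0.985), (210, 0.95)]
print(reach_mars(practice_log))  # (180, 0.985)
```

In this invented log, 210 words per minute is faster but falls below the threshold, so the selected rate is 180 words per minute.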

As for the online survey, this area obtained an average of 2.3 points. The respondents rated four skills out of 15 as “Important”. Three of them obtained 2.4 points: one refers to the psycho-cognitive ability to deal with slides, videos and other material; one to the metalinguistic ability to dictate punctuation while keeping pace with the speaker; and one to the prosodic ability to respeak verbatim without making mistakes at an average speech rate. The skill with 2.3 points concerns the metalinguistic ability to implement non-verbal elements of speech.

The skill identified as being of the highest importance (2.9 points) was communicating with good pronunciation. Free-text responses also sustained the demand for good pronunciation. One participant explained that “[…] making ‘silly’ [to] sound like words [which cannot be understood because lack of coherence]” is extremely distracting and affects the viewing experience. The ability to reformulate, edit and correct the text and the ability to activate exit strategies gained 2.8 points. In the free-text responses several respondents specified that the type and scope of the reformulation and editing depend on the constraints of the context (e.g., TV subtitles, live), the speed of speech, the ability to apply exit strategies and the overall goal of avoiding content loss.

Concerning speed, some participants expressed it in terms of words per minute (106 to 400) or as a percentage (the speed at which 99 per cent spelling accuracy can be delivered), whereas others considered that speed could not be defined on its own but only in relation to parameters such as acceptable delay, context, and quality and type of output (verbatim or sensatim).

4.6 Velotype

Similarly to the area above, this competence area obtained 2.3 points and comprised 11 skills, which covered three categories: psycho-cognitive, typing skills and factors which enable high performance. Psycho-cognitive skills focus on the ability to listen and type simultaneously. On the one hand, they are analogous to those described in the competence area Respeaking. On the other, they depend on the idiosyncrasies of the technique (characters per minute) and of the channel (haptic).

As for typing, the peculiarity of this competence area is the efficient use of the Velotype keyboard in terms of accurate content, spelling and speed. Self-motivation, discipline and concentration skills make up the third category as personal skills and attitudes which enable high performance. Especially in the case of velotyping, where the training is normally longer and progress is less immediately evident, trainees need to acquire the ability to implement strategies or techniques to improve themselves in their daily work and practice in order to reach MARS through Velotype at all times and under all circumstances.

Responses emerging from the online survey provide insight into the stakeholders’ views. As for the scale responses, the respondents identified three skills as “Important” and eight as being of “High importance”. Two of the three skills perceived as “Important” (1.5 to 2.4) belonged to the category Factors of high performance and concerned the ability to discipline oneself to practise with the keyboard and to train one’s own concentration skills. The third skill concerned mastering the keyboard, that is, typing at a minimum speed while delivering typing accuracy and transferring meaning. As for the skills perceived as being of “High importance” (2.5 to 3.0), two obtained 2.8 points. The first belongs to the category psycho-cognitive skills and concerns remembering full sentences while lagging. The second is related to the category typing skills and refers to the ability to identify one’s own typing mistakes and to correct them, where necessary.
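The two score bands quoted above can be expressed as a simple lookup. The following is a minimal sketch that assumes mean scores are reported to one decimal place, as they are in this article. The function name is hypothetical, and because the label for scores below 1.5 is not given in this section, the sketch returns a neutral placeholder for them.

```python
def importance_band(mean_score: float) -> str:
    """Map a mean survey score to the band labels used in the text.

    Bands taken from the article: "Important" (1.5 to 2.4) and
    "High importance" (2.5 to 3.0). Scores are rounded to one decimal
    first, which is an assumption based on how the means are reported.
    """
    score = round(mean_score, 1)
    if score > 3.0:
        raise ValueError("score exceeds the upper bound of 3.0 quoted in the text")
    if score >= 2.5:
        return "High importance"
    if score >= 1.5:
        return "Important"
    # The label for this range is not stated in this section (placeholder).
    return "below 'Important'"


for value in (2.8, 2.3):
    print(value, importance_band(value))
```

Applied to the figures reported here, a skill with 2.8 points falls into “High importance”, while the area average of 2.3 points falls into “Important”.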

The above topics – namely, speed, accuracy, delay, together with interpreting strategies – also arose in the free-text responses. Several respondents suggested speed rates ranging from 420 to 500 characters per minute (cpm) and a 99–100 per cent spelling accuracy. According to some responses, both typing speed and spelling accuracy are subject to the overall objectives of reducing delay and applying strategies similar to those applied in interpreting which go beyond word-for-word subtitling.

The free-text responses also disclosed two new aspects: the need for establishing indicators to describe and measure the accuracy of the outputs and for embedding them in a suitable model; and the need to raise awareness about the suitability of a technique in contexts in which the noise of the keyboard may be disturbing, as pointed out by one respondent.

Finally, the question of whether verbatim and sensatim outputs are two different modalities which require their own specific training and are to be performed by different professionals should be investigated further, especially in connection with each technique.

5. Conclusions

This article raises awareness of how partial views of a profession and of its professional profile may affect the development of that profession through training. Aware of this challenge, LTA has approached the identification of the necessary skills and competences by collecting input from stakeholders and from as many countries as possible. The results show that the respondents agree on the competence areas that build the professional profile of real-time intralingual subtitlers in general and respeakers and Velotypists specifically.

Furthermore, this broad approach to identifying competences paves the way for the development of a flexible course structure which can accommodate modular content and can be taught in formal settings such as universities, trade schools and training institutions, on the one hand, and in informal settings such as companies, broadcasters and service providers, on the other. The planned modules will lend themselves to use as stand-alone materials and can therefore be taught either separately or as a whole. The differentiation between common and specific (to respeaking or velotyping) competence areas also enables the development of teaching materials specific to a single competence area, context (cultural events, parliamentary assemblies, broadcasts, workplace and education) and work setting (face-to-face, online and relay). In doing so, teaching will be subordinated not to a particular definition of the profession but to the acquisition of specific competences that can be applied to as many contexts, working settings and, consequently, products as possible, in line with market requirements.

References


De Seriis, L. (2006). Il servizio sottotitoli RAI: Televideo per i non udenti. In C. Eugeni & G. Mack (Eds.), inTRAlinea special issue: Respeaking. Retrieved from /specials/article/1687

den Boer, C. (2001). Live interlingual subtitling. In Y. Gambier & H. Gottlieb (Eds.), (Multi)media translation: Concepts, practices, and research (pp. 167–172). Amsterdam: John Benjamins. doi:10.1075/btl.34.20boe

Dutka, Ł., & Szarkowska, A. (2016, March). Respeaking as a part of translation and interpreting curriculum? Paper presented at the CTER conference, Cracow. Retrieved from http://avt.ils.uw.files/2016/03/Respeaking-as-a-part-of-translation-and-interpreting-curriculum.pdf

EFHOH. (2015). State of subtitling access in EU: 2015 Report. Retrieved from

Eugeni, C. (2008). La sottotitolazione in diretta TV: Analisi strategica del rispeakeraggio verbatim di BBC News (Unpublished doctoral dissertation). Retrieved from 3271/1/Carlo_Eugeni.pdf

EU Regulated Professions Database. (2019). The EU single market: Regulated professions database. Retrieved from

Gambier, Y. (2006). Multimodality and audiovisual translation. In M. Carroll, H. Gerzymisch-Arbogast, & S. Nauert (Eds.), EU-high-level scientific conference series: MuTra 2006 – Audiovisual translation scenarios: Conference proceedings (pp. 91–98). Copenhagen: MuTra.

Gottlieb, H. (2003). Parameters of translation. Perspectives: Studies in Translatology, 11(3), 167–187.

Lambourne, A. (2006). Subtitle respeaking: A new skill for a new age. In C. Eugeni & G. Mack (Eds.), inTRAlinea special issue: Respeaking. Retrieved from /specials/article/Subtitle_respeaking

LTA. (2019). LiveTextAccess: Online survey on skills and competences. Retrieved from http://ltaproject.eu/

Nida, E. (1964). Principles of correspondence. In L. Venuti (Ed.), The translation studies reader (pp. 126–140). London: Routledge.

Romero-Fresco, P. (2011). Subtitling through speech recognition: Respeaking. Manchester: St Jerome.

Romero-Fresco, P. (2018). Respeaking: Subtitling through speech recognition. In L. Pérez-González (Ed.), The Routledge handbook of audiovisual translation (pp. 96–113). London: Routledge.

Safar, H. (2019, April 4–5). Hyper-numérisation, tradprétation et études interlinguistiques [Keynote speech]. 11th Professional Communication and Translation Studies International Conference, Timisoara, Romania.

Utray, F., de Castro, M., Moreno, L., & Ruíz-Mezcua, B. (2012). Monitoring accessibility services in digital television. International Journal of Digital Multimedia Broadcasting, 2012. doi:10.1155/2012/294219



[1]      LTA (Reference Number: 2018-1-DE01-KA203-004218) is a project co-funded by the ERASMUS+ programme of the EU. It addresses inclusion and innovation in higher education with a focus on the production of online and open materials for the training of the real-time intralingual subtitler. This article is part of the project-dissemination activities required by the Erasmus+ programme.

[2]      Computer-assisted-real-time-transcription/translation.

[3]    Cf. (last accessed 17 December 2019)

[4]    Cf. Gottlieb, 2003

[5]      Cf. Safar, 2019

[6]      IO: Abbreviated form of Intellectual Output. This term is used in EU projects to refer to work packages.

[7]      The full survey is available via the project webpage: (last accessed 30 September 2019).