Acceptability of machine-translated content: A multi-language evaluation by translators and end-users

Sheila Castilho, Sharon O'Brien


As machine translation (MT) is used increasingly in the translation industry, there is a corresponding need to understand MT quality and, in particular, its impact on end-users. To date, little work has investigated how acceptable end-users find MT output. This article reports on research conducted to address that gap. End-users of instructional content machine-translated from English into German, Simplified Chinese and Japanese were engaged in a usability experiment. Part of this experiment involved giving feedback on the acceptability of raw machine-translated content and of lightly post-edited (PE) versions of the same content. In addition, a quality review was carried out in collaboration with an industry partner and experienced translation quality reviewers. The translation quality assessment (TQA) results from the translators mirror the usability and satisfaction results from the end-users insofar as light post-editing both increased the usability and acceptability of the instructions and raised reported satisfaction. Nonetheless, the raw MT content also received good scores, especially for terminology, country standards and spelling.


acceptability; usability; machine translation human evaluation; translation quality assessment; end-user evaluation; machine translation; post-editing




Beaugrande, R., & Dressler, W. (1981). Introduction to text linguistics. New York: Longman.

Bernard, H. R. (2011). Research methods in anthropology: Qualitative and quantitative approaches. Plymouth, UK: AltaMira Press.

Byrne, J. (2006). Technical translation: Usability strategies for translating technical documentation. Dordrecht: Springer.

Byrne, J. (2014). Scientific and technical translation explained: A nuts and bolts guide for beginners. Abingdon: Routledge.

Carl, M., Gutermuth, S., & Hansen-Schirra, S. (2015). Post-editing machine translation: A usability test for professional translation settings. In A. Ferreira & J. W. Schwieter (Eds.), Psycholinguistic and cognitive inquiries into translation and interpreting (pp. 145–174). Amsterdam: John Benjamins.

Castilho, S., O’Brien, S., Alves, F., & O’Brien, M. (2014). Does post-editing increase usability?: A study with Brazilian Portuguese as target language. Proceedings of the Seventeenth Annual Conference of the European Association for Machine Translation, 16–18 June 2014, Dubrovnik, Croatia, 183–190.

Castilho, S., & O’Brien, S. (2016). Content profiling and translation scenarios. The Journal of Internationalization and Localization, 3(1), 18–37.

Chomsky, N. (1969). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Daems, J., Vandepitte, S., Hartsuiker, R., & Macken, L. (2015). The impact of machine translation error types on post-editing effort indicators. Proceedings of the 4th Workshop on Post-Editing Technology and Practice, November 3, 2015, Miami, USA, 31–45.

De Almeida, G., & O’Brien, S. (2010). Analysing post-editing performance: Correlations with years of translation experience. In V. Hansen & F. Yvon (Eds.), Proceedings of the 14th Annual Conference of the European Association for Machine Translation, 27–28 May 2010, St. Raphaël, France.

DePalma, D. A., Hegde, V., Pielmeier, H., & Stewart, R. G. (2013). The language services market: 2013. Lowell, MA: Common Sense Advisory.

Depraetere, I. (2010). What counts as useful advice in a university post-editing training context?: Report on a case study. Proceedings of the 14th Annual Conference of the European Association for Machine Translation, 27–28 May 2010, St. Raphaël, France.

Doherty, S., & O’Brien, S. (2014). Assessing the usability of raw machine translated output: A user-centered study using eye tracking. International Journal of Human-Computer Interaction, 30(1), 40–51.

Doherty, S., O’Brien, S., & Carl, M. (2010). Eye tracking as an MT evaluation technique. Machine Translation, 24(1), 1–13.

Drugan, J. (2013). Quality in professional translation: Assessment and improvement. London: Bloomsbury Academic.

Fuji, M., Hatanaka, N., Ito, E., Kamei, S., Kumai, H., Sukehiro, T., Yoshimi, T., & Isahara, H. (2001). Evaluation method for determining groups of users who find MT “useful”. Proceedings of the Machine Translation Summit VIII “Machine Translation in the Information Age”, 18–22 September 2001, Santiago de Compostela, Spain, 103–108.

Guerberof, A. A. (2014). Correlations between productivity and quality when postediting in a professional context. Machine Translation, 28(3–4), 165–186.

Hovy, E., King, M., & Popescu-Belis, A. (2002). Principles of context-based machine translation evaluation. Machine Translation, 17(1), 43–75.

Howitt, D., & Cramer, D. (2011). Introduction to statistics in psychology. Harlow: Prentice Hall.

ISO (1998). ISO 9241-11:1998. Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability. Geneva: International Organization for Standardization.

ISO (2002). ISO/TR 16982:2002 Ergonomics of human-system interaction – Usability methods supporting human-centred design. Geneva: International Organization for Standardization.

Jones, D., Gibson, E., Shen, W., Granoien, N., Herzog, M., Reynolds, D., & Weinstein, C. (2005). Measuring human readability of machine generated text: Three case studies in speech recognition and machine translation. Proceedings of ICASSP ’05 IEEE International Conference on Acoustics, Speech, and Signal Processing 2005 – Volume 5, 18–23 March 2005, Philadelphia, USA, 1009–1012.

Koby, G. S., Fields, P., Hague, D., Lommel, A., & Melby, A. (2014). Defining translation quality. Revista Tradumàtica: tecnologies de la traducció [Online], 12, 413–420. Available from: [Accessed 02 June 2016].

Koponen, M. (2012). Comparing human perceptions of post-editing effort with post-editing operations. Proceedings of the Seventh Workshop on Statistical Machine Translation, June 7–8, 2012, Montréal, Canada, 181–190.

Lacruz, I., & Shreve, G. M. (2014). Pauses and cognitive effort in post-editing. In S. O’Brien, L. W. Balling, M. Carl, M. Simard, & L Specia (Eds.), Post-editing of machine translation: Processes and applications (pp. 246–272). Newcastle upon Tyne: Cambridge Scholars.

Lommel, A., Uszkoreit, H., & Burchardt, A. (2014). Multidimensional quality metrics (MQM): A framework for declaring and describing translation quality metrics. Revista Tradumàtica: tecnologies de la traducció [Online], 12, 455–463. Available from: [Accessed 24 May 2016].

Moorkens, J., O’Brien, S., da Silva, I. A. L., de Lima Fonseca, N. B., & Alves, F. (2015). Correlations of perceived post-editing effort with measurements of actual effort. Machine Translation, 29(3), 267–284.

Nielsen, J. (1993). Usability engineering. Amsterdam: Morgan Kaufmann.

O’Brien, S., Simard, M., & Specia, L. (Eds.). (2012). Workshop on post-editing technology and practice (WPTP 2012). Conference of the Association for Machine Translation in the Americas (AMTA 2012). San Diego, CA, 28 October.

O’Brien, S., Simard, M., & Specia, L. (Eds.). (2013). Workshop on post-editing technology and practice (WPTP 2013). Machine Translation Summit XIV. Nice, 2–6 September.

O’Brien, S., Balling, L. W., Carl, M., Simard, M., & Specia, L. (Eds.). (2014). Post-editing of machine translation: Processes and applications. Newcastle upon Tyne: Cambridge Scholars.

Plitt, M., & Masselot, F. (2010). A productivity test of statistical machine translation postediting in a typical localisation context. The Prague Bulletin of Mathematical Linguistics, 93, 7–16.

Puurtinen, T. (1995). Linguistic acceptability in translated children’s literature (Unpublished doctoral dissertation). University of Joensuu, Joensuu.

Roturier, J. (2006). An investigation into the impact of controlled English rules on the comprehensibility, usefulness and acceptability of machine-translated technical documentation for French and German users (Unpublished doctoral dissertation). Dublin City University, Dublin.

Rubin, J., & Chisnell, D. (2011). Handbook of usability testing: How to plan, design, and conduct effective tests. Indianapolis, IN: Wiley.

Sousa, S. C., Aziz, W., & Specia, L. (2011). Assessing the post-editing effort for automatic and semi-automatic translations of DVD subtitles. Proceedings of the International Conference Recent Advances in Natural Language Processing, 12–14 September 2011, Hissar, Bulgaria, 97–103.

Specia, L. (2011). Exploiting objective annotations for measuring translation post-editing effort. Proceedings of the Fifteenth Annual Conference of the European Association for Machine Translation, 30–31 May 2011, Leuven, Belgium, 73–80.

Stymne, S., Danielsson, H., Bremin, S., Hu, H., Karlsson, J., Lillkull, A. P., & Wester, M. (2012). Eye tracking as a tool for machine translation error analysis. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (pp. 1121–1126), 23–25 May 2012, Istanbul, Turkey.

Suojanen, T., Koskinen, K., & Tuominen, T. (2015). User-centered translation. Abingdon: Routledge.

Tomita, M., Shirai, M., Tsutsumi, J., Matsumura, M., & Yoshikawa, Y. (1993). Evaluation of MT systems by TOEFL. Proceedings of the Fifth International Conference on Theoretical and Methodological Issues in Machine Translation, July 14–16, 1993, Kyoto, Japan, 252–265.

Van Slype, G. (1979). Critical study of methods for evaluating the quality of machine translation. Brussels: Bureau Marcel van Dijk.