Automated error analysis for multiword expressions: Using BLEU-type scores for automatic discovery of potential translation errors

Bogdan Babych; Anthony Hartley

doi:10.52034/lanstts.v8i.246

Authors

Bogdan Babych, University of Leeds
Anthony Hartley, University of Leeds

DOI:

https://doi.org/10.52034/lanstts.v8i.246

Keywords:

automated error-analysis, multiword expressions, BLEU, automated metrics, concordance, concordance-based evaluation of Machine Translation, MT-tractability

Abstract

We describe the results of a research project aimed at automatic detection of MT errors using state-of-the-art MT evaluation metrics, such as BLEU. Currently, these automated metrics give only a general indication of translation quality at the corpus level and cannot be used directly for identifying gaps in the coverage of MT systems. Our methodology automatically detects frequent multiword expressions (MWEs) in sentence-aligned parallel corpora and computes an automated evaluation score for the concordances generated for each MWE. This score indicates whether a particular expression is systematically mistranslated in the corpus. The method can be applied both to source and target MWEs to indicate, respectively, whether MT can successfully deal with source expressions, or whether certain frequent target expressions can be successfully generated. The results can be useful for systematically checking the coverage of MT systems in order to speed up the development cycle of rule-based MT. This approach can also enhance current techniques for finding translation equivalents by distributional similarity and for automatically identifying features of MT-tractable language.
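As an illustration only, the idea of scoring a concordance per MWE can be sketched in Python. This is not the authors' implementation: the function names, the use of fixed-length word n-grams as MWE candidates, the frequency threshold, and the simplified BLEU-type score (an arithmetic mean of clipped n-gram precisions, with no brevity penalty) are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_mwes(sources, n=2, min_count=2):
    """Assumption: treat source n-grams found in >= min_count segments as MWE candidates."""
    counts = Counter()
    for sent in sources:
        counts.update(set(ngrams(sent.split(), n)))
    return [gram for gram, count in counts.items() if count >= min_count]


def clipped_precision(hyps, refs, max_n=2):
    """Simplified BLEU-type score: mean clipped n-gram precision for n = 1..max_n.

    Counts are pooled over all segment pairs; no brevity penalty is applied.
    """
    precisions = []
    for n in range(1, max_n + 1):
        matched = total = 0
        for hyp, ref in zip(hyps, refs):
            hyp_counts = Counter(ngrams(hyp.split(), n))
            ref_counts = Counter(ngrams(ref.split(), n))
            matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total += sum(hyp_counts.values())
        precisions.append(matched / total if total else 0.0)
    return sum(precisions) / len(precisions)


def concordance_scores(sources, mt_outputs, references, n=2, min_count=2):
    """Score the concordance (all aligned segments containing the MWE) of each candidate."""
    scores = {}
    for mwe in frequent_mwes(sources, n, min_count):
        rows = [i for i, s in enumerate(sources) if mwe in ngrams(s.split(), n)]
        scores[" ".join(mwe)] = clipped_precision(
            [mt_outputs[i] for i in rows],
            [references[i] for i in rows],
        )
    return scores
```

Under these assumptions, a candidate whose concordance scores well below the corpus-level score would be flagged as potentially mistranslated. Swapping the roles of the source and reference sides would give the target-side variant the abstract mentions.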

Published

25-10-2021

How to Cite

Babych, B., & Hartley, A. (2021). Automated error analysis for multiword expressions: Using BLEU-type scores for automatic discovery of potential translation errors. Linguistica Antverpiensia, New Series – Themes in Translation Studies, 8. https://doi.org/10.52034/lanstts.v8i.246

Issue

Vol. 8 (2009): Technology evaluation

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under the CC BY-NC 4.0 Deed that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal. The material cannot be used for commercial purposes.

Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
