Tilburg University

Converting the words of God: An experimental evaluation of stylistic choices in the new Dutch Bible translation

In an experiment, stylistic choices in the new Dutch Bible translation were evaluated. A total of 185 participants evaluated four fragments that differed in type of variation (lexical or syntactic), source of the fragment (Bible or classical text), and the way the variations were applied (single or mixed). The religiousness of the evaluator was also taken into account. The study examined whether lexical modernization and syntactic simplification are evaluated in the same way, whether single variations give better insight into the evaluations, and whether the evaluation is influenced by the source of the text or the religiousness of the evaluator. The results showed that syntactic and lexical variation are evaluated differently, that single variation gives better insight into the evaluations, and that the source of the text and religiousness do not influence the evaluations. This study demonstrates that translators must involve readers at an early stage in their discussions on a required or appropriate register.


Introduction
In 1993 work began on a new Dutch Bible translation (henceforth NDB). Over twenty denominations and churches from The Netherlands and Flanders, the Dutch-speaking part of Belgium, are involved in this translation project, which is expected to be completed in 2004. The translation project is organized in such a way as to allow ample opportunity for discussion on translations, not only the translation of, for example, God's name, but also options in the area of register, style, sentence structure, and vocabulary. A team of some fifteen scholars of the Hebrew and Greek as well as the Dutch languages is working in pairs on translating different books of the Bible, basing themselves on extensive translation guidelines with the motto "faithful to the source text and oriented towards the target text". The project, which costs about 14 million euros, has been set up in such a way that every translation goes through some six phases, with commentary from reviewers, coordinators, supervisors, literature experts, and panels of readers (see Werk in Uitvoering 1998, Renkema 1997). The express goal of this project is to produce a Bible translation that will be widely accepted both as a religious and as a cultural document. People are already talking about the "Statenvertaling" of the twenty-first century; the Dutch "Statenvertaling" (1637) fulfills much the same role in Dutch society as the King James Version in English-speaking countries.
Within this translation project, in which some 150 people are directly or indirectly involved, there is regular discussion on stylistic choices. An example is given in (1). In reaction to this draft translation, a discussion arose as to whether the expression baarde hem een zoon 'bore him a son' was still acceptable in contemporary Dutch.
(1) Genesis 21:1-2
De HEER dacht aan Sara zoals hij had beloofd; hij gaf haar wat hij had toegezegd: Sara werd zwanger en baarde Abraham op zijn oude dag een zoon, op de vastgestelde tijd die God hem had genoemd.
'The LORD remembered Sarah as he had promised; he gave her what he had promised: Sarah became pregnant and bore Abraham a son in his old age, at the appointed time that God had mentioned to him.'

Discussions on the acceptability of a certain register are often difficult to resolve without including readers' judgments. For this reason, debatable passages have been presented to proofreaders with the request to underline three words that, according to them, should be replaced by another word (see Renkema et al. 2000). In passage (1), 37 percent of the readers wanted to replace baarde 'bore' with, for example, schonk 'bestowed' or gaf 'gave'. But there were also other expressions that people wanted to replace. For example, 37 percent of the proofreaders took issue with de vastgestelde tijd 'the appointed time' (they proposed instead: precies op het tijdstip 'exactly at the point in time') and 26 percent took issue with toegezegd (a more formal way to say 'promise' or 'agree to'; the proposal was the everyday way to say 'promise', namely beloofd). This example illustrates how, in discussions about stylistic choices, translators' normative preferences can profit from empirical evaluations.
In a study on the evaluation of a register variant by the target group, what is at issue is not only variation in certain aspects of the register, but also, for example, the fact that readers may have certain expectations about language use in the Bible. The experimental study we report here included four such intervening factors in its design. These factors are described in sections 1.1 to 1.4.

Type of variation
When translating, a translator has to make choices at every point on a great and diverse number of aspects of language use (see, for example, Snell-Hornby 1998, Bodine & Watson 1997). As an example we give in (2) two versions of the same passage from Genesis.
(2) Genesis 25:24-27, version A
Toen de dag van de bevalling was gekomen, bracht zij inderdaad een tweeling ter wereld. Het kind dat het eerst tevoorschijn kwam was rossig en helemaal behaard, het voelde aan als een haren mantel; ze noemden het Esau. Toen daarna zijn broer tevoorschijn kwam, hield die Esau bij de hiel beet; hij werd Jakob genoemd.
'On the day of the delivery, she did indeed bring twins into the world. The child that came out first was ruddy and hairy all over, it felt like a hairy coat; they called it Esau. When afterward his brother appeared, he was holding Esau by the heel; he was called Jacob.'

Genesis 25:24-27, version B
Toen de dag van de bevalling was gekomen, bracht zij inderdaad een tweeling ter wereld. Het kind dat als eerste tevoorschijn kwam, en dat ze Esau noemden, was rossig en helemaal behaard; hij voelde aan als een haren mantel. Zijn broer die daarna tevoorschijn kwam, hield hem bij de hiel beet; hij werd Jakob genoemd, 'Beetnemer'.
'On the day of the delivery, she did indeed bring twins into the world. The child that was the first to appear, and that they called Esau, was ruddy and hairy all over; he felt like a hairy coat. His brother that appeared afterward was holding him by the heel; he was called Jacob, 'Leg-puller'.'
The lexical variation between het eerst 'first' and als eerste 'the first to', and the variation between 'holding Esau' and 'holding him', are of an entirely different sort than the syntactic variation of a main clause in Version A (ze noemden het Esau 'they called it Esau') becoming a subordinate clause in Version B (en dat ze Esau noemden 'and that they called Esau'), or, conversely, a main clause in Version B (Zijn broer die daarna tevoorschijn kwam 'his brother that appeared afterwards') becoming a subordinate clause in Version A (toen daarna zijn broer tevoorschijn kwam 'when afterward his brother appeared'). And these variations are themselves of an entirely different sort than the etymological explanation of the name (Beetnemer 'leg-puller') that has been added in Version B. In the experiment, we concentrated on reader evaluations of lexical and syntactic variation.

Expectations about language use
Stylistic judgments can be influenced by expectations about language use (for more on this, see Burgoon's (1995) language expectancy theory). For example, readers will accept somewhat archaic or elevated language more readily in a translation of Dante than in a translation of a modern author. Stylistic judgments no doubt also depend on the context in which the given style occurs. In a translation of the Bible, judgments can be influenced by expectations about Biblical language use that has to be appropriate, for example, in a liturgical context. For many people, both inside and outside the church, the Bible is a book of a totally different order than, for example, the works of Homer, Dante, or Shakespeare. In order to investigate this, we presented passages from the Bible in the experimental study as originating from classical literature. The passage from Genesis in (2) above, for example, was also presented as a passage from a mythological history from the Caucasus describing the birth of two mythological characters named Pjotir and Warhald. Since the style otherwise remained the same, any differences in judgment could then be attributed directly to expectations about language use in a special context or a specific source text.

Combining variations
In the rather diverse decisions on lexical and syntactic variation it often remains unclear how variation may influence a reader's judgments. If a translator, for example, combines a more solemn wording with simple syntax, then it is possible that the positive response to the lexical variant is neutralized by an opposing reaction to the syntactic choice. For a valid evaluation it is thus important to work with passages in which only one type of variation, lexical or syntactic, plays a role. Only with such an experimental design can one ascertain whether changes at the lexical level elicit judgments that differ from those at the syntactic level. The experimental study therefore included passages with both types intermingled, and passages involving only lexical or syntactic variation.

Experiences with various types of language use
The evaluation of a certain style is also influenced by personal characteristics. Obvious examples are particulars like sex, age, and educational level. In literature on text design, an important factor is involvement (see, for example, Oversteegen & van Wijk 2003). In the case of Bible texts, involvement will be determined primarily by the question whether a person is religious or a churchgoer. A greater involvement can mean in this case that the person has a greater familiarity with an existing translation and is ill at ease with a new translation. It is also possible that churchgoing readers are more concerned about the content and less influenced by the formulation. The same possibilities may occur with non-churchgoers. This group may have an antiquated idea about Biblical texts, and therefore reject a new translation on the basis of its formulation; on the other hand, there may be people within this group who appreciate a modern wording because they have no experience with the older wording. For this reason, the factor "Religiousness" has been taken into account in the experimental study.

Research questions
The experiment reported in this paper was set up in order to answer four questions:
1. Are lexical modernization and syntactic simplification evaluated the same or differently?
2. Is the evaluation influenced by the fact that the text originates in the Bible or another classical source?
3. Do unidimensional (lexical or syntactic; hereafter called "single") variations allow a better insight into the evaluations than mixed (both lexical and syntactic) variations?
4. Does it influence the evaluation whether the evaluator is a churchgoer?

Materials
In the experiment, the participants were given a number of fragments to read and were then asked to judge these on a number of rating scales. To keep the time needed for the task within one hour, the number of passages to be studied had to remain restricted. Further, only short and more or less context-independent passages could be used. Because the reading of a short passage without too much introductory explanation can easily take up to ten minutes, four passages were chosen: two from the Old Testament and two from the New Testament. Of each pair, one contained mixed variations and the other a single variation. A characterization of these passages is depicted schematically in Table 1.
Because style judgments can also be influenced by content and genre, an attempt was made to choose similar passages for each pair; in other words, no philosophical passage from a Pauline epistle was matched with a healing miracle story from a Gospel. The choice was also influenced by the necessity of presenting the story as non-Biblical. This meant that passages containing typically Christian concepts such as 'grace' or 'resurrection' could not be used.
The choice was made to use the narrative genre, and within that genre narratives containing a miraculous element. For the Old Testament, two stories were chosen that could fit within a non-Christian myth or legend: the story in which Jacob purloins the right of the firstborn from Esau (Genesis 25:19-34) and the story in which Esther saves her people from disaster (Esther 7:1-8:2). For the New Testament, two healing miracle narratives were chosen: the healing of a demon-possessed man (Mark 5:1-20) and the healing of a cripple (Acts 14:8-20). (See the labels above the columns in Table 1.) Each of these four passages was presented either as originating from the Bible or as originating in a classical text, each time with a more precise description of its source, for example the mythological history of the Caucasus (see Table 1, first and second rows). In order to be able to present the Bible passages as originating in another text, several changes had to be made in protagonists and locations (see Table 1, third row). Care was taken to ensure that no well-known personal names were used (such as Achilles) and that the made-up personal names would not evoke associations (such as Clyntoon). The Biblical and non-Biblical passages thus differed only in the names used.
For each of the four passages there were two stylistic variants. The original was always the proposed NDB text. The variation had various origins (see Table 1, fourth and fifth rows). For the mixed variations, an alternative proposal (taken from commentary on the NDB proposal) was used once and the most recent Bible translation, from 1995, was used the other time. In both cases the text versions differed on a number of very diverse points. The text from the Old Testament contained a moderate number of differences (the Genesis passage contained 15 differences), and the text from the New Testament contained an abundant number of differences (the passage from Mark contained more than 50 differences). In the case of the single variations, the comparison text was in one case a lexically more pedestrian variant (the passage from Esther) and in the other case a syntactically more complex variant (the passage from Acts). These text versions thus varied in only one aspect and to a relatively limited extent: twenty differences in the lexically varying text and ten differences at the clausal level in the syntactically varying text (Tables 6 and 7 present a complete listing of these variations).

The Questionnaire
The questionnaire included items on personal characteristics and on text evaluation. For each of the four text fragments, text evaluation was asked for in two ways, comparatively and independently. Every questionnaire concluded with a task in which the participant was presented with both versions of one of the fragments, had to compare them, and was asked to mark the differences they considered important and to indicate which version they preferred.

Personal characteristics
In addition to gender, age, and educational level, two types of personal involvement in the issue at stake were measured. Religiousness was determined on the basis of three yes/no questions (see (1)), and the attitude towards style renewal with two seven-point agree/disagree rating scales (see (2)).
(1) I go to church at least once a month
    I hear a reading from the Bible at least once a week
    I read the Bible at least once a week
(2) It is important that the Bible is made accessible in modern Dutch
    Every generation has a right to its own Bible translation

Comparative text evaluation
Seven descriptive labels were presented in random order with the instruction to assign them to a fragment in the order in which they were found to be applicable: the label that applied best was selected first and received a score of seven, the one selected second received a score of six, and so on down the line, each subsequent label receiving one point less. The label found to apply least well got a score of one. In (3) the labels are listed groupwise: the first three are approving, the last three disapproving. The middle label was more or less irrelevant and was added as a sort of filler item; it was excluded from the statistical analyses.
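The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the label names are hypothetical placeholders (the actual labels are those listed in (3)), and the function name `rank_scores` is our own.

```python
def rank_scores(ordered_labels):
    """Map labels, listed from best- to least-applicable, to scores n..1."""
    n = len(ordered_labels)
    return {label: n - position for position, label in enumerate(ordered_labels)}


# Hypothetical ranking produced by one participant (placeholder label names).
ranking = ["label_a", "label_b", "label_c", "filler", "label_d", "label_e", "label_f"]
scores = rank_scores(ranking)  # first label gets 7, last label gets 1

# The filler item is dropped before the statistical analyses.
analysed = {label: score for label, score in scores.items() if label != "filler"}
```

Because every participant distributes the same set of scores over the labels, the scores of the labels are not independent of one another, which is why they are called comparative here.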

Independent text evaluation
Sixteen items were formulated for attractiveness (At1-At4), clarity (Cl1-Cl4), solemnity (S1-S4), and appropriateness (Ap1-Ap4). Each aspect was measured using an equal number of Likert scales and semantic differentials (for a complete listing, see Table 2). The actual relationships between the items were determined by way of a principal component analysis with varimax rotation. This procedure resulted in four components, accounting for 68 percent of the variance in the scores. These were largely in accordance with the a priori clustering of the items. The first component was dominated by clarity items (Cl1-Cl3) and also included two attractiveness items (At1-At2), the second component was made up of solemnity items (S1-S4), the third of appropriateness items (Ap1-Ap3), and the fourth of attractiveness items (At3-At4). Two items did not load uniquely onto one component (Cl4, Ap4) and were excluded from further analyses. For each component the scores of the items loaded onto it with an absolute value of .50 or more were combined into one scale (first the negatively phrased items were recoded of course). The reliability of the scales was good for clarity (Cronbach's α = .87) and appropriateness (Cronbach's α = .82), adequate for solemnity (Cronbach's α = .76), and moderate for attractiveness (Cronbach's α = .65).

Table 2 (excerpt)
Ap1  I find the language used in the text: appropriate - inappropriate   -.10  .84  .13
Ap2  The language used fits the way the story is told   -.35  .12  .73  .24
Ap3  The language used renders the ambiance of the story well   -.10  .00  .75  .35
At3  I find the language used in the text: monotonous - ornate   -.04  .29  .77
At4  I find the language used in the text: lively - boring
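The scale construction described above (recoding negatively phrased items, then checking reliability with Cronbach's α) can be illustrated with a minimal sketch. The response data are invented for illustration, the function names are our own, and a seven-point scale is assumed for the recoding step.

```python
from statistics import pvariance


def recode(score, scale_max=7):
    """Reverse a negatively phrased item on a 1..scale_max rating scale."""
    return scale_max + 1 - score


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    totals = [sum(respondent) for respondent in zip(*items)]
    item_variance = sum(pvariance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


# Invented scores of four respondents on three items; the third item is
# negatively phrased and is recoded before the scale is built.
item_1 = [2, 4, 6, 7]
item_2 = [1, 4, 5, 7]
item_3 = [recode(s) for s in [7, 4, 2, 1]]

alpha = cronbach_alpha([item_1, item_2, item_3])
```

When all items yield identical scores, α equals 1; the more the items diverge, the lower α becomes.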

Spontaneous preferences
Every participant was shown both versions of the same passage. One version was always a text that previously had been presented as a Biblical text. The second version of this text was presented as one made by another translator.
The task was to read through both versions at one's own pace, then to underline the important differences between the two texts and then to indicate, for each difference noted, a "+" for the presentation that was found to be better or a "=" if no difference was found. At the same time the respondents were asked to tell which of the two versions they would choose if they were buying a Bible translation.

Participants
A total of 185 respondents took part in the experiment; 60 percent of them were men and 40 percent women. Ages ranged from 18 to 76 and were evenly distributed over this interval with a mean of 44.5 (sd = 16.8). The women were on average a little younger than the men (40.8 versus 47.0 years, t(181) = 2.47, p < .025).
The group was divided up in terms of religiousness based on the answers to the three yes/no questions in (1). People who answered two or more of the questions with "yes" were placed in the "religious" group; the others were classified as "nonreligious." Of the nonreligious respondents (N = 75), 76 percent answered no to all three statements; the others only answered yes to the statement about church attendance. Of the religious respondents (N = 109), 90 percent answered all three statements with "yes"; if there was a "no" answer it was usually to the statement on whether they read the Bible themselves. Religiousness was divided equally over the sexes: of the men, 62 percent were religious and of the women 53 percent (χ²(1) = 1.16, p = .28). There was a clear age difference: the religious respondents were significantly older (53.1 versus 32.3 years, t(182) = 10.40, p < .001).
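The classification rule for religiousness amounts to counting "yes" answers. A minimal sketch, with a function name and argument names of our own choosing:

```python
def classify_religiousness(attends_church, hears_bible, reads_bible):
    """Return 'religious' when at least two of the three answers are yes."""
    yes_count = sum([attends_church, hears_bible, reads_bible])
    return "religious" if yes_count >= 2 else "nonreligious"


# A respondent who only attends church falls in the nonreligious group.
only_church = classify_religiousness(True, False, False)
# A respondent who attends church and hears readings, but does not read
# the Bible personally, falls in the religious group.
church_and_hearing = classify_religiousness(True, True, False)
```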

Procedure
Basically a within-subject design was applied, that is, each participant responded to all four fragments.The fragments were presented in two ways: either Esther, Mark, Genesis, and Acts, or the reversed order.Within these sequences the two other experimental factors, style and source, were varied independently of each other.This guaranteed that each of their combinations was presented to a quarter of the respondents.

Statistical analyses
For each text, a three-way MANOVA was carried out on the scores of the comparative text evaluation as well as the independent text evaluation, with the between-groups factors being Style (NDB, alternative), Source (Bible, literature), and Religiousness (yes, no). In the results we report both the level of significance and the proportion of variance explained (p and η²; see van Wijk 2000:102-104, 157).
At first a fourth factor was included in these analyses, namely either Sex (man, woman) or Age (younger than 45 years, older than 45 years).Only age had a systematic effect on the text evaluation.The older the respondents were, the more inclined they were to be more charitable in their judgments, regardless of the experimental condition.Because this effect was not relevant to the research questions, the factor age was controlled for by taking it as a covariate in the analysis of variance.Sex is not included as a factor in the definitive analysis.
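As a rough check on the proportion of variance explained, η² can be recomputed from an F value and its degrees of freedom. The sketch below assumes that the reported η² is the partial η², F·df1 / (F·df1 + df2); the text does not state the formula, so this is an assumption, and the function name is our own. The input values are F statistics reported in the results sections.

```python
def eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# F(1,182) = 13.07, reported with eta squared = .07
restyling = eta_squared(13.07, 1, 182)
# F(6,171) = 5.53, reported with eta squared = .16
mark = eta_squared(5.53, 6, 171)
# F(6,171) = 2.55, reported with eta squared = .08
genesis = eta_squared(2.55, 6, 171)
```

Under this assumption the recomputed values agree with the reported ones after rounding to two decimals.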

Opinions about restyling
Religiousness had no effect on the opinion that every generation has a right to its own Bible translation (F(1,182) = 2.30, p = .13). Religious people endorse significantly more strongly the notion that it is important to make the Bible more accessible by using modern Dutch (6.33 versus 4.97; F(1,182) = 13.07, p < .001, η² = .07). No less than 61 percent of them gave this opinion the maximum score of 7 and another 23 percent the second highest score of 6. The churchgoers included in this sample were thus very positive about the modernization of the Bible.

Comparative text evaluation
Table 3 presents the results for the text evaluation labels that were scored in comparison with each other. Each fragment displayed a considerable effect of religiousness (Genesis: F(6,171) = 2.55, p < .025, η² = .08; Mark: F(6,171) = 5.53, p < .001, η² = .16; Esther: F(6,171) = 3.28, p < .005, η² = .10; Acts: F(6,171) = 3.34, p < .005, η² = .11). Nonreligious respondents scored the texts higher on the disapproving labels, religious ones did so on the approving labels. The differences between both groups were more clear-cut for the fragments from the New Testament than for those from the Old Testament.
Source had no effects (for each fragment: F < 1) and none of the between-factor interactions reached statistical significance (all Fs < 1.91, p > .08).
Note: For a significant difference according to the univariate Anova, the highest score is printed in bold.
Table 5 specifies the direction of the interaction between Source and Religiousness in the fragment from Acts. The interaction was significant for attractiveness (F(1,173) = 11.53, p < .001, η² = .06) and clarity (F(1,173) = 4.12, p < .05, η² = .02; Appropriate: F < 1; Solemn: F(4,170) = 2.88, p = .09). Nonreligious respondents scored the text more positively when it was presented as literature, religious respondents did so when it was presented as Biblical.
Note: For a significant difference according to the univariate Anova, the highest score is printed in bold.

Spontaneous preferences
For the two fragments with a single variation in style, that is either lexical or syntactic, we discuss in more detail the specific differences actually detected by the participants. These results are presented in Tables 6 and 7. Each table lists the stylistic differences exhaustively in the second and last column. The first column specifies the percentage of respondents that pointed to the corresponding phrases as an 'important' difference. In Table 6 these percentages range from 22 to 89, in Table 7 from 9 to 91. Apparently not every difference was considered equally important. Still, in more than half of the cases the detection rate lies above 50 percent.
The version one preferred is specified in the third and fourth column. In the Esther fragment (Table 6), the NDB version with the more solemn lexicon was preferred 13 times, the alternative version with the more common expressions 4 times. In three cases the preferences scored a tie. The direction of the preferences, i.e. in favor of the more solemn version, was statistically significant (sign test: p < .05). In the Acts fragment (Table 7), the NDB version with the less compact syntax was preferred 9 times, the alternative version only once; this trend was significant as well (p < .025).
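The sign tests on these preference counts can be reproduced with a short sketch. It assumes a two-sided exact binomial test with ties dropped; the text does not specify which variant of the sign test was used, and the function name is our own.

```python
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact sign test on the non-tied cases."""
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Esther: 13 differences favor the NDB version, 4 the alternative (3 ties dropped).
p_esther = sign_test_p(13, 4)  # about 0.049
# Acts: 9 differences favor the NDB version, 1 the alternative.
p_acts = sign_test_p(9, 1)     # about 0.021
```

Under these assumptions both values fall just below the thresholds reported in the text (.05 and .025).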
As the most objective measure of preference, participants had to decide which stylistic variant they would prefer to buy. The results are presented in Table 8. For the fragments showing mixed variations, there is no difference in preference; the distribution over versions approaches fifty-fifty (Genesis: χ²(1) = 0.72, p = .40; Mark: χ²(1) = 0.08, p = .77). In the case of the fragments showing single variations there is a clear difference: the preference is more often for the NDB version with the more solemn words (Esther: χ²(1) = 2.50, p = .11) and the NDB version with the less compact syntax (Acts: χ²(1) = 8.70, p < .005).
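The p values that belong to these χ² statistics can be recovered without the underlying counts, because a χ² variable with one degree of freedom is the square of a standard normal variable. A minimal sketch using only the standard library (the function name is our own):

```python
from math import erfc, sqrt


def chi2_p_df1(chi2):
    """Upper-tail p value of a chi-square statistic with one degree of freedom."""
    return erfc(sqrt(chi2 / 2))


p_genesis = chi2_p_df1(0.72)  # about 0.40
p_esther = chi2_p_df1(2.50)   # about 0.11
p_acts = chi2_p_df1(8.70)     # about 0.003
```

These values match the reported ones and show that, of the two single-variation fragments, only the Acts preference is significant at conventional levels.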

Conclusions
The answer to the first research question, whether there was any difference in evaluation between lexical and syntactic variation, indeed discloses an intriguing difference. This difference emerged thanks to the different measuring methods employed: an evaluation of the text with and without a comparison text. When respondents are presented with a text with a more elevated or a more everyday choice of words (the Esther variants), preference is given to the one with the more elevated register. Apparently they find this to fit better with the content and the genre. This preference remains when readers are given the opportunity to compare the more elevated and the everyday versions with each other. Both in spontaneous remarks and in answer to the question which version they would be more likely to buy, respondents prefer the version with the elevated choice of language.
Things are different when one looks at syntactic variation (the variants based on Acts). Analogously to the preference for elevated wording, readers prefer a more complex sentence structure in isolated judgments. But when readers are allowed to compare and choose one of the two, they prefer the simpler variant. A possible explanation for this difference is that syntactic elements in a text tend to arrest the attention less than lexical ones because readers naturally tend to pay attention to the content. The syntactic variation, being as it were more deeply hidden, then comes to light only upon comparison. It is remarkable that readers make a different choice on the lexical plane than on the syntactic plane, since at first sight it would seem that an elevated choice of words would fit less well with a simpler sentence structure.
The second research question was prompted by discussions on style in the NDB in which it was often argued that the Bible had to be translated within a certain register because the Bible is another type of book than a work from classical antiquity. This research, however, demonstrates that it does not matter much for the evaluation of the text whether it comes from the Bible or from another ancient source. Barring one exception, no differences emerged.
The results for the third research question, the difference between single and mixed variation, show that a single variation gives a better insight into the evaluations than a mixed variation. In fact only the single variations turned out to yield different evaluations. A likely explanation is that the different effects of lexical and syntactic variation cancel each other out. It does remain remarkable that the differences between the mixed versions are quite large, but that these differences do not lead to a different text evaluation. It seems that a very definite difference is needed in a particular aspect such as the lexical in order to be able to ascertain a difference in the evaluation of the text.
The results for the fourth research question on the influence of the level of religiousness show that whether a reader is a churchgoer or not had no influence on the evaluation of the Bible translation. This is all the more remarkable in view of the fact that the two groups differed on average by more than twenty years in age. The factor of religiousness did lead to other judgments on texts, separate from the choice of specific version. As especially their different evaluations of clarity and solemnity show, the churchgoing test subjects were far more familiar with the content and style of the texts presented to them.

Practical relevance
What can this research contribute to discussions on translating and in particular those on translating the Bible? See, for example, the important publication on Bible translation and style (Gillaerts 2000), in which topics such as accessibility and contemporaneous expectations about literal style are discussed. Publications like this usually present thorough discourse analyses without empirical data. Our contribution is based on real data consisting of readers' reactions to stylistic choices. The results of the experiment cast doubt on two fairly generally accepted presuppositions in discussions on translating the Bible. The first is that the Bible requires a different type of language use than other older (classical) texts. Translators may be able to defend this point of view, but it is not in keeping with what readers think. The second presupposition is that in a Bible translation for religious people another type of language needs to be used than in a translation for nonreligious people. This study has shown that both nonreligious and religious people react in the same way to certain language registers.
The most important gain from this study is, however, the fact that it has now been demonstrated that translators must involve readers at an early stage in their discussions on a required or appropriate register. For example, the translators thought that there were big differences between the passages from their own translation in progress and the most recent translation. These differences were not, however, recognized as such by the readers. It did turn out that readers primarily use lexical aspects when forming a judgment, and often those were different from the ones that the translators were worrying about. This study can thus be seen as a defense of the adage: "Translators, don't discuss only with each other but do so especially with your readers."

Table 2: Loadings of the items for text evaluation after varimax rotation
Note: Loadings with an absolute value of .50 or more are printed in bold.

Table 3: Scores of text evaluation labels in relation to Style, Source and

Table 4: Scores on text evaluation scales in relation to Style, Source, and Religiousness for each text fragment (minimum score = 1, maximum score = 7)

Table 5: Scores on text evaluation scales in relation to Religiousness and Source for the fragment from Acts (minimum score = 1, maximum score = 7)
Note: For each significant interaction, the highest score in the subgroup concerned is printed in bold.

Table 6: Lexical preferences given spontaneously in the Esther fragment (scores are percentages, N = 36)