Building specialized dictionaries using lexical functions

It is now widely acknowledged that terms enter into a variety of structures and that classic taxonomies and meronymies represent only a small part of the relationships terms share. This can be seen in recent specialized dictionaries that account for derivational relationships, co-occurrents, synonyms, antonyms, etc. It has also been emphasized in several articles written by terminologists as well as by linguists and computer scientists working with specialized corpora. This article discusses the advantages and shortcomings of trying to account for semantic relations between terms using a specific framework, i.e. lexical functions (Mel’cuk et al. 1984-1999, 1995). It is based on a long-term project aimed at converting an existing paper dictionary (Dancette & Réthoré 2000) into a relational database. We show that, although lexical functions have several advantages, a number of decisions must be made to accommodate the description of specialized terms.


Introduction
In this article, we summarize the insights gained from a research project aimed at representing a wide variety of semantic relationships between terms using a formal lexico-semantic framework called Explanatory and Combinatorial Lexicology (ECL) (Mel'cuk et al. 1984-1999, 1995). The framework was used to assist terminographers in converting a printed dictionary (Dancette & Réthoré 2000) into a relational database. The original specialized dictionary is bilingual (English-French) and deals with terms pertaining to the field of retailing. It is described briefly in section 2.
These insights, even though they stem from a specific project, shed light on fundamental and applied issues related to the description of terms and terminological relationships in specialized dictionaries. It is now widely acknowledged that terms enter into a variety of semantic relationships with other lexical units. These relationships can be hierarchical, such as those between a hyperonym and its hyponyms, or non-hierarchical, such as cause-effect relationships or the relationships found in noun and verb collocations.
The traditional methodologies for describing terms (placing the concept at the centre of the analysis and trying to account for the organization of knowledge) have led terminologists to focus on taxonomic and meronymic relationships and to overlook an entire set of relevant relationships. During the past decade, terminologists have started to question the adequacy of these methodologies for describing terms and have turned to alternative descriptive models such as those supplied within the field of lexicology. Attempts at describing relationships between terms using lexico-semantic frameworks, which were scarce only a decade ago, are now becoming widespread. Several authors have claimed that descriptions should rely more heavily on linguistic models (e.g. Frawley 1988; Jousse & Bouveret 2003; see also L'Homme forthcoming for a review). Some of these models have even been implemented in commercial dictionaries (Binon et al. 2000; Cohen 1986; Dancette & Réthoré 2000).
In this article, one question will hold our attention: should terminologists try to capture the organization of knowledge in specialized subject fields, using terms as linguistic representations of this knowledge, or should they account for lexical units with specific meanings? The answer has important implications for the ways in which terms and relationships between terms are envisaged.
Traditionally, the focus on the organization of knowledge (the conceptual approach) has led terminologists to consider a limited set of relationships, mainly logical or hierarchical ones. In contrast, the focus on meaning that we adopted from the start led us to adhere to a lexico-semantic approach and to consider the wide variety of relationships into which terms enter. In the course of converting the dictionary of retailing, we had to deal with both approaches simultaneously. Although the printed dictionary was originally compiled according to a conceptual approach, it also includes information, such as collocates or derivatives, that is not always present in terminological dictionaries.
This is why we chose to convert it using ECL, a framework based on the senses of lexical units. Our work shows that it is feasible in practice to combine conceptual and lexico-semantic approaches. However, even though we adopted the ECL approach, and more precisely its mechanism for capturing lexical relationships, i.e. lexical functions (LFs), a number of pragmatic factors led us to distance ourselves from the original model. In this paper, the term lexical function (LF) refers to the functions developed in the original model, whereas the term lexico-semantic relation (LSR) refers to the final implementation found in the electronic dictionary of retailing.
The remainder of this article is divided into four sections. Section 2 briefly describes the printed dictionary that served as the basis for this project, and the general layout of the relational database into which it was imported. In section 3, we explain why lexical functions (LFs) were chosen to capture systematically the multiple relationships between terms. In the main part of the article (section 4), we present some results and explain the choices we made. In doing so, we show why and how we distanced ourselves from the original LF model and adapted it to capture relationships that appeared central in the field of retailing but that are not taken into account in the LF formalism. Finally, section 5 sums up the problems we encountered and discusses future directions for terminology. A list of the lexical functions cited in the article, with their explanations, is given in Appendix B. The revised list, called LSRs, is given in Appendix C.

Brief presentation of the dictionary (paper and electronic)
The Dictionnaire analytique de la distribution / Analytical Dictionary of Retailing (Dancette & Réthoré 2000) is intended for translators as well as professors, students and professionals in business and distribution trades. It has two main objectives: to list the largest possible number of terms and to present them in a way that makes the concepts as easy as possible to understand.
The dictionary is structured around 350 key concepts related to shopping centres, marketing, shop layout, etc. These key concepts are described with a strong effort to highlight the semantic relationships linking the terms and to explain nuances in meaning and regional differences in usage. The articles of the dictionary of retailing are written as short encyclopaedic texts in a language that is far from formal: phraseological variety has been favoured over regularity and systematicity of expression.
In addition to the 350 full-fledged articles (the main body of the dictionary), the dictionary includes a lexicon of some 3500 French and English related terms, covered in the body of the 350 articles.
Each entry is divided into nine parts (an example is reproduced in Appendix A):
1. The English main headword and its synonyms, followed by grammatical information and usage labels;
2. The French equivalent terms;
3. A French definition;
4. Semantic precisions;
5. Semantic relationships between the terms belonging to a single field;
6. Additional notes providing extralinguistic information (historical notes or pragmatic information);
7. Linguistic information;
8. An English and a French context;
9. Examples.
The contents of the printed dictionary were imported into a relational database. The nine headings were distributed among five tables. The first two tables contain linguistic data on the English and French terms respectively; the third contains the definitions; the fourth stores the contexts. The clear separation of linguistic and semantic data (i.e. the terms and contexts on one side, and the definition on the other) allows other languages to be integrated flexibly without redesigning the core of the entry. Finally, the fifth table contains the data on semantic relationships. Figure 1 shows how the relationships between the tables are established. Figure 2 shows a concrete example of this implementation.
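The five-table layout can be sketched in SQL, run here through Python's standard sqlite3 module. This is a minimal sketch under stated assumptions: the article says what each table holds but not its columns, so every table and column name below is hypothetical, and headings such as semantic precisions or examples are left out. The point it illustrates is that each term row points to a language-neutral definition row, so English and French equivalents are found through a shared definition, and a further language would only need one more term table.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the
# dictionary's actual implementation.
SCHEMA = """
CREATE TABLE definition (            -- table 3: one row per sense
    def_id  INTEGER PRIMARY KEY,
    text_fr TEXT NOT NULL            -- the French definition
);
CREATE TABLE term_en (               -- table 1: English terms
    term_id INTEGER PRIMARY KEY,
    def_id  INTEGER NOT NULL REFERENCES definition(def_id),
    form    TEXT NOT NULL
);
CREATE TABLE term_fr (               -- table 2: French terms
    term_id INTEGER PRIMARY KEY,
    def_id  INTEGER NOT NULL REFERENCES definition(def_id),
    form    TEXT NOT NULL
);
CREATE TABLE context (               -- table 4: English and French contexts
    ctx_id  INTEGER PRIMARY KEY,
    def_id  INTEGER NOT NULL REFERENCES definition(def_id),
    lang    TEXT NOT NULL CHECK (lang IN ('en', 'fr')),
    body    TEXT NOT NULL
);
CREATE TABLE relation (              -- table 5: semantic relationships
    rel_id  INTEGER PRIMARY KEY,
    def_id  INTEGER NOT NULL REFERENCES definition(def_id),
    lsr     TEXT NOT NULL,           -- relation label, e.g. 'Gener'
    target  TEXT NOT NULL            -- the related term
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the five tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def french_equivalents(conn: sqlite3.Connection, english: str) -> list[str]:
    """French terms attached to the same definition (sense) as an English term."""
    rows = conn.execute(
        """SELECT fr.form
             FROM term_en AS en
             JOIN term_fr AS fr ON fr.def_id = en.def_id
            WHERE en.form = ?
            ORDER BY fr.form""",
        (english,),
    )
    return [form for (form,) in rows]


if __name__ == "__main__":
    db = build_db()
    # Two senses of ENCHÈRE, one definition row each (definition text elided).
    db.executemany("INSERT INTO definition VALUES (?, ?)",
                   [(1, "(sense 1: auction)"), (2, "(sense 2: bid)")])
    db.executemany("INSERT INTO term_en (def_id, form) VALUES (?, ?)",
                   [(1, "auction"), (2, "bid")])
    db.executemany("INSERT INTO term_fr (def_id, form) VALUES (?, ?)",
                   [(1, "enchère"), (2, "enchère")])
    print(french_equivalents(db, "bid"))  # ['enchère']
```

Adding, say, a Spanish term table with the same shape as term_en would leave the definition, context and relation tables untouched.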

Dealing with various semantic relationships
As can be seen in Appendix A, various semantic relationships between the headword and other terms belonging to the field of retailing are explained throughout the article, especially under the headings Definition, Semantic precisions and Internotional semantic relations. The relationships can be classified as:
• hyperonymy and hyponymy: e.g. […]
Other relationships, such as quasi-synonymy, co-hyponymy, meronymy and antonymy, are found throughout the dictionary. They are not only mentioned but explained extensively in natural language, with a variety of formulations. When incorporating this information into the relational database, we wanted to systematize the explanations without losing their expressiveness. Lexical functions (Mel'cuk et al. 1984-1999, 1995) appeared to be the best solution in this respect. In the following subsection, we present lexical functions and further explain their value in accounting for the semantic relationships in the specialized dictionary of retailing.

Why lexical functions?
A lexical function (LF) is designed to capture a general, abstract sense that recurs across different languages. It is written f(x) = y, with f representing the function, x the argument (or keyword), and y the value expressed by the function when applied to a given argument. The meaning associated with an LF can be expressed by a relatively large number of values. For example, Magn is a function that expresses intensification. It can be applied to different lexical units and produce a large set of values (e.g. Magn(smoker) = heavy; Magn(bachelor) = confirmed, etc.) (Mel'cuk et al. 1995: 126-127).
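The f(x) = y notation maps naturally onto a lookup table keyed by function and keyword. The Python sketch below is illustrative only: the class and method names are hypothetical, and it stores just the Magn values cited above. It shows that one function applied to different keywords yields different values, and that a keyword may receive several values.

```python
from collections import defaultdict


class LexicalFunctionTable:
    """Hypothetical store for LF applications written f(x) = y."""

    def __init__(self) -> None:
        # (function name, keyword) -> values, kept in insertion order
        self._values: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add(self, function: str, keyword: str, value: str) -> None:
        """Record one value y for f(x)."""
        self._values[(function, keyword)].append(value)

    def apply(self, function: str, keyword: str) -> list[str]:
        """Return f(keyword), or an empty list if nothing is recorded."""
        return list(self._values.get((function, keyword), []))

    def keywords(self, function: str) -> list[str]:
        """All keywords to which `function` has been applied, sorted."""
        return sorted({kw for (f, kw) in self._values if f == function})

    def render(self, function: str, keyword: str) -> str:
        """Format one application in the f(x) = y notation."""
        return f"{function}({keyword}) = {', '.join(self.apply(function, keyword))}"


lfs = LexicalFunctionTable()
lfs.add("Magn", "smoker", "heavy")
lfs.add("Magn", "bachelor", "confirmed")
print(lfs.render("Magn", "smoker"))  # Magn(smoker) = heavy
print(lfs.keywords("Magn"))          # ['bachelor', 'smoker']
```

Because every entry reduces to a (function, keyword, value) triple, the same records fit a single relational table without further modelling.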
LFs were chosen in our project for the following reasons:
• The model is relational and is thus formally compatible with our database.
• LFs capture a large set of senses: there are approximately 60 standard LFs (Mel'cuk et al. 1995; Wanner 1996).
• Different semantic relations (paradigmatic and syntagmatic) can be accounted for with the same apparatus.
• The same sense is always described with the same LF. This enables us to overcome the stylistic variety of the printed dictionary.
• LFs can be explained to users in a more transparent way (Polguère 2003).

Assigning lexical functions to terms: methodological choices and initial problems
When converting the semantic relationships explained in the printed dictionary, a number of methodological choices had to be made. We will illustrate the first one with the example cited above, i.e. the AUCTION entry.
A close look at the article reveals that the French term ENCHÈRE has two different senses (English uses a different term for each sense: AUCTION and BID). LFs are assigned separately to each sense, as shown in Table 1. Other relevant lexical functions and their explanations are provided in Appendix B. Also, since the original dictionary was not compiled according to a lexico-semantic approach, key information is lacking. For example, several terms that appear in the articles are not described in a separate entry. Moreover, since the focus was placed on noun terms (nouns or noun phrases), many related verbs and adjectives are missing (e.g. FRANCHISAGE (Engl. FRANCHISING) was encoded but not FRANCHISER (Engl. TO FRANCHISE)). More fundamentally, the actantial structure of terms was not clearly indicated, yet the assignment of several LFs requires this type of information.
Another problem arose from the fact that several entries are complex nouns with a compositional meaning. Although this is current practice in terminological dictionaries, it posed difficulties in the assignment of LFs. For example, one entry is devoted to the complex term BIEN DURABLE (Engl. DURABLE GOOD) and another to the generic term BIEN (Engl. GOOD). We chose to account for paradigmatic relationships such as the one between BIEN DURABLE and BIEN with the function Gener. However, had we applied the principles of ECL, we would have described the syntagmatic relationship between BIEN and DURABLE with the function Magn^temp.
Finally, many terms highlighted in the dictionary share a semantic relation that could not be accounted for in terms of lexical functions.

Results
Related terms (in bold in the printed dictionary) were scrutinized as potential values of lexico-semantic relations (LSRs) every time they occurred in the articles of the dictionary. Because their relation with each headword had been examined, we were able to detect and correct inconsistencies in the assignment of LSRs. A total of 28 different functions were used. This number was deemed necessary and sufficient to account for the relationships between the terms that carry the most important information on the concepts referred to in the dictionary. However, as we will see in the following sections, the assignment of some of the original lexical functions (LFs) was modified. Firstly, some LFs were simplified (i.e. we chose to generalize some fine-grained distinctions expressed by LFs because we did not have enough occurrences to justify their use). Secondly, other functions were created to capture relationships not accounted for in the original model (e.g. relationships that are central in terminological descriptions but that ECL would not consider relevant for lexical units). We discuss three classes of LSRs: 1) classic relationships in terminology: synonyms, antonyms, taxonomic and meronymic relationships; 2) actantial and circumstantial relationships (both are paradigmatic relationships with nominal forms); and 3) syntagmatic and derivational relationships.

Synonyms, antonyms, taxonomic and meronymic relationships
These typical terminological relations correspond in the original LF model to Gener, Syn (and other types of synonymy, e.g. Syn∩), Anti and Contr. We kept the main distinctions made within the LF model, but made some important adaptations to the data contained in our dictionary:
• Gener proved very useful, but, following Grimes (1990), we felt a strong need to add the function Spec, which is extremely productive in taxonomic series, to list the different specific terms linked to an entity (e.g. SPECIALTY CENTER, FACTORY-OUTLET CENTER and MEGAMALL are described as Specs of SHOPPING CENTER). This decision is partly linked to our methodological choice to consider most adjectives as parts of complex nouns. For example, DUTCH AUCTION (Fr.: VENTE AUX ENCHÈRES, VENTE SOUS-ENCHÈRES) is semi-compositional. Hence, it would not be possible to describe this noun phrase under the entry AUCTION.
• The standard practice in specialized dictionaries is to list all true synonyms as headwords. In the electronic version, all synonyms are linked to the definition. Other terms share some semantic features, but not all. These are represented using lexical functions such as Anti, Conv, Syn∩ and Contr. Departing from the LFs of the ECL model, we used the label Contrast for all terms that are opposed to one another by a single feature but share all other features, and we used the function Syn_use for all terms that refer to the same reality but view it from a different perspective, depending on the use or usage of the term. Thus, BIEN DURABLE (Engl. […]
In our corpus, the LSR Contrast proved as productive as the LSR Syn_use. This should come as no surprise: the main objective of a specialized dictionary is to shed light on each distinct entity and to distinguish the nuances of meaning between terms. We identified the function Contrast in every sentence stating or implying "X is opposed to Y by feature A". Grouping together and distinguishing all the terms entering into an opposition relationship on the paradigmatic level was part of our terminological methodology.
• Finally, a number of meronymic relationships were taken into account in our dictionary. We first used the original LFs Mult ("group of") and Sing ("an element within a group") to capture relationships such as the one between the terms CLIENT (Engl. CUSTOMER) and CLIENTÈLE (Engl. CLIENTELE). We also resorted to other functions to capture different types of meronymic relationships. The function Part was added, as suggested by Fontenelle (1997), to represent relationships between parts and wholes (e.g. CASH REGISTER is described as a part of CHECK-OUT COUNTER). We also created its inverse function, Tot. Phase was added as well, to account for the chronological phases in a process (Dancette & L'Homme 2002) (e.g. GROWTH is a phase in a PRODUCT LIFE CYCLE).
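Several of the relations above come in inverse pairs (Gener/Spec, Part/Tot, Mult/Sing), so one direction can be derived from the other and missing reciprocal links can be flagged, one way to support the kind of consistency checking described in the Results section. The sketch below rests on assumptions we state plainly: Gener(x) gives the generic term of x and Spec(x) a specific term; Part(whole) = part (the reading used in Part(drugstore) = post office in the concluding remarks) and Tot(part) = whole; Mult(element) = group and Sing(group) = element; Contrast is treated as symmetric; Phase has no inverse here. The function names are hypothetical, not the dictionary's implementation.

```python
# Assumed inverse pairs; relations read lsr(keyword) = value.
INVERSE = {
    "Gener": "Spec", "Spec": "Gener",
    "Part": "Tot", "Tot": "Part",
    "Mult": "Sing", "Sing": "Mult",
    "Contrast": "Contrast",  # assumed symmetric
}

Relation = tuple[str, str, str]  # (lsr, keyword, value)


def complete(relations: set[Relation]) -> set[Relation]:
    """Return the relations plus the inverse of each one that has a known inverse."""
    completed = set(relations)
    for lsr, keyword, value in relations:
        if lsr in INVERSE:
            completed.add((INVERSE[lsr], value, keyword))
    return completed


def missing_inverses(relations: set[Relation]) -> set[Relation]:
    """Inverse relations that were not encoded explicitly."""
    return complete(relations) - relations


encoded = {
    ("Spec", "SHOPPING CENTER", "MEGAMALL"),
    ("Gener", "MEGAMALL", "SHOPPING CENTER"),
    ("Part", "CHECK-OUT COUNTER", "CASH REGISTER"),
    ("Mult", "CLIENT", "CLIENTÈLE"),
    ("Phase", "PRODUCT LIFE CYCLE", "GROWTH"),  # no inverse defined: left alone
}
for rel in sorted(missing_inverses(encoded)):
    print(rel)
# ('Sing', 'CLIENTÈLE', 'CLIENT')
# ('Tot', 'CASH REGISTER', 'CHECK-OUT COUNTER')
```

Deriving inverses mechanically means an encoder records each link once, while readers can still consult it from either entry.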

Argumental and circumstantial relationships
A number of relationships involved predicates and their arguments. In the original LF model, arguments are noted according to their position (S1, S2, S3, etc.) and circumstantials as Sres, Sloc, etc., but we opted for a more systematic and transparent notation for potential users of our dictionary. For instance, the predicative term

Other relationships: properties, units of measure, utility
A number of LSRs were assigned in a way that differs substantially from the original LFs. In doing so, we complied with the more traditional view of relationships in terminology: the expressions given as values point to important information on the concept (i.e. prototypical properties); they do not rely on argumental roles.
The function Prop (property) illustrates this point. It was conceived to retrieve the terms expressing the technical features attached to the definition of a concept, as illustrated by Sager's example (1990: 34): "compressibility is a property of gas". In our corpus, sentences expressing this relationship so clearly are scarce, and the function Prop proved complex and was rarely marked explicitly. Consequently, in many cases we relied on expert knowledge of the concept more than on linguistic markers (see Dancette & Halimi 2004 for more details). In our field, experts say that 'products' are identified by the following intrinsic properties: PRICE, BRAND, LIFECYCLE, PRODUCT DIFFERENTIATION, PROFITABILITY, MARKET SHARE. Similarly, 'point of sale' is identified by: ASSORTMENT, SERVICE, PULLING POWER, PRICE POLICY. Here are some examples:

Relationships between nouns and other parts of speech
As expected, these proved much less productive than the paradigmatic relations described in the previous sections.

Relationships with verbs
For verbs in particular, the sophistication of LFs was deliberately discarded. Verbs were grouped into three categories: derivations (ENCHÈRES ⇒ enchérir; BID ⇒ to bid; ÉTIQUETTE ⇒ étiqueter); direct collocative verbs (ENSEIGNE ⇒ développer l'enseigne, BANNER ⇒ to develop the banner; SHOP ⇒ to set up shop, BANNIÈRE ⇒ implanter l'enseigne); and associated actions, i.e. verbs for typical actions related to a concept (ENCHÈRE ⇒ adjuger, AUCTION ⇒ to knock down, to strike off). This simplification was deemed adequate because users of the dictionary are expected to rely on their own linguistic ability, unlike users of the ECL model, which may be used for encoding purposes.
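The three verb categories amount to a small, closed set of relation labels attached to each sense of a headword. The Python sketch below records some of the examples cited above under those labels; keying entries by (headword, sense) mirrors the per-sense assignment used for ENCHÈRE. The sense labels, the attribution of enchérir to the 'bid' sense and of adjuger to the 'auction' sense, and all names in the code are our assumptions for illustration.

```python
from enum import Enum


class VerbRelation(Enum):
    """The three coarse verb categories used instead of finer-grained LFs."""
    DERIVATION = "derivation"
    COLLOCATIVE = "direct collocative verb"
    ASSOCIATED_ACTION = "associated action"


# (headword, hypothetical sense label) -> category -> verbs
VERBS: dict[tuple[str, str], dict[VerbRelation, list[str]]] = {
    ("BID", "bid"): {VerbRelation.DERIVATION: ["to bid"]},
    ("ENCHÈRE", "bid"): {VerbRelation.DERIVATION: ["enchérir"]},
    ("AUCTION", "auction"): {
        VerbRelation.ASSOCIATED_ACTION: ["to knock down", "to strike off"],
    },
    ("ENCHÈRE", "auction"): {VerbRelation.ASSOCIATED_ACTION: ["adjuger"]},
    ("ÉTIQUETTE", "main"): {VerbRelation.DERIVATION: ["étiqueter"]},
    ("BANNER", "main"): {VerbRelation.COLLOCATIVE: ["to develop the banner"]},
}


def verbs_for(headword: str, category: VerbRelation) -> list[str]:
    """Collect the verbs of one category across all senses of a headword."""
    found: list[str] = []
    for (word, _sense), by_category in VERBS.items():
        if word == headword:
            found.extend(by_category.get(category, []))
    return found


print(verbs_for("ENCHÈRE", VerbRelation.DERIVATION))         # ['enchérir']
print(verbs_for("ENCHÈRE", VerbRelation.ASSOCIATED_ACTION))  # ['adjuger']
```

Keeping the sense in the key lets the two senses of ENCHÈRE carry different verbs while a user can still query the headword as a whole.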

Relationships with adjectives and adverbs
Only a few adjectives and adverbs appeared as true headwords (e.g. ACHALANDÉ (Engl. WELL PATRONIZED), AFFILIÉ (Engl. AFFILIATED), BON MARCHÉ (Engl. LOW-COST, INEXPENSIVE), HAUT DE GAMME (Engl. HIGH-END, UPMARKET), PROMOTIONNEL (Engl. PROMOTIONAL), etc.). As mentioned above, most adjectives entering into the composition of complex nouns were discarded in order to highlight paradigmatic rather than syntagmatic relations.
The criterion we used is the indissociability of the term. Thus, if DURABLE combines only with GOODS or PRODUCT, then DURABLE GOODS is considered a lexical unit, as opposed to ACHALANDÉ (Engl. WELL PATRONIZED) or APPROVISIONNÉ (Engl. WELL-STOCKED), which combine with a larger number of nouns.
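The indissociability criterion can be approximated by counting how many distinct nouns an adjective combines with in a corpus. The Python sketch below is a simplified stand-in for the criterion, not the procedure actually followed: the co-occurrence pairs are toy data invented for illustration, and the cutoff MAX_BASES = 2 is hypothetical, since the article states the criterion but no threshold.

```python
from collections import defaultdict

# Hypothetical cutoff: adjectives seen with at most this many distinct nouns
# are kept inside complex terms.
MAX_BASES = 2


def nominal_bases(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each adjective to the set of distinct nouns it modifies."""
    bases: dict[str, set[str]] = defaultdict(set)
    for adjective, noun in pairs:
        bases[adjective].add(noun)
    return bases


def complex_terms(pairs: list[tuple[str, str]], max_bases: int = MAX_BASES) -> list[str]:
    """Adjective + noun combinations treated as single lexical units."""
    bases = nominal_bases(pairs)
    return sorted(
        f"{adj} {noun}"
        for adj, nouns in bases.items()
        if len(nouns) <= max_bases
        for noun in nouns
    )


# Toy co-occurrence data (invented for illustration).
pairs = [
    ("durable", "goods"), ("durable", "product"), ("durable", "goods"),
    ("well-stocked", "shop"), ("well-stocked", "shelf"), ("well-stocked", "warehouse"),
]
print(complex_terms(pairs))  # ['durable goods', 'durable product']
```

Raising the cutoff admits freer adjectives such as well-stocked, which is exactly the kind of category explosion the authors sought to avoid.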
It can even be argued that many adjectives, such as DUTCH in DUTCH AUCTION or CHINESE in CHINESE AUCTION, DURABLE in DURABLE GOODS, BANAL in BIEN BANAL and DIRECT in DIRECT MARKETING, take on specialized meanings when combined with a few highly specific nominal bases.
We also found that the adjective is a highly unstable part of the term in the vocabulary of retailing, and that many synonyms can be associated with the same term (e.g. DIRECT MARKETING is synonymous with DIALOGUE MARKETING, PERSONAL MARKETING, DATABASE MARKETING, RELATIONSHIP MARKETING). Treating the adjectives as terms would have led to an explosion of the category without adding informational value. Furthermore, the lexicalization of adjectival forms is uneven across languages. For example, English has the adjectives ANCHORLESS and ANCHORED, but French does not, and therefore unstable periphrases are used.

Concluding remarks
Converting a terminological dictionary into electronic form enabled us to highlight some differences between the encoding of semantic relationships between terms in a terminological setting and the same encoding viewed in a lexicographic framework. Of course, we referred to a very specific lexical framework, ECL, but much of what has been said above would apply to other formal models, at least as far as our comparison with terminology is concerned.
It would seem that describing lexical units related to a special subject field requires a choice between the conceptual approach and the linguistic approach. Even though we tried to combine the two approaches, we had to discard many lexical features, and syntagmatic relations were not systematically looked for. In addition, many methodological decisions had to be made to accommodate the data under analysis.
However, the ECL model of lexical functions helped us retrieve more relationships than classic terminological approaches would have. LFs are closely linked to the linguistic definition of the headword, with the identification of all its actants. Furthermore, sense distinctions are based on the linguistic behaviour of lexical units, which are then treated separately. Terminological definitions, on the other hand, try to answer questions about the nature of things (what, where, when, how, for what purpose, etc.). But a new problem arises here: commercial entities are culture-dependent, and their descriptions or definitions vary accordingly. For example, in North America, a post office is often located in a drugstore; therefore the relation Part(drugstore) = post office holds in some cases. While a lexical approach can help reduce the imprecision of definitions, assigning semantic relations is very often risky because of fuzziness. In some instances it would have been impossible without expert knowledge of the field of retailing. In such cases, the syntactic forms of the sentences were less reliable than encyclopaedic knowledge.
In conclusion, we found that LFs are a very helpful tool for capturing semantic relationships between terms. They can help enrich terminological descriptions and offer a means to better interpret relationships between terms. We firmly believe that combining terminological and lexicological traditions helps in understanding different facets of terms and concepts. Starting a new dictionary using a combined approach in a systematic manner would be beneficial and would open new perspectives for terminographic projects.

Informations linguistiques (Linguistic information):
• vendre au plus offrant: sell v to the highest bidder
• offrir un prix, enchérir: bid v on
8. Contextes (Contexts):
Auctions are an important part of assembly and selling operations in the agricultural markets of many countries, for they have traditionally provided a rapid and effective means of disposing of goods, especially perishable products. Auctions are also frequently used to sell products directly to the consumers, especially if the value cannot readily be precisely determined, as is the case of works of art or antiques. (Britannica Micropaedia 1991)
Aucune formalité spéciale n'est prescrite dans les enchères de meubles. Mais, dans les adjudications d'immeubles, pour laisser aux intéressés le temps de réfléchir, le Code de procédure civile prescrit l'emploi de bougies pouvant rester allumées une minute environ. L'adjudication ne peut être prononcée qu'après l'extinction successive de trois bougies. (Grand dictionnaire encyclopédique Larousse 1987)
[English translation: No special formality is prescribed for auctions of movable property. However, for auctions of real property, in order to give interested parties time to reflect, the Code of Civil Procedure prescribes the use of candles that can stay lit for about one minute. The property can be awarded only after three candles have burned out in succession.]
9. Exemples (Examples):

Table 1: Lexical functions assigned to two different senses of ENCHÈRE

Similar problems were encountered with articles dealing with verb nominalizations, which can convey either an activity meaning or a result meaning. These senses had to be clearly separated before LFs could be assigned.
Also, a number of circumstantial relationships were found in the dictionary. The following example illustrates the need to look for such relationships. The predicative term MARQUE (Engl. BRAND) has PRODUIT (Engl. PRODUCT) and NOM DE MARQUE (Engl. BRAND NAME) (or LOGO, or BRANDMARK) as arguments. But the term FIDÉLITÉ À LA MARQUE (Engl. BRAND LOYALTY), found in the same article, does not have an argumental role. Other examples of argumental and circumstantial relationships described in the dictionary are given below: CONCESSION DE LICENCE (Engl. LICENSING) calls for the arguments LICENCE (Engl. LICENSE), CONCÉDANT (Engl. LICENSOR) and LICENCIÉ (Engl. LICENSEE). In our model, however, LICENSE is noted as 'object aimed at' (Obj), LICENSOR is identified as 'agent' (Ag) and LICENSEE as 'recipient' (Recip) of LICENSING. Even though our interpretation of senses complies with the original model, we opted for different notations.
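The role-labelled notation can be sketched as a predicative term carrying a mapping from role labels to argument terms, from which readable glosses are generated for users. The Ag, Obj and Recip labels and the LICENSING example come from the text above; the class name, the gloss templates and the rejection of unknown labels are hypothetical design choices made for illustration.

```python
from dataclasses import dataclass, field

# Gloss templates for the role labels named in the text; wording is our own.
ROLE_TEMPLATES = {
    "Ag": "{term} is the agent of {head}",
    "Obj": "{term} is the object aimed at in {head}",
    "Recip": "{term} is the recipient of {head}",
}


@dataclass
class PredicativeTerm:
    """Hypothetical record: a predicative term with role-labelled arguments."""
    headword: str
    arguments: dict[str, str] = field(default_factory=dict)  # role -> term

    def add_argument(self, role: str, term: str) -> None:
        """Attach an argument, rejecting labels outside the assumed inventory."""
        if role not in ROLE_TEMPLATES:
            raise ValueError(f"unknown role label: {role}")
        self.arguments[role] = term

    def explain(self) -> list[str]:
        """One readable gloss per argument, in insertion order."""
        return [
            ROLE_TEMPLATES[role].format(term=term, head=self.headword)
            for role, term in self.arguments.items()
        ]


licensing = PredicativeTerm("LICENSING")
licensing.add_argument("Obj", "LICENSE")
licensing.add_argument("Ag", "LICENSOR")
licensing.add_argument("Recip", "LICENSEE")
for line in licensing.explain():
    print(line)
# LICENSE is the object aimed at in LICENSING
# LICENSOR is the agent of LICENSING
# LICENSEE is the recipient of LICENSING
```

Naming roles rather than numbering positions (S1, S2, ...) is what lets the glosses be generated directly for non-specialist users.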