Sinology Resources Wiki

What are the special characteristics of Chinese and bilingual Chinese-English dictionaries?

Without abetting the myth of radical (linguistic) exoticism, we can nevertheless identify a number of respects in which Chinese lexicography differs fundamentally from that of English and other languages written in phonetic scripts.

Lexica organized around a logographic, non-alphabetic script like Chinese usually allow searches either by the graphic form(s) of a character (graph or grapheme) or by its transcribed pronunciation, normally entered in Romanized form. In Chinese this is most commonly Mandarin putonghua pinyin 普通話拼音, though some dictionaries now also include pronunciations from other dialects. Tone indicators can often be added to reduce the number of homonyms returned by a phonetic search. While several robust voice recognition applications are available, we are not aware of many dictionaries that have incorporated this feature yet; Pleco is one of the few exceptions.
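The following minimal Python sketch makes the effect of tone indicators concrete. It runs a phonetic lookup over a hypothetical six-entry lexicon. It assumes that pinyin is stored in numbered form (ma3 for mǎ, with 5 for the neutral tone). The names LEXICON and search are illustrative only and do not reflect any particular dictionary's interface.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    graph: str
    pinyin: str  # numbered pinyin, e.g. "ma3"; 5 marks the neutral tone (assumed convention)
    gloss: str


# Hypothetical toy lexicon; a real dictionary holds tens of thousands of entries.
LEXICON = [
    Entry("妈", "ma1", "mother"),
    Entry("麻", "ma2", "hemp; numb"),
    Entry("马", "ma3", "horse"),
    Entry("码", "ma3", "number; code"),
    Entry("骂", "ma4", "to scold"),
    Entry("吗", "ma5", "question particle"),
]


def search(query: str) -> list[Entry]:
    """Return entries whose syllable matches `query`.

    A toneless query such as "ma" matches every tone; a toned query
    such as "ma3" matches only that tone.
    """
    m = re.fullmatch(r"([a-zü]+)([1-5])?", query.lower())
    if m is None:
        raise ValueError(f"not a numbered-pinyin syllable: {query!r}")
    base, tone = m.groups()
    return [
        e for e in LEXICON
        if e.pinyin[:-1] == base and (tone is None or e.pinyin[-1] == tone)
    ]
```

A toneless query for ma returns all six entries, while ma3 narrows the list to 马 and 码. In a full-sized lexicon the reduction is of course far greater.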

Most graphs can be searched for in either their simplified or traditional forms, sometimes also in variant or paleographic forms, and by their graphic subcomponents. These search systems mainly rely on the traditional semantic determiners called “radicals” (bushou 部首), whose number varies with the system used by the lexicographer; radical-based ordering has been the most common organizational system in Chinese over the past two millennia. Their etymological significance, however, has gradually diminished, since some radical assignments follow semantic considerations, others are phonetically motivated, and some are entirely arbitrary. Newer electronic dictionaries therefore also allow searches by components that are not “radicals” but follow the same compositional logic: a subcomponent or the type of initial stroke, followed by the number of additional strokes. Nor did non-radical graphic lookup methods first appear with digital dictionaries: in print, one can sometimes look up a graph by its total stroke count or use the “four-corner index”, but these methods are less efficient because the number of characters the reader must scan visually becomes quite large.
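The classic two-step radical lookup, first the radical and then the residual stroke count, can be sketched as a nested index. This minimal illustration uses hypothetical data and function names. Real dictionaries usually number their radicals (e.g. in the 214-radical Kangxi system) rather than key them by form, and radical assignments may differ between systems.

```python
from collections import defaultdict

# Hypothetical toy data: (graph, radical, residual strokes beyond the radical).
# Radicals are given in standalone form: 水 "water" appears as 氵 inside these graphs.
ENTRIES = [
    ("江", "水", 3),
    ("河", "水", 5),
    ("海", "水", 7),
    ("村", "木", 3),
    ("林", "木", 4),
    ("树", "木", 5),
]


def build_index(entries):
    """Map radical -> residual stroke count -> list of graphs."""
    index = defaultdict(lambda: defaultdict(list))
    for graph, radical, residual in entries:
        index[radical][residual].append(graph)
    return index


def lookup(index, radical, residual):
    """Step 1: locate the radical; step 2: locate the residual stroke count."""
    return index.get(radical, {}).get(residual, [])


def browse(index, radical):
    """List every graph under a radical in residual-stroke order, like a printed index."""
    section = index.get(radical, {})
    return [graph for n in sorted(section) for graph in section[n]]
```

Given this toy data, looking up 水 with five residual strokes returns 河, and browsing 木 lists 村, 林, 树 in ascending stroke order. That order mirrors how graphs are grouped under each radical in a printed radical index.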

If we include graphic input of encoded characters; drawn, scanned or photographed images of characters; Romanized or otherwise written pronunciation; and audio representation for recognition or production, a dictionary can easily employ four or more scopes, any combination of which can serve as the primary organizing feature and as search keys.

When the user is faced with a written Chinese character whose meaning and pronunciation are both unknown, s/he has the following options: a) identify the radical (or another indexing component), count its strokes to find it in the radical table, then count the remaining strokes to locate the character, or b) draw or otherwise capture the character and use Optical Character Recognition (OCR) software. Older OCR solutions required characters to be drawn in the traditional stroke order, whereas most newer algorithms accept any order as long as the result matches the intended form. S/he may also c) want to learn the correct stroke order of the character, which is why many dictionaries display stroke order animations. Closely related to this function are “translations” of standard graphic forms into fonts resembling handwritten script, intended to familiarize the learner with these widely used scribal variants.

When the user hears a Chinese word and wants to look up its meaning or graphic form, s/he has the following options: a) enter the pinyin or another appropriate alphabetic transcription and search through a long list, and b) shorten the list by adding the presumed tone. This stage of the search is usually handled by dictionary-independent input method editors (IMEs), and here s/he can also, much as in spoken conversation, c) add more context to reduce the ambiguity of the input string; most IMEs support this natively by ranking candidates according to the usage frequency of character sequences (n-grams). A review of IME technology would require an article of its own; suffice it to say that, aside from alphabet-based input of transcribed phonemes, there are several common methods, such as Wubi 五筆 and Cangjie 倉頡, that map PC or mobile phone keys to graphic components without any connection to phonetic values whatsoever. These methods are generally more difficult to learn, but eventually allow significantly faster input. Another obvious advantage of such stroke-based input is that users are less prone to losing the ability to handwrite characters, a problem increasingly perceived, and not only by opponents of computerization, as affecting literacy in general.
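A toy bigram model can illustrate the kind of context-based ranking IMEs perform. Candidates for a toneless syllable are sorted first by how often they follow the preceding character, then by their overall frequency. This is a deliberately crude backoff, not the smoothed statistical models used in production IMEs. All counts below are invented for illustration, and the names CANDIDATES, UNIGRAMS, BIGRAMS and rank are hypothetical.

```python
from collections import Counter

# Hypothetical toy model: toneless syllable -> candidate graphs, plus invented
# frequency counts. A real IME derives such statistics from a large corpus.
CANDIDATES = {"shi": ["是", "时", "事", "十", "史"]}
UNIGRAMS = Counter({"是": 500, "时": 200, "事": 150, "十": 100, "史": 30})
BIGRAMS = Counter({
    ("历", "史"): 40,  # 历史 "history"
    ("同", "时"): 25,  # 同时 "at the same time"
    ("同", "事"): 20,  # 同事 "colleague"
    ("小", "时"): 30,  # 小时 "hour"
})


def rank(syllable, prev=None):
    """Order candidates by bigram count with the preceding graph, then by unigram count.

    Counter returns 0 for unseen pairs, so with no context (prev=None)
    the ranking falls back to unigram frequency alone.
    """
    return sorted(
        CANDIDATES.get(syllable, []),
        key=lambda c: (BIGRAMS[(prev, c)], UNIGRAMS[c]),
        reverse=True,
    )
```

Without context, the toneless syllable shi yields 是 first. After 历 the model promotes 史 (as in 历史), and after 同 it promotes 时 and 事 (同时, 同事). This is the disambiguation effect of adding context described above.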

If we combine the factors of the user’s native tongue, the need to resolve written or spoken language, the desire to receive or produce speech, the intention to learn the language or to translate efficiently, and the direction of translation, we quickly find that the number of ways to configure an adequate search result grows combinatorially. At first, digital dictionaries simply reproduced the structure of their paper precursors; only later did they begin to make better use of the new range of functions. In theory, a single digital dictionary could be customized to meet the needs of any type of user, and could even incorporate language domain, social stratum, period usage, gender or regional variety, simply by tagging entries accordingly and ordering the search results intelligently. So far, none of the electronic solutions we have reviewed makes adequate use of these possibilities, so for real variety we must, for now, still rely on print works.