A word is a unit of language that carries meaning, consists of one or more morphemes linked more or less tightly together, and has a phonetic value. Typically a word consists of a root or stem and zero or more affixes. Words can be combined to create phrases, clauses, and sentences. A word consisting of two or more stems joined together forms a compound. A word combined with another word or part of a word forms a portmanteau.
Depending on the language, words can be difficult to identify or delimit. Dictionaries take upon themselves the task of categorizing a language's lexicon into lemmas. These can be taken as an indication of what constitutes a "word" in the opinion of the authors.
In spoken language, the distinction of individual words is usually given by rhythm or accent, but short words are often run together. See clitic for phonologically dependent words. Spoken French has some of the features of a polysynthetic language: il y est allé ("He went there") is pronounced /i.ljɛ.ta.le/. Since the majority of the world's languages are not written, the scientific determination of word boundaries becomes important.
There are five ways to determine where the word boundaries of spoken language should be placed:
As Plag suggests, the decision to treat a lexical item as a word should also take pragmatic criteria into account. The word "hello", for example, is used only as a greeting, and it is difficult to assign it a meaning outside that use. The case is more complex for "how do you do?": is it a word, a phrase, or an idiom? In practice, linguists apply a mixture of all these methods to determine the word boundaries of any given sentence. Even with careful application of these methods, an exact definition of a word often remains elusive.
Some words that seem very general may in fact have a technical definition, such as the word "soon", usually meaning within a week.
In languages with a literary tradition, there is an interrelation between orthography and the question of what is considered a single word. Word separators (typically space marks) are common in the modern orthography of languages using alphabetic scripts, but these are (excepting isolated precedents) a modern development (see also history of writing).
Vietnamese orthography, although using the Latin alphabet, delimits monosyllabic morphemes, not words. Conversely, synthetic languages often combine many lexical morphemes into single words, making it difficult to boil them down to the traditional sense of words found more easily in analytic languages; this is especially difficult for polysynthetic languages such as Inuktitut and Ubykh, where entire sentences may consist of single such words.
Logographic scripts use single signs (characters) to express a word. Most scripts in actual use, however, are only partly logographic and combine logographic with phonetic signs. The most widespread logographic script in modern use is the Chinese script. While the Chinese script has some true logographs, the largest class of characters used in modern Chinese (some 90%) are so-called pictophonetic compounds (形声字, Xíngshēngzì). Characters of this sort are composed of two parts: a pictograph, which suggests the general meaning of the character, and a phonetic part, which is derived from a character pronounced in the same way as the word the new character represents. In this sense, the character for most Chinese words consists of a determinative and a syllabogram, similar to the approach used by cuneiform script and Egyptian hieroglyphs.
There is a tendency informed by orthography to identify a single Chinese character as corresponding to a single word in the Chinese language, parallel to the tendency to identify the letters between two space marks as a single word in the English language. In both cases, this leads to the identification of compound members as individual words, while e.g. in German orthography, compound members are not separated by space marks and the tendency is thus to identify the entire compound as a single word. Compare e.g. English capital city with German Hauptstadt and Chinese 首都 (lit. chief metropolis): all three are equivalent compounds, in the English case consisting of "two words" separated by a space mark, in the German case written as a "single word" without space mark, and in the Chinese case consisting of two logographic characters.
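The orthographic tendency described above can be illustrated with a minimal sketch. It assumes a hypothetical helper, orthographic_units, that treats whatever stands between space marks as a unit for alphabetic scripts and each character as a unit for a logographic script. It models only the naive, orthography-driven notion of a "word", not a linguistic analysis.

```python
def orthographic_units(text: str, script: str) -> list:
    """Return the units that orthography alone suggests are 'words'.

    `script` is an illustrative label, either "alphabetic" or "logographic";
    it is an assumption of this sketch, not a standard classification API.
    """
    if script == "logographic":
        # One unit per character, skipping any whitespace.
        return [ch for ch in text if not ch.isspace()]
    # Alphabetic scripts: units are the letter runs between space marks.
    return text.split()


# The three equivalent compounds from the text.
examples = {
    "English": ("capital city", "alphabetic"),
    "German": ("Hauptstadt", "alphabetic"),
    "Chinese": ("首都", "logographic"),
}

for language, (compound, script) in examples.items():
    units = orthographic_units(compound, script)
    print(f"{language}: {units} ({len(units)} orthographic unit(s))")
```

Under this naive rule the same compound yields two units in English and Chinese but one in German, which is exactly why orthography alone is an unreliable guide to word boundaries.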
In synthetic languages, a single word stem (for example, love) may have a number of different forms (for example, loves, loving, and loved). However, these are not usually considered to be different words, but different forms of the same word. In these languages, words may be considered to be constructed from a number of morphemes. In Indo-European languages in particular, the morphemes distinguished are
Thus, the Proto-Indo-European *wr̥dhom would be analysed as consisting of
In Indian grammatical tradition, Panini introduced a similar fundamental classification into a nominal (nāma, suP) and a verbal (ākhyāta, tiN) class, based on the set of desinences taken by the word.
Published - November 2008
Information from Wikipedia is available under the terms of the GNU Free Documentation License