Language, communication among human beings that is characterized by the use of arbitrary spoken or written symbols with agreed-upon meanings. More broadly, language may be defined as communication in general; it is regarded by some linguists as a form of knowledge, that is, of thought or cognition.
II APPROACHES TO LANGUAGE
Language can be studied from at least two points of view: its use or its structure. Language use is the concern of scholars in many fields, among them linguistics (in particular sociolinguistics), literature, communications, speech and rhetoric, sociology, political science, and psychology. Examined in studies of language use are what humans say, what they say they are thinking, and what they mean by what they write and say to one another. Included are content analysis and criticism of literature, studies of the history and changes of meaning of words, and descriptions of the social factors that determine what appropriate speech behaviour is. The fields of speech and rhetoric include studies of the ways in which language can influence behaviour. For literary specialists, language consists of words arranged to produce a logical or harmonious effect. For lexicographers, it is an inventory of vocabulary, including the meanings, origins, and histories of words. Language is also the particular way words are selected and combined that is characteristic of an individual, a group, or a literary genre.
Language structure is the concern of linguistics. Within the field of linguistics, the definitions of language vary, and linguists differ in approach according to the definition they use. Those who study language as written communication are interested in the structure of what they call “text”—how sentences and their parts are organized into coherent wholes—and concerned with how one language can be accurately translated into another. In the field of machine translation, computers handle the vast amount of data needed for such studies. Comparative linguists seek to identify families of languages descended from a common ancestor.
Structural and descriptive linguists view spoken language as having a hierarchical structure of three levels: sounds, sound combinations (such as words), and word combinations (sentences). At the phonemic level, sounds are analysed; at the morphemic level, the combination of sounds into meaningful units of speech (morphemes, that is, word-building units) is described; and at the syntactic level, the combination of words in sentences and clauses is the focus. See also Morphology; Phonetics; Phonology; Semantics.
Transformational-generative grammarians are linguists who define language as knowledge. They study both the nature of the human capacity to acquire language and the language acquisition process in their quest to describe the grammar of a language or languages.
III ANIMAL AND HUMAN COMMUNICATION
The study of language as a means of expression or communication necessarily includes the study of gestures and sounds. Considering that animals gesture and make sounds, do animals, as well as humans, have language? It seems clear that many species communicate; human as distinct from animal communication, however, has been characterized by some scholars as unique in having the following seven features: (1) Human languages have separate, interrelated systems of grammar and of sound and gesture. (2) They allow new things to be communicated all the time. (3) Humans make a distinction between the content that is communicated and their labels for that content. (4) In human communication, spoken language is interchangeable with language that is heard. (5) Human languages are used for special purposes; intent lies behind what is communicated. (6) What is communicated can refer to the past and the future, not just the present. (7) Humans are born with the ability to learn, and then teach, any language, unlike some animals whose communication systems are with them from birth.
Some convincing recent research in teaching American Sign Language (AMESLAN) to primates, and other experiments where chimpanzees used computers and voice synthesizers to produce basic sentences, indicate, however, that a number of these features may not be uniquely human. Nonetheless, it seems safe to say that although language as a system of communication is not uniquely human, human language nevertheless has unique characteristics. Humans string together discrete signs and units of grammar to form an infinite set of never-before-heard, thought, read, or signed sentences. The linguist Noam Chomsky introduced the idea that children are born with an innate knowledge of complex grammatical rules (universal grammar) that they apply to the language they are exposed to. See also Animal Behaviour.
IV ESSENTIALS FOR SPEECH
For human language to be possible, certain factors are necessary. These factors are physiological (the body must be capable of producing the sounds of speech), grammatical (the speech must have structure), and semantic (the mind must be capable of dealing with the meanings of what is spoken).
Although most of the human organs of speech evolved primarily to perform other functions (such as eating), they are so well equipped for speaking that human speech appears to be the most efficient communication system of any living organism. In speech, an airstream is produced by the lungs and modulated by vibration (or lack of vibration) of the vocal cords and by movement of the tongue, the soft palate, and the lips. The airstream can be obstructed in varying degrees by the teeth and can be closed off from or kept open to the nasal cavity. People who have physiological impairments of speech and hearing still possess language, although the production and reception of language may have been transferred to visual systems such as AMESLAN.
All human language has a grammatical structure whereby sound (signalling) units are combined to produce meaning. The minimal units of sound combination that have meaning are called morphemes. A morpheme can correspond to a word, but it can also refer to other sound combinations that have meaning but are not words (such as prefixes and suffixes). In the word coexist, for example, both co and exist are morphemes. Words and morphemes can be classed together according to what they do in a sentence. Classes of morphemes include parts of speech (such as nouns and verbs) as well as prefixes, suffixes, and so forth. Members of different word classes form phrases that in turn combine into larger units (sentences or utterances).
Finally, in human language, the speaker necessarily attaches meaning to the structured sound sequences, and the meaning is perceived and understood by other humans who share the same language. The process of communicating meanings with sounds, words, and sentences and perceiving meanings that others communicate in this way is believed to involve grammar as a tool for relating thoughts or ideas to speech, or signalling. Every meaningful sentence or utterance has a surface and an underlying structure. At the surface are the words and sentence elements as spoken and interpreted. At the underlying or deep level are the words and sentence elements as they are grammatically structured. It is at the level of deep structure that apparent ambiguities in sentence structure are accounted for. Two different surface structures can be perceived to mean one thing, and one surface structure of a sentence may have two meanings. The surface sentence “Flying planes can be dangerous” can mean either that it can be dangerous for someone to fly planes or that planes that are flying can be dangerous. The different interpretations of this sentence arise because its single surface structure has two distinct deep structures. On the other hand, “To please John is easy” and “It is easy to please John”, despite different surface structures, are the same sentence at the level of deep structure. Human communication is a unique process combining special speech organs, grammatical structure, and intended and understood meanings.
V LANGUAGES OF THE WORLD
Spoken-gestured-signalled communication involves the same process for all humans, and any human language can convey any human thought; nevertheless, the actual languages spoken in the world are numerous, and they differ vastly in their sound systems and grammatical structures.
A Classification by Form
Languages can be classified by the form of their grammar. Beginning in the 19th century, linguists attempted to group the world’s languages into four morphological, or typological, categories (that is, categories based on how words are formed): analytic, agglutinating, inflectional (or synthetic), and incorporating (or polysynthetic). Analytic languages typically have words of one syllable with no affixes, or added parts; words are on their own, isolated, as in Chinese. In agglutinating (from the Latin for “to glue to”) languages, words are composed of roots, or basic parts, and one or more affixes (prefixes at the beginning, infixes in the middle, and suffixes at the end of words) with distinct meanings. An example is Turkish, which has äv (“house”), ävdä (“in the house”), ävlär (“houses”), and ävlärdä (“in the houses”). Swahili is also agglutinating, as in hatukuviwanunulia, which means “We did not buy them (= things) for them (= people)”. The components of this word are ha (negative), tu (“we”), ku (indicator of past), vi (“them”, meaning “objects”), wa (“them”, meaning “people”), and nunulia (“buy for”). In inflectional languages, the basic and added parts have merged, and the added parts have no independent meaning. For example, in Latin, which is inflectional, the subject’s person and number are reflected in the form of the verb, as in fero (“I bear”), ferimus (“we bear”), and ferunt (“they bear”). Polysynthetic languages have very long, complex words that have a mixture of agglutinating and synthetic features. Examples include Eskimo languages and Mohawk, all Native American languages.
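The way an agglutinating word is built up from separable parts can be sketched in a few lines of Python. The sketch below is illustrative only: it uses just the six components and glosses listed above for hatukuviwanunulia, and the function name segment and the fixed left-to-right slot order are assumptions made for this one example, not a general analysis of Swahili.

```python
# Morphemes and glosses as listed in the text for this single word.
# Assumption for illustration: each morpheme fills one slot, in this order.
MORPHEMES = [
    ("ha", "negative"),
    ("tu", "we"),
    ("ku", "past"),
    ("vi", "them (objects)"),
    ("wa", "them (people)"),
    ("nunulia", "buy for"),
]


def segment(word):
    """Split word into the morphemes above, in slot order.

    Returns a list of (form, gloss) pairs, or raises ValueError if the
    word is not made up of exactly these morphemes in this order.
    """
    pieces = []
    rest = word
    for form, gloss in MORPHEMES:
        if not rest.startswith(form):
            raise ValueError(f"expected {form!r} at the start of {rest!r}")
        pieces.append((form, gloss))
        rest = rest[len(form):]
    if rest:
        raise ValueError(f"unanalysed remainder: {rest!r}")
    return pieces


analysis = segment("hatukuviwanunulia")
print("-".join(form for form, _ in analysis))  # ha-tu-ku-vi-wa-nunulia
for form, gloss in analysis:
    print(f"{form:8} {gloss}")
```

Each piece keeps its own form and its own meaning, which is the defining property of agglutination; in an inflectional language such as Latin, a single ending cannot be split this way into one part per meaning.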
B Genetic Classification
Even though two languages form words or organize sentence elements in the same way, they are not necessarily related to each other. To establish relationships among languages is to study their genealogy and classify them genetically. Unlike a typological classification, a genetic classification involves comparing the sound and meaning units of different languages in order to show common parentage. Like family resemblances among people, shared genetic resemblances among related languages do not depend on where the languages are spoken or when they existed.
Members of a language family have a historical connection with one another and descend in common from a single ancestor. Language family trees show the relationships among languages; the oldest traceable ancestor language is shown at the top of the tree, and the bottom branches show the distance of relationship among current living members of the family. Related languages are alike in that their grammatical elements and vocabulary show regular correspondences in both sound and meaning. For example, the English word fish corresponds to Latin piscis, and English father to Latin pater. The English and Latin words are cognates, that is, genetically the same. Where English has f, Latin has p; English th corresponds to Latin t; and so forth. Comparative linguistics is the field in which sound and meaning correspondences (that is, cognates) among languages are analysed; genetic groups of languages are established; and by comparing modern languages, the hypothetical ancestor languages of such groups are tentatively reconstructed. (Such reconstructed precursor languages are indicated by the term proto-, as in Proto-Indo-European.)
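The idea of a regular correspondence can be made concrete with a small Python sketch. It is a simplification under stated assumptions: it works on ordinary spelling rather than on sounds, it encodes only the two correspondences named above (English f to Latin p, English th to Latin t), and the function names are hypothetical. Real comparative work relies on phonetic transcription and on many more word pairs.

```python
# English -> Latin correspondences taken from the text.
CORRESPONDENCES = {"f": "p", "th": "t"}

# Word pairs taken from the text.
PAIRS = [("fish", "piscis"), ("father", "pater")]


def expected_latin_consonants(english):
    """Return the Latin consonants predicted for each English f or th,
    reading the English word from left to right."""
    predicted = []
    i = 0
    while i < len(english):
        if english.startswith("th", i):
            predicted.append(CORRESPONDENCES["th"])
            i += 2
        elif english[i] == "f":
            predicted.append(CORRESPONDENCES["f"])
            i += 1
        else:
            i += 1
    return predicted


def appears_in_order(sounds, word):
    """True if the given sounds occur in word in the same order."""
    pos = 0
    for sound in sounds:
        pos = word.find(sound, pos)
        if pos == -1:
            return False
        pos += len(sound)
    return True


for english, latin in PAIRS:
    predicted = expected_latin_consonants(english)
    print(english, latin, predicted, appears_in_order(predicted, latin))
```

A pair counts as consistent with the correspondences only when every predicted consonant turns up in the Latin word in the right order. A single match proves little; it is the recurrence of the same correspondence across many words that supports a genetic relationship.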
B1 European and Asian Families
The best-known language family is the Indo-European family, which represents about 1.6 billion people and includes most of the languages of Europe and northern India and several languages of the region in between. Indo-European has the following subfamilies: Italic, Germanic, Celtic, Greek, Baltic, Slavic, Armenian, Albanian, Indo-Iranian, and the extinct Anatolian, Phrygian, Thracian, Tocharian, and Unclassified Indo-European. Further sub-classifications exist within subfamilies. English, for example, belongs to the Anglo-Frisian group of the West Germanic branch of the Germanic subfamily. The closest relative of English is Frisian, which is spoken today in parts of Germany and the Netherlands. The relationship of English to other Indo-European languages such as Swedish (North Germanic), Latin (Italic), and Sanskrit (Indo-Iranian) is progressively more distant.
The Indo-European family is only one of over a hundred families and proposed larger groupings. Linguists differ in their approach to classification, and what a conservative scholar may term a family, another more liberal scholar may consider a subfamily within a larger grouping. Conservative scholars, on the other hand, may consider that too little evidence exists to support such larger groupings.
Other languages are present in Europe besides the Indo-European family. Basque is an isolate, that is, a language with no known relatives, and Finnish, Estonian, Saami, and Hungarian are the westernmost members of the Finno-Ugric branch of the Uralic family (which also includes various languages of the Ural Mountains region and Siberia). Occasionally linked with the Uralic languages in a Ural-Altaic group (a relationship now rejected by most scholars) is the Altaic family, the main branches of which are Turkic, Mongolian, and Manchu-Tungus. Several unrelated language groups of Siberia are referred to by the regional name Palaeosiberian languages. In the Caucasus three groups, possibly related, have been identified; the best known of the Caucasian languages is Georgian. Many languages of India and its north-west neighbours belong to the Indo-Iranian branch of Indo-European. Two other groups, the Munda languages (usually considered a branch of the Austro-Asiatic languages) and the Dravidian family, represent more than 80 million speakers (see Indian Languages). In South East Asia the Sino-Tibetan languages have hundreds of millions of speakers. The family’s principal branches are the Tibeto-Burman and the Sinitic (which includes the many Chinese “dialects”, really separate languages). Some scholars attach the Tai-Kadai languages (including Thai or Siamese) to this family; most consider them of separate ancestry.
B2 Pacific and African Languages
In the Pacific, the three main language groups are, first, the Austronesian languages, a family that has a Western or Indonesian branch and an Eastern or Oceanic branch; second, the Papuan languages, a regional group of New Guinea, comprising a number of isolates and families (some possibly related to one another); and third, the Australian Aboriginal languages (related to one another but not to non-Australian languages). The extinct Tasmanian language may represent a fourth group.
Africa has four language families, the largest of which (with over 1,400 languages) is Niger-Congo. The Niger-Congo family includes branches such as Mande, Atlantic, Gur, and Benue-Congo. Benue-Congo includes Africa’s most widespread group, the Bantu languages (such as Swahili and Zulu), as well as Yoruba, one of the most widely spoken languages in Africa (along with Swahili and Hausa). The roughly 400 languages of the Afro-Asiatic family originate from parts of Africa and Asia. The family’s branches include Semitic, including Arabic and Hebrew (see Semitic Languages); Chadic, including Hausa; Berber; Cushitic; Omotic; and Egyptian, of which the sole member is the extinct Egyptian language. In the Nilo-Saharan family, which is made up of around 200 languages and several sub-groups, the principal subdivision is East Sudanic; its Nilotic branch includes such languages as Maasai, while the Saharan branch of the Nilo-Saharan languages includes Kanuri. The Khoisan family comprises around 30 click languages of the San and other peoples of southern Africa. See also African Languages.
B3 Native American Languages
Attempts to classify Native American languages have resulted in the conservative identification of more than 150 families. In liberal classifications, these are further grouped into about a dozen so-called superstocks (superfamilies, or groups of families that may be remotely related), but recent studies have challenged such groupings. Even with a liberal approach, many small families remain unlinked to larger groups, and many isolates exist. Along the Arctic coast and in Greenland, Inupiaq (Eskimo-Aleut family) is spoken by the Inuit (Eskimo). Subarctic Canada includes various Athabascan and Algonquian languages. Native American languages in the United States east of the Mississippi River are predominantly Algonquian, Iroquoian, and Muskogean. The principal Great Plains family is the Siouan, but Caddoan and western Algonquian languages are also spoken. Shoshonean languages (Uto-Aztecan family) dominate the Great Basin, bordered on the north by the Sahaptian family. On the Northwest Coast are the Salishan and Wakashan families, Tlingit (thought to be related to Athabascan languages), and a probable isolate, Haida. The Apachean branch of Athabascan is found throughout the Southwest, alongside the Yuman family and the Pima-Papago language (Uto-Aztecan) of Arizona and southern California. In California many small families exist, their mutual relationships often disputed.
Important in Mexico and Central America are the Uto-Aztecan family (Aztec or Nahuatl), the Oto-Manguean superstock (Mixtec, Otomí, and Zapotec, among others), and families such as Mixe-Zoquean, Totonacan, and Tequistlatecan. The Mayan family comprises about two dozen languages with millions of speakers.
Depending on their approach, linguists classify South American languages into 90-odd families and isolates or into three nearly all-inclusive superstocks: Macro-Chibchan, Andean-Equatorial, and Gê-Pano-Carib. The most widely spoken native South American languages are Quechua and Aymara, Guaraní, and Mapuche or Araucanian. Important in Central America and northern South America are Macro-Chibchan languages (such as Guaymí, Paez, and Warao) and also the large Arawakan group (including Island or Black Carib, Guajiro, and Campa). The widely accepted Macro-Gê superstock includes many languages spoken in the Brazilian tropics.
C Areal Classification
The geographical, or areal, classification of languages is also useful. Areal classification is based on the observation of the ways in which neighbouring languages have influenced one another. In discussions of the Northwest Coast languages of North America, for example, statements are often made that these languages share various consonants of a certain type, or that they all have a large fishing-related vocabulary. Underlying such statements is an assumption that the similarities exist because, over time, these languages have borrowed grammar, sounds, and vocabulary from one another. Such regional resemblances, however, do not necessarily indicate either genetic relationship or typological similarity.
D Written and Spoken Language
When individual languages have a written as well as a spoken form, it is often the case that the writing system does not represent all the distinctive sounds of the language. The writing system of one language may make use of symbols from the writing system of another language, applying them to sounds, syllables, or morphemes for which they were not originally intended. Written and spoken forms of the same language can be compared by studying the “fit” between the writing system and the spoken language.
Many kinds of writing systems exist. In Chinese, a written character is used for every morpheme. The written form of the Cherokee language has a symbol for every consonant-and-vowel syllable. Japanese is also written with such a system, which is called a syllabary. In writing systems using an alphabet, such as the Latin alphabet, each symbol theoretically stands for a sound in the spoken language. The Latin alphabet has 26 letters, and languages written with it generally use all 26, whether their spoken form has more or fewer sounds. Although it is used for written English, the Latin alphabet does not have symbols for all the sounds of English. For example, for some sounds, combinations of two letters (digraphs), such as th, are used. Even so, the combination th does not indicate the spoken distinction between th in “thin” and th in “this”.
The written form of a language is static, unchanging, reflecting the form of the language at the time the alphabet, syllabary, or character system was adopted. The spoken form is dynamic, always changing; eventually, the written and spoken forms may no longer coincide. One of the problems with the English written language is that it still represents the pronunciation of the language several centuries ago. The word light, for example, is today pronounced “lite”; the spelling “light” reflects the former pronunciation. In languages with writing systems that have been recently developed (such as Swahili) or reformed (such as Hebrew), the written and spoken forms are more likely to fit.
Unlike speech, writing may ignore pitch and stress, omit vowels, or include punctuation and capitalization. The written and spoken forms of a language also differ in that writing does not incorporate spoken dialect differences. Speakers of mutually unintelligible Chinese languages or dialects, for example, can read one another’s writing even though they cannot communicate through speech. Similarly, speakers of the different German dialects all write in High German, the accepted standard form of the language. See also Writing.
E Standard and Non-standard Language
The written form of a language may have more prestige than the spoken form, and it also may have a more complex grammar and a distinctive vocabulary. A standard written literary language thus tends to influence the speech of educated people. In certain circumstances, they will try to imitate it when they talk, and they may relegate the unwritten form to situations where prestige is less important. In Arabic-speaking countries, for example, educated people sometimes use classical Arabic in speech as well as in writing, whereas uneducated people speak only colloquial Arabic. The use of two such varieties of a single language by the same speaker in different situations is called diglossia. People who use the spoken form of a standard literary dialect in public and their native regional dialect when they are with friends (as do many German-speaking Swiss) are said to be diglossic.
A standard language is the one of a language’s dialects that has become dominant. Often, such dominance is due to governmental policy whereby one dialect is given prestige over others, and various regulations or customs ensure that it is used. A standard variety is not in any way inherently superior to other dialects and is, itself, just another dialect with its own individual grammar, vocabulary, and accent (while the standard can be, and is, spoken in many accents, there is usually one accent that is held to be more prestigious than others, such as Received Pronunciation in the UK). The standard language (such as High German) is frequently the dialect used in writing; that is, it is the literary language of a speech community or at least a dialect that has an existing orthography and a body of material written in it.
Few people actually speak such a standard language; rather, they approximate it with their own regional variations. The standard dialect is the one that is used when a language is taught to non-native speakers; the learners then speak it but do so with an accent, or variation carried over from their first language and region. The standard language also provides a common means of communication among speakers of regional dialects (as in the examples given above for German). Standard languages are thus highly useful in efforts to unite people and create a sense of national spirit.
F Dialect, Argot, and Jargon
A dialect is a variety of a language that differs consistently from other varieties of the same language used in different geographical areas or by different social groups. For example, Boston residents who speak the New England dialect of American English drink tonic and frappés, whereas people in Los Angeles sip sodas and milkshakes. Within groups of people who speak the same geographical or social dialect, other language variations exist that depend on specific situations. People who have activities in common or share a profession or trade may have a special “language” called an argot that identifies them as distinct from outsiders. Teenagers, thieves, and prostitutes have an argot separating them from parents, police, and other authorities. Such a specialized informal argot is called slang. An argot or specialized terminology, as shared by members of a profession, without any connotation of slang, is called a jargon. Professional groups with distinct jargons include doctors, lawyers, clergy, linguists, and art critics. (The use of the terms argot, jargon, and slang, however, varies somewhat from writer to writer.)
G Pidgins and Creoles
Just as a language may develop varieties in the form of dialects and argots, languages as a whole may change (Latin, for example, evolved into the different Romance languages). Sometimes rapid language change occurs as a result of contact between people who each speak a different language. In such circumstances, a new language called a pidgin may arise (see Pidgins and Creoles). Pidgins are based on one language from which they take much of their vocabulary but are also influenced by at least one other language; they have relatively small sound systems, reduced vocabularies, and simplified and altered grammars, and they rely heavily on context to convey meaning. Pidgins are often the result of contact by traders with island and coastal peoples. A pidgin has no native speakers; when speakers of a pidgin have children who learn the pidgin as their first language, that language is then called a creole. Once the creole has enough native speakers to form a speech community, the creole may expand into a fuller language. This is the case with Krio, a language with many speakers in Sierra Leone in West Africa. Krio arose from what was originally an English-based pidgin and has influences from Yoruba among other languages.
H International Languages
In the midst of world linguistic diversity, a number of international languages have been proposed as a means for solving world problems thought to be caused by misunderstandings of communication. Sometimes, existing natural languages are advanced to fill this role. These so-called LWCs (Languages of Wider Communication)—such as English or French, already spoken by many people as a second language as well as by many native speakers—have proponents who hold that everyone should know one or the other. More often, efforts have been made to construct artificial languages for everyone to learn. A number of artificial languages have enjoyed a period of vogue, then fallen into disuse. One artificial language, Esperanto, has had a relatively high success rate because it has a regular grammar, an “easy” pronunciation, and a vocabulary based on Latin and ancient Greek and on the Romance and Germanic languages. To speakers of languages of other families, however, Esperanto seems less international and is harder to speak and learn. One new language proposed for international use is LOGLAN (standing for logical language), a laboratory-created language that is claimed to be culture-free and to allow people to speak their thoughts clearly and unambiguously. It has a small sound system and few grammatical rules, and its vocabulary is drawn from the eight most widely spoken languages in the world today, including Hindi, Japanese, Chinese, as well as Russian and other Indo-European languages.
Even if a perfect international language could be devised and adopted, however, it is by no means certain that it could minimize global communication problems. Moreover, the thought processes that relate languages to the ideas that people express with them are still not understood. Even if everyone did learn Esperanto or LOGLAN and used these languages in international or public dealings, it is probable that processes of language change would soon take over. The world would then have dialects of Esperanto or some other international language, leading eventually to even further differentiation or to pidginization, creolization, and so forth. Indeed, English and French in different parts of the world have already become differentiated; Indian English, for example, is different both from American English and from British English.
VI LANGUAGE DEVELOPMENT, CHANGE, AND GROWTH
Defined as the production and perception of speech, language evolved as the human species evolved. As a communication system, it can be related to the communication systems of other animals. As discussed above, however, human language has a creative and interpretive aspect that appears to mark it as distinctive. Human speech is believed to involve specialization of part of the left hemisphere of the brain (Broca’s area, which is associated chiefly with speech production). It is possible that human language may not have been distinct from animal communication until this physiological specialization occurred. The production of human language is believed to have occurred first in Neanderthals (100,000 to 30,000 years ago); it is speculated that about 40,000 to 30,000 years ago the emergence of modern Homo sapiens (with skull and vocal tract possibly better specialized for speech) was accompanied by significant linguistic development. Modern human language, then, may be only 30,000 to 40,000 years old. The immense diversity of languages spoken in the world indicates an incredible acceleration in the rate of change of human language, once it emerged. If there was, in fact, a first language, its sounds, grammar, and vocabulary cannot be definitely known. Historical linguists, focusing on finding out and describing how, why, and in what form language changes occur, can only suggest hypotheses to account for them.
In the 18th century, the German philosopher Gottfried Wilhelm Leibniz suggested that all ancient and modern languages diverged from a single proto-language. This idea is referred to as monogenesis. Most scholars believe that such a language can, at best, be posited only as a set of hypothetical formulas from which one can derive the world’s languages, and according to which they can be related to one another; it is unlikely that such a reconstruction reflects a real first language as it was actually spoken. Although many modern languages do derive from a single ancestor, it is also possible that human language arose simultaneously at many different places on Earth and that today’s languages do not have a single common ancestor. The theory that present language families derive from many original languages is called polygenesis.
Whether human language was ultimately monogenetic or polygenetic, the differences among languages are believed to be relatively superficial. Although many humans find it difficult to learn a second language (as opposed to acquiring a second language before puberty), and although languages such as Chinese, English, and Swahili may seem to have little in common, the differences among languages are not nearly so great as the similarities among them. The sounds and sound combinations of the world’s languages, despite the ways in which they differ from language to language, are believed to have been selected from a universal set of possible sounds and sound combinations available to all human languages. Human languages likewise have individual structural properties that are selected from a common pool of possible structures. That is, no human language utilizes any sounds that cannot be produced by any human being or any grammatical categories that cannot be learned by any human—whether or not the native language of a given person makes use of those sounds and structures. The range of possible language changes, in other words, seems to be limited by the universal structural properties of language.
When a language undergoes substantial changes both in vocabulary and in sound and structure, the whole language may become another language. This occurs in pidginization and creolization and also, for example, in the development of the modern Romance languages from Latin. Language growth can also occur when a minor dialect becomes dominant and breaks off from other dialects. Eventually, the dialect that is split off ceases to be mutually intelligible with the other dialects; it may develop new dialects of its own, become subject to pidginization and creolization, and so forth, over and over again through time. This continual growth and development characterize language in all its aspects as a living expression of both human nature and culture.