Linguistic aspects of Library and Information Science (LIS)
Library and information work is pervaded by language in many ways, although the formal contact between LIS and linguistics seems to be weak (cf. Warner, 1991).
Any theory of LIS, or approach within LIS, must in some way incorporate theoretical views of language. Just as different epistemologies have influenced LIS, the same epistemologies have also influenced linguistics. For example, the facet-analytic approach within LIS may be understood as influenced by a rationalist epistemology. The same rationalist epistemology has also influenced linguistics (cf. Malmkjær, 1995c), and the theory of semantic primitives may be understood as a linguistic theory whose basic assumptions correspond closely to those of the facet-analytic approach in LIS.
For any theoretical approach to LIS, corresponding theoretical approaches to language must exist (and vice versa). This is also the case with psycho-sociological theories of human beings ("users"): different views of human beings have corresponding theories about language. When cognitive science developed from about 1956, the linguist Noam Chomsky was one of the leading figures. His view was also rationalist, and he and cognitive science influenced linguistics, psychology, LIS and many other fields (cf. cognitive science). The information scientist Gerald Salton expressed pessimism concerning the usefulness of linguistics in information science. In the words of the Danish linguist and information scientist Henning Spang-Hanssen:
"In this connection it is important to realize that the points of view which have been dominant within linguistics in the last 10-15 years, in particular in the USA (i.e. Noam Chomsky's school of generative grammar), have not had any practical influence worth mentioning on natural language processing. In its theoretical foundation and in its technicalities (such as the writing of rules in algorithmic form) there are important similarities between generative grammar and electronic data processing. Natural language processing seems, however, in practice still to depend on traditional categories of grammar and traditionally formed dictionaries. This demonstrates, in my opinion, that the problems related to the automation of text, as opposed to problems related to the automation of mathematical computations, are fundamental and thus cannot be eliminated just by computer-oriented versions of linguistics.
I thus share with Gerald Salton his pessimism about the usefulness of recent linguistics in relation to automated documentation. However, Salton seems to identify linguistics with modern American linguistics and thus to miss the knowledge which was gained before generative grammar evolved or which was gained in other countries such as Scandinavia" (Spang-Hanssen, 1974, 17, translated by BH).
"The so-called pragmatic approach in modern linguistics tries to expand the understanding of linguistic activity to include all issues which may influence the use of language (e.g. what determines that we choose to speak rather than to be silent, the choice of people with whom we communicate, attitudes towards the listener, conceptions of the listener's knowledge and wants). I believe it is too early to evaluate the potential of this approach for describing issues like subject indexing as linguistic activities." (Spang-Hanssen, 1974, 47, translated by BH).
In order to understand the relation between linguistics and LIS it is thus important to understand that both fields are influenced by changing epistemological views and interdisciplinary trends. Epistemology is simply a deeper way to understand both fields. Different approaches to the understanding of language include:
Forms of structuralism (Ferdinand de Saussure, Louis Hjelmslev, Henning Spang-Hanssen)
Generative grammar (Noam Chomsky's school)
Ludwig Wittgenstein's view of language-games (the late Wittgenstein)
Pragmatic views put forward by, among others, John Dewey
The situation is thus rather complex, and a conclusion seems to be that information scientists need to form broader theories that include not just language, but also knowledge, cognition and social organization.
Different approaches and epistemologies may be grouped into broad "families" sharing related views. A basic distinction may be made between views related to logical positivism on the one hand and views related to pragmatism on the other:
Peregrin (2004) suggests that the two main paradigms in semantics are the one developed by logical positivists such as Rudolf Carnap (and the young Wittgenstein) on the one hand, and the one developed by pragmatic philosophers such as John Dewey (and related to, among others, the late Wittgenstein) on the other. Positivist semantics holds that expressions 'stand for' entities and that their meanings are the entities they stand for. Pragmatic semantics holds that expressions are tools for interaction and that their meanings are their functions within the interaction, their aptitudes to serve it in their distinctive ways. Another difference is that positivism tends to look at language as a neutral medium that does not influence what speakers communicate, how messages are transmitted or how messages are understood. The alternative view sees linguistic communities as sharing forms of preunderstanding and thus ways of processing language. This dichotomy may be used in the general field of linguistics as well as in other human sciences. It was also used by Hjørland & Nissen Pedersen (2005) in laying the foundation of a theory of classification for information retrieval.
In an article in Human IT Peter Gärdenfors (1999) noted that
Such an alternative view of cognitive processes and language, emphasizing culture and action, existed long before cognitive science was born as a movement. John Dewey and L. S. Vygotsky were early spokesmen for such a view in psychology, and in linguistics similar views were expressed by, for example, M. M. Bakhtin, M. A. K. Halliday and R. Rommetveit. All these researchers can be seen as representing an early alternative to the basic ideas of cognitive science and of rationalist approaches to linguistics.
Blair (2006) claims of his book on Wittgenstein and information that "by using this theory it creates a firm foundation for future Information Retrieval research". This claim is, however, called into question by Hjørland's review (2007). Wittgenstein may be important, but Blair fails to provide convincing arguments for such a foundation for information studies.
Liddy (2003) discusses the following "levels" of natural language processing (NLP):
Liddy's (2003) model of natural language processing
Phonetics
The interpretation of speech sounds within and across words. Phonetic knowledge is used, for example, for building speech recognition systems.
Morphology
Deals with the componential nature of words, which are composed of morphemes (the smallest units of meaning). Morphological knowledge is used, for example, for automatic stemming, truncation or masking of words.
Lexicology
The study of words (related to lexicography, the study of dictionaries). The (mental) lexicon organizes the mental vocabulary in a speaker's mind. Lexicology is normally seen as part of semantics, but in French and German linguistics in particular it is seen as an independent subdiscipline.
Syntax (or grammar)
Syntax is the study of the rules, or "patterned relations", that govern the way the words in a sentence are arranged. Syntactic rules are used in parsing algorithms. The software Connexor was used, for example, by Schneider & Borlund (2004) to identify noun phrases related to bibliographical references. http://www.connexor.com/demo/syntax/
Semantics
Semantics is the study of meaning. The study of the meaning of isolated words may be termed lexical semantics. The study of meaning is also related to, for example, syntax at the level of the sentence and to discourse at the level of texts.
Discourse
"Although syntax and semantics work with sentence-length units, the discourse level of NLP works with units of text longer than a sentence." (Liddy, 2003, p. 2130). Examples from information science are the resolving of anaphora and ellipsis and the examination of their effect on proximity searching. (Discourse analysis is a term which is also used in many other senses).
Pragmatics (not to be confused with pragmatism in philosophy)
Pragmatics is often understood as the study of how the context (or "world knowledge") influences meaning. Liddy (2003, p. 2130) provides the following example:
The city councilors refused the demonstrators a permit because they feared violence.
The city councilors refused the demonstrators a permit because they advocated revolution.
In the first sentence "they" is most naturally read as referring to the city councilors, in the second as referring to the demonstrators; choosing between the two readings requires world knowledge rather than syntax.
The word "pen" can have at least two meanings (a container for animals or children, and a writing implement). In the sentence "The box was in the pen" one knows that only the first meaning is plausible; the second meaning is excluded by one's knowledge of the normal sizes of (writing) pens and boxes. This example demonstrates that it is easy for people to choose the right sense of the word, but it is extremely difficult to program a computer with all the world knowledge necessary to do the same.
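The morphological level described above (stemming, truncation and masking of words) is the one most readily turned into simple rules. The following is only a minimal sketch under stated assumptions: the function names, the use of "*" as truncation symbol and "?" as masking symbol, and the three-suffix stemmer are hypothetical choices made here for illustration. They are not taken from Liddy (2003) or from any particular retrieval system, and real stemmers use far richer rule sets.

```python
import re


def pattern_to_regex(query: str) -> re.Pattern:
    """Translate a search term with truncation/masking symbols into a regex.

    Assumed conventions (hypothetical, for illustration only):
      '*' = truncation: any number of word characters, including none
      '?' = masking: exactly one word character
    """
    parts = []
    for ch in query:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_terms(query: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary words that the query pattern matches in full."""
    regex = pattern_to_regex(query)
    return [word for word in vocabulary if regex.fullmatch(word)]


def crude_stem(word: str) -> str:
    """Strip one of a few English suffixes, keeping at least three letters."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


vocabulary = ["index", "indexes", "indexing", "indexer", "woman", "women", "wombat"]

print(match_terms("index*", vocabulary))   # right-hand truncation
print(match_terms("wom?n", vocabulary))    # masking of a single letter
print(sorted({crude_stem(w) for w in vocabulary}))
```

Such rules work on word forms alone. They conflate "indexes" and "indexing" without any knowledge of meaning, which is why the higher levels (semantics, discourse, pragmatics) cannot be handled in the same simple way.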
"The above levels of linguistic processing reflect an increasing size of unit of analysis as well as increasing complexity and difficulty as we move from top to bottom. The larger the unit of analysis becomes (i.e., from morpheme to word to sentence to paragraph to full document), the less precise the language phenomena and the greater the free choice and variability. This decrease in precision results in fewer discernible rules and more reliance on less predictable regularities as one moves from the lowest to the highest levels. Additionally, higher levels presume reliance on the lower levels of language understanding, and the theories used to explain the data move more into the areas of cognitive psychology and artificial intelligence. As a result, the lower levels of language processing have been more thoroughly investigated and incorporated into IR systems. I am aware of only one system that includes all levels of language analysis." (Liddy, 1998).
Liddy sees three approaches to natural language processing (NLP):
Symbolic Approach (e.g., programming rules for each "level")
Statistical Approach
Connectionist Approach (Use of artificial neural networks)
Liddy does not, however, discuss different theoretical views on the relation between the levels in any detail. She writes that a sequential model has been replaced by a more complex model in which the different levels interact.

(Fig. from Beaugrande, 1984)
When Noam Chomsky began his research, expectations regarding the syntactic level were extremely high. Later on, experiences from research in machine translation and information retrieval (IR) moved the interest to the lexical level (statistical patterns in word occurrences) as the most important level. Lately, as shown in the quotation from Gärdenfors (1999) above, a pragmatic turn has put pragmatic research on the agenda. Behind such preferences for different levels (and the modeling of the levels themselves) are different epistemological and metaphysical assumptions related to, for example, empiricism, rationalism and pragmatism.
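What "statistical patterns in word occurrences" at the lexical level may look like can be suggested by a small sketch that counts how often two words occur close to each other. The three-sentence corpus, the window size and the function name are assumptions made here for illustration only; this is not a method described by Liddy or by any of the cited authors.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, left in enumerate(tokens):
            # Look only at the next `window` tokens to the right of position i.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical toy corpus built around the two senses of "pen".
corpus = [
    "the sheep were kept in the pen",
    "the pig stayed in the pen",
    "she wrote with a pen",
]

counts = cooccurrence_counts(corpus, window=2)
for pair, n in counts.most_common(5):
    print(pair, n)
```

Counts of this kind record that "pen" appears near "in" in some texts and near "with" in others, but they contain no account of why. That is the sense in which the statistical approach stays at the lexical level and leaves world knowledge outside the model.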
Liddy's overview does not relate NLP technologies to theories of language (such as structuralism or cognitivism), just as it does not consider theories of language in an epistemological perspective (rationalist, empiricist, historicist and pragmatic views on language). In fact, Liddy's stage model is based on theoretical assumptions which may be in some conflict with, for example, social semiotics.
Kramsch (2002) outlines some principles of language from a social semiotic point of view (which is contrasted with "the communicative approach"):
"The first principle, drawing on the work of V. N. Volosinov, Ragnar Rommetveit, and other linguists working in a phenomenological tradition (see Hanks 142–50 for a review), is that there is no such thing as language without historically situated language users or meaning makers in the local context of their communicative practices. Every word uttered or written is addressed by someone to someone about something and for someone’s benefit at a particular juncture in time. [. . .].
Because of each language user’s unique place in history, each word spoken or written bears the traces of its prior uses and of its uses in lexical collocations or co-occurrences. Thus the second principle of a social semiotic approach to language learning is intertextuality. The communicative approach still teaches dictionary meanings and sentence-based grammar. And yet, as Pierre Bourdieu says, “the all-purpose word in the dictionary [. . .] has no social existence” (39). For instance, we teach that Vater means “father,” but for a German of a certain generation the word Vater might be associated with other words, like Vaterland or Doktorvater. These terms have in turn distinct emotional and historical connotations that are different from their English equivalents, “country” or “dissertation adviser.”
The third principle of a social semiotic view of language is that language learning is a social, dialogic process of meaning construction. Whereas folk notions of language learning see it as an incremental accumulation of atomistic structures that moves the learner from word to sentence, from sentence to paragraph, and from paragraph to text, a social semiotic approach considers language as a holistic network of various signs in the environment, including gestures, silences, body postures, graphic and other visual and acoustic symbols, which shape a context of meaning and invite us to respond to it. The communicative approach has not done justice to contextual variation and change. As the biologist and anthropologist Gregory Bateson says, “contextual shaping is just another term for grammar”".
Literature:
Ammon, U. (1977). Indføring i sociolingvistik. Kbh.: Gyldendal. Chapter 4, pp. 82-101.
Bach, K. [2000]. The Semantics-Pragmatics Distinction: What It Is and Why It Matters. http://userwww.sfsu.edu/~kbach/semprag.html
Bar-Hillel, Y. (1964). Language and Information. London: Addison-Wesley.
Beaugrande, R. de (1984). Text Production. Toward a Science of Composition. Greenwich, CT: Ablex Publishing. Available at: http://beaugrande.bizland.com/text_production.htm (Visited 2006-02-28)
Blair, D. C. (1990). Language and Representation in Information Retrieval. Amsterdam: Elsevier.
Blair, D. C. (2006). Wittgenstein, Language and Information: "Back to the Rough Ground!" Berlin: Springer. (Information Science and Knowledge Management).
Borsley, R. D. (2002). A postmodern critique of linguistics. (A version of this note was presented at the Gregynog Linguistics Colloquium in April 1997). http://privatewww.essex.ac.uk/~rborsley/PMC.htm
Connexor (2003-2004). Machinese Semantics. http://www.connexor.com/software/semantics/
Crystal, D. (2006). Language and the Internet. Cambridge: Cambridge University Press.
Engberg-Pedersen, E. (1979). Nogle sproglige og kognitive problemer i mødet mellem bruger og klassifikationssystem - eksemplificeret ved en interviewundersøgelse på Roskilde Universitetsbibliotek. Specialeopgave i lingvistik. Københavns Universitet.
Gärdenfors, P. (1999). Cognitive science: From computers to anthills as models of human thought. Human IT, 3(2), 9-36. www.hb.se/bhs/ith/2-99/pg.htm
Halton, E. (1992). Charles Morris. A Brief Outline of His Philosophy with relations to semiotics, pragmatics, and linguistics. http://www.nd.edu/~ehalton/Morrisbio.htm
Heinreichs, J. (1996). Language theory for the computer: monodimensional semantics or multidimensional semiotics? Knowledge Organization, 23(3), 147-156.
Hjørland, B. (2007). Book review of D. C. Blair (2006). Wittgenstein, Language and Information: "Back to the Rough Ground!" Berlin: Springer. Journal of Documentation, 63(2), 281-286.
Hjørland, B. & Nissen Pedersen, K. (2005). A substantive theory of classification for information retrieval. Journal of Documentation, 61(5), 582-597. http://www.db.dk/bh/Core%20Concepts%20in%20LIS/Hjorland%20&%20Nissen.pdf
Hutchins, W. J. (1975). Languages of indexing and classification. A linguistic study of structures and functions. London: Peter Peregrinus.
Kramsch, C. (2002). Language and culture. A social semiotic perspective. ADFL Bulletin, 33(2), 8-15. [Association of Departments of Foreign Languages]. Available at: http://www.adfl.org/adfl/bulletin/v33n2/332008.htm
Kuhlen, R. (1986). Informationslinguistik. Theoretische, experimentelle, curriculare und prognostische Aspekte einer informationswissenschaftlichen Teildisziplin. Mit Beiträgen von Udo Hahn, Rainer Kuhlen & Ulrich Reimer. Tübingen: Max Niemeyer Verlag.
Liddy, E. D. (1998). Enhanced Text Retrieval Using Natural Language Processing. ASIS Bulletin, April/May. Also available on http://www.asis.org/bulletin/apr.98/liddy.html
Liddy, E. D. (2003). Natural Language Processing. IN: Encyclopedia of Library and Information Science. New York: Marcel Dekker.
Lykke Nielsen, M. (2004). Sproglige problemstillinger i informationssøgning. http://www.cst.dk/vid/public/20041202/VID-sem.PDF#search=%22%20association%20%22marianne%20lykke%20nielsen%22%22
Malmkjær, K. (1995a). Behaviourist linguistics. In K. Malmkjær (Ed.), The Linguistics Encyclopedia (pp. 53-57). London: Routledge.
Malmkjær, K. (1995b). Functionalist linguistics. In K. Malmkjær (Ed.), The Linguistics Encyclopedia (pp. 158-161). London: Routledge.
Malmkjær, K. (1995c). Rationalist linguistics. In K. Malmkjær (Ed.), The Linguistics Encyclopedia (pp. 375-379). London: Routledge.
Peregrin, J. (2004). Pragmatism and semantics. (Manuscript in English; published in German in: Fuhrmann, A. & Olsson, E. J. (eds.): Pragmatisch denken, Ontos, Frankfurt a. M., 2004, 89-108). http://jarda.peregrin.cz/mybibl/PDFTxt/482.pdf
Schneider, J. & Borlund, P. (2004). Introduction to bibliometrics for construction and maintenance of thesauri: methodical considerations. Journal of Documentation, 60(5), 524-549.
Spang-Hanssen, H. (1974). Kunnskapsorganisasjon, informasjonsgjenfinning, automatisering og språk. In: Kunnskapsorganisasjon og informasjonsgjenfinning. Oslo: Riksbibliotektjenesten, pp. 11–61. Core Concepts in LIS/Spang_Hanssen_1974.pdf
Spang-Hanssen, H. (1976). Roles and links compared with grammatical relations in natural language. Lyngby: DTL.
Sparck Jones, K. & Kay, M. (1973). Linguistics and Information Science. New York & London: Academic Press. (F.I.D. Publ. no. 492)
Warner, A. J. (1991). Quantitative and qualitative assessments of the impact of linguistic theory on information science. Journal of the American Society for Information Science, 42(1), 64-71. (The author described her findings that there was no significant use of linguistics literature in the literature of information science. An occasional uncritical nod to the ‘standard authorities’ was found, but little more).
Warner, J. (2007). Linguistics and information theory: Analytic advantages. Journal of the American Society for Information Science, Published Online: 27 Nov 2006.
See also: Languages for special purposes (LSP): http://www.db.dk/jni/lifeboat/info.asp?subjectid=10 (Epistemological lifeboat); Natural language; Semantics; Henning Spang-Hanssen; Syntactical devises; Terminology (Epistemological lifeboat); Translation, machine (MT)
Birger Hjørland
Last edited: 26-03-2008
Questions:
Outline the implications of different understandings of the nature of language for work in library and information science. How are views of language related to approaches to knowledge organization?
to be edited:
Linguistic issues in information science are extremely diverse and broad. A substantial part of information science is concerned with mediating texts, i.e. linguistic representations. The tools that are used (e.g. retrieval systems) and the communication that takes place are overwhelmingly linguistic in nature.
Information science has, for example, been interested in the following problems: Translation of texts from one language to another. Translation of user queries into system language. The use of natural language as a user interface to *IR systems. Automatic indexing, condensing and subject analysis of texts by means of partly syntactic, partly semantic analyses, etc.
Linguistic aspects of I&D include, besides strictly linguistic theories, also theories from the philosophy, psychology and sociology of language. Subfields that cut across linguistics, philosophy, psychology and sociology, among others, and that increasingly interest researchers in information science, are semantics (the study of the meaning of words), semiology/*semiotics (the study of signs, including, for example, differences and similarities between text and images), *lexicography, etc.
Levels of organization in the language process and their relevance to LIS
Phonology
Accounts for the sounds of languages and their correct pronunciation. LIS relevance: speech interfaces and speech reproduction. See also *Talking books; sound documentation.
Morphology
Accounts for the words of languages and their forms. See also *truncation; *masking.
Syntax
Accounts for the grammatical rules of languages. See also *Syntactical devices.
Semantics
Accounts for the meaning of words and sentences and the rules that can be formulated for it. See also *Concept; *Semantics.
Pragmatics/discourse
Is concerned with rules for the use of meaningful sentences in given communicative contexts. See also *"Discourse analysis"; *Languages for special purposes; *metaphor.
Despite the above observation, the actual connection between *information science and linguistics is weak in practice, as documented by Warner (1991).
Views of language in basic epistemological positions
Empiricism and nominalism
Emphasize the role of language as labels for concrete things or sense experiences. General concepts or universals are understood merely as classifications of individual things, not as something that really exists.
"Rationalism"
Emphasizes the universal structures in language. Main name: Noam Chomsky. Sets up general rules that can be applied, for example, in machine translation. Objectivism. Sees meaning as something unambiguous, as rules held by language users. Goal: computers that can understand natural language.
Historicism and hermeneutics
Emphasize context and the holistic (e.g. the hermeneutic circle). Emphasize language as speech acts that contain implicit rules and commit the speaker. See meaning as something historically and socially developed. Main names: Gadamer, Heidegger, the later Wittgenstein (the theory of language-games). Tend to be rather subjectivist and idealist. The perspective on application is somewhat woolly, but the position is critical of artificial intelligence and instead emphasizes human beings with a horizon, both as users and as designers of computers and information systems.
Pragmatism, materialism, sociolinguistics, research on languages for special purposes
Sociolinguistics is on a collision course with rationalism in that it emphasizes the diversity of languages. It studies the relation of language to concrete uses, including the importance of languages for special purposes for qualification and communication. Main names: M. M. Bakhtin, M. A. K. Halliday and R. Rommetveit. In terms of application it does not, as rationalism does, aim at formalization and rationalization. It is less woolly than hermeneutics, and aims at optimizing linguistic (written) communication in educational and work-related contexts (including the norm-setting and *standardization of languages for special purposes). As *materialism in a broad sense, it emphasizes the role of language in relation to the social division of labour, with emphasis on basic production.