Linguistic aspects of Library and Information Science (LIS)

Library and information work is pervaded by language in many ways, although the formal contact between LIS and linguistics seems to be weak (cf., Warner, 1991).
 

Any theory of LIS or approach within LIS must in some way incorporate theoretical views involving language. Just as different epistemologies have influenced LIS, the same epistemologies have also influenced linguistics. For example, the facet-analytic approach within LIS may be understood as an influence of a rationalist epistemology. The same rationalist epistemology has also influenced linguistics (cf., Malmkjær, 1995c), and the theory of semantic primitives may be understood as a linguistic theory whose basic assumptions correspond closely to those of the facet-analytic approach in LIS.

 

For any theoretical approach to LIS, a corresponding theoretical approach to language must exist (and vice versa). This is also the case with psycho-sociological theories of human beings ("users"): different views of human beings have corresponding theories of language. When cognitive science developed from about 1956, the linguist Noam Chomsky was one of its leading figures. His view was also rationalist, and he and cognitive science influenced linguistics, psychology, LIS and many other fields (cf., cognitive science). The information scientist Gerald Salton expressed pessimism concerning the usefulness of linguistics in information science. In the words of the Danish linguist and information scientist Henning Spang-Hanssen:

"In this connection it is important to realize that the points of view which have been dominant within linguistics in the last 10-15 years, in particular in the USA (i.e. Noam Chomsky's school of generative grammar), have not had any practical influence worth mentioning on natural language processing. There are important similarities between generative grammar and electronic data processing, both in their theoretical foundations and in technicalities such as the writing of rules in algorithmic form. In practice, however, natural language processing still seems to depend on traditional grammatical categories and traditionally compiled dictionaries. In my opinion, this demonstrates that the problems related to the automation of text, as opposed to the problems related to the automation of mathematical computations, are fundamental and thus cannot be eliminated just by computer-oriented versions of linguistics.

    I thus share Gerald Salton's pessimism about the usefulness of recent linguistics for automated documentation. However, Salton seems to identify linguistics with modern American linguistics and thus to miss the knowledge that was gained before generative grammar evolved, or that was gained in other countries such as those of Scandinavia." (Spang-Hanssen, 1974, p. 17, translated by BH)

"The so-called pragmatic approach in modern linguistics tries to expand the understanding of 'linguistic activity' to include all issues that may influence the use of language (e.g. what determines that we choose to speak rather than to be silent, the choice of people with whom we communicate, attitudes towards the listener, and notions of the listener's knowledge and wants). I believe it is too early to evaluate the potential of this approach for describing issues like subject indexing as linguistic activities." (Spang-Hanssen, 1974, p. 47, translated by BH)

 

In order to understand the relation between linguistics and LIS, it is thus important to understand that both fields are influenced by changing epistemological views and interdisciplinary trends. Epistemology offers a deeper way of understanding both fields. Different approaches to the understanding of language include:

 

The situation is thus rather complex, and one conclusion seems to be that information scientists need to form broader theories that include not just language but also knowledge, cognition and social organization.

 

Different approaches and epistemologies may be grouped into broad "families" sharing related views. A basic distinction may be made between views related to logical positivism on the one hand and views related to pragmatism on the other:

 

Peregrin (2004) suggests that the two main paradigms in semantics are the one developed by logical positivists such as Rudolf Carnap (and the early Wittgenstein) on the one hand, and the one developed by pragmatist philosophers such as John Dewey (and related to, among others, the late Wittgenstein) on the other. Positivist semantics suggests that expressions 'stand for' entities and that their meanings are the entities they stand for. Pragmatic semantics suggests that expressions are tools for interaction and that their meanings are their functions within the interaction, their aptitudes to serve it in their distinctive ways. Another difference is that positivism tends to regard language as a neutral medium that does not influence what speakers communicate, how messages are transmitted or how they are understood. The alternative view sees linguistic communities as sharing forms of preunderstanding, and thus ways of processing language. This dichotomy may be applied in the general field of linguistics as well as in other human sciences. It was also used by Hjørland & Nissen Pedersen (2005) as the foundation of a theory of classification for information retrieval.

 

In an article in Human IT Peter Gärdenfors (1999) noted that

Such an alternative view of cognitive processes and language, emphasizing culture and action, existed long before cognitive science was born as a movement. John Dewey and L. S. Vygotsky were early spokesmen for such a view in psychology, and in linguistics similar views were expressed by, for example, M. M. Bakhtin, M. A. K. Halliday and R. Rommetveit. The work of all these researchers can be seen as an early alternative to the basic ideas of cognitive science and of rationalist approaches to linguistics.

 

Blair (2006) claims of his book on Wittgenstein and information that "by using this theory it creates a firm foundation for future Information Retrieval research." This claim is, however, questioned in Hjørland's review (2007): Wittgenstein may be important, but Blair fails to provide convincing arguments for a foundation for information studies.

 

Liddy (2003) discusses the following "levels" of natural language processing (NLP):

Phonetic
Morphological
Lexical
Syntactic
Semantic
Discourse
Pragmatic

    Liddy's model (2003) of natural language processing, with the levels listed from lowest to highest

Phonetics

The interpretation of speech sounds within and across words. Phonetic knowledge is used, for example, in building speech recognition systems.

 

Morphology

Deals with the componential nature of words, which are composed of morphemes (the smallest units of meaning). Morphological knowledge is used, for example, for automatic stemming, truncation or masking of words.
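To make stemming and truncation concrete, here is a minimal Python sketch, assuming a tiny, hypothetical list of English suffixes (`SUFFIXES`, `crude_stem` and `truncation_match` are illustrative names, not any real system's API). Stemming reduces different word forms to a shared stem at indexing time, while right truncation lets a searcher enter a prefix such as `index*` and match every indexed word that begins with it.

```python
# Hypothetical suffix list for illustration only; real stemmers such as
# Porter's algorithm apply ordered rules with extra conditions.
SUFFIXES = ["ations", "ation", "ings", "ing", "ers", "er", "es", "s"]


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest listed suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def truncation_match(query: str, vocabulary: list[str]) -> list[str]:
    """Right truncation: 'index*' matches every word beginning with 'index'."""
    if not query.endswith("*"):
        return [w for w in vocabulary if w == query]  # exact match only
    prefix = query[:-1]
    return [w for w in vocabulary if w.startswith(prefix)]


if __name__ == "__main__":
    words = ["index", "indexes", "indexing", "indexers", "industry"]
    print([crude_stem(w) for w in words])
    print(truncation_match("index*", words))
```

Real stemmers are far more careful: this crude suffix stripper leaves irregular forms untouched and can conflate unrelated words once their endings are removed.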

 

Lexicology

The study of words (related to lexicography, the compilation and study of dictionaries). The (mental) lexicon organizes the vocabulary in a speaker's mind. Lexicology is normally seen as part of semantics, but especially in French and German linguistics it is seen as an independent subdiscipline.
 

Syntax (or grammar)

Syntax is the study of the rules, or "patterned relations", that govern the way the words in a sentence are arranged. Syntactic rules are used in parsing algorithms. For example, Schneider & Borlund (2004) used the Connexor software to identify noun phrases related to bibliographical references. http://www.connexor.com/demo/syntax/
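The kind of syntactic pattern used to pick out noun phrases can be sketched as follows. This is not how Connexor works; it is a minimal, hypothetical chunker that assumes the sentence has already been tagged with Penn Treebank-style part-of-speech labels (DT for determiners, JJ for adjectives, NN/NNS/NNP/NNPS for nouns) and groups an optional determiner, any adjectives and one or more nouns into a phrase.

```python
# A sentence is assumed to be pre-tagged as (word, tag) pairs.
Tagged = list[tuple[str, str]]

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_phrases(tagged: Tagged) -> list[str]:
    """Return word spans matching the simplified pattern DT? JJ* NN+."""
    phrases = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] in NOUN_TAGS:  # one or more nouns
            k += 1
        if k > j:  # pattern matched: record the phrase and skip past it
            phrases.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:  # no noun here: move on one word
            i += 1
    return phrases


if __name__ == "__main__":
    sentence = [("The", "DT"), ("citation", "NN"), ("index", "NN"),
                ("covers", "VBZ"), ("recent", "JJ"), ("scientific", "JJ"),
                ("journals", "NNS"), (".", ".")]
    print(noun_phrases(sentence))
```

Applied to a tagged version of "The citation index covers recent scientific journals.", `noun_phrases` returns "The citation index" and "recent scientific journals". A real parser needs far richer rules, or statistics, to handle prepositional phrases, coordination and ambiguous tags.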

 

Semantics

The study of meaning. The study of the meaning of individual words may be termed lexical semantics. Meaning is also studied in relation to syntax at the level of the sentence and to discourse at the level of texts.

 

Discourse analysis

"Although syntax and semantics work with sentence-length units, the discourse level of NLP works with units of text longer than a sentence." (Liddy, 2003, p. 2130). Examples from information science are the resolution of anaphora and ellipsis and the examination of their effect on proximity searching. (The term discourse analysis is also used in many other senses.)

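Why anaphora matters for proximity searching can be illustrated with a small Python sketch. It assumes a simple word-position notion of proximity (`within` succeeds if two terms occur at most `max_gap` words apart), and it sidesteps the hard part: `substitute_antecedent` is a hypothetical helper that replaces every occurrence of a pronoun with an antecedent chosen by hand, whereas genuine anaphora resolution is itself a discourse-level problem.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens; positions are counted in words."""
    return re.findall(r"\w+", text.lower())


def within(text: str, a: str, b: str, max_gap: int) -> bool:
    """Proximity operator: True if a and b occur at most max_gap words apart."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def substitute_antecedent(text: str, pronoun: str, antecedent: str) -> str:
    """Hypothetical helper: replace every occurrence of the pronoun with an
    antecedent chosen by hand (no actual anaphora resolution is performed)."""
    pattern = r"\b" + re.escape(pronoun) + r"\b"
    return re.sub(pattern, lambda m: antecedent, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    text = "The thesaurus was revised in 2004. It now covers genetics."
    print(within(text, "thesaurus", "genetics", 3))
    resolved = substitute_antecedent(text, "it", "thesaurus")
    print(within(resolved, "thesaurus", "genetics", 3))
```

For this example text, a search for thesaurus within 3 words of genetics fails on the raw text (the terms are 8 words apart) but succeeds once "It" is replaced by "thesaurus" (3 words apart).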
 

Pragmatics (not to be confused with pragmatism in philosophy)

Pragmatics is often understood as the study of how the context (or "world knowledge") influences meaning. Liddy (2003, p. 2130) provides the following example:

The word "pen" can have at least two meanings (a container for animals or children, and a writing implement). In the sentence "The box was in the pen" one knows that only the first meaning is plausible; the second meaning is excluded by one's knowledge of the normal sizes of (writing) pens and boxes. This example demonstrates that it is easy for people to choose the right sense of the word, but it is extremely difficult to program a computer with all the world knowledge necessary to do the same.
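Liddy's example can be turned into a toy rule to show where the difficulty lies. The following Python sketch assumes a hypothetical table of typical object sizes (the numbers are rough illustrative guesses, not data) and a single rule for sentences of the form "X was in the Y": a sense of Y is kept only if objects of that kind are larger than a typical X.

```python
# Hypothetical world knowledge: rough typical sizes in centimetres.
TYPICAL_SIZE_CM = {
    "box": 40,
    "pen (writing implement)": 14,
    "pen (container for animals or children)": 300,
}

PEN_SENSES = ["pen (writing implement)", "pen (container for animals or children)"]


def plausible_senses(contained: str, container_senses: list[str]) -> list[str]:
    """For 'X was in the Y', keep only senses of Y larger than a typical X."""
    size = TYPICAL_SIZE_CM[contained]
    return [s for s in container_senses if TYPICAL_SIZE_CM[s] > size]


if __name__ == "__main__":
    # "The box was in the pen": only the container sense survives.
    print(plausible_senses("box", PEN_SENSES))
```

The rule itself is trivial; all the work is done by the hand-built size table, which a general system would need for every pair of objects it might encounter.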

 

"The above levels of linguistic processing reflect an increasing size of unit of analysis as well as increasing complexity and difficulty as we move from top to bottom. The larger the unit of analysis becomes (i.e., from morpheme to word to sentence to paragraph to full document), the less precise the language phenomena and the greater the free choice and variability. This decrease in precision results in fewer discernible rules and more reliance on less predictable regularities as one moves from the lowest to the highest levels. Additionally, higher levels presume reliance on the lower levels of language understanding, and the theories used to explain the data move more into the areas of cognitive psychology and artificial intelligence. As a result, the lower levels of language processing have been more thoroughly investigated and incorporated into IR systems. I am aware of only one system that includes all levels of language analysis." (Liddy, 1998).

 

Liddy sees three approaches to natural language processing (NLP):

Liddy does not, however, discuss different theoretical views on the relation between the levels in any detail. She writes that a sequential model has been replaced by a more complex model in which the different levels interact.

 

 

 

(Fig. from Beaugrande, 1984)

 

When Noam Chomsky began his research, expectations regarding the syntactic level were extremely high. Later, experience from research in machine translation and information retrieval (IR) shifted interest to the lexical level (statistical patterns in word occurrences) as the most important level. More recently, as shown in the quotation from Gärdenfors (1999) above, a pragmatic turn has put pragmatic research on the agenda. Behind such preferences for different levels (and behind the modeling of the levels themselves) lie different epistemological and metaphysical assumptions related to, for example, empiricism, rationalism and pragmatism.

 

Liddy's overview does not relate NLP technologies to theories of language (such as structuralism or cognitivism), nor does it consider theories of language from an epistemological perspective (rationalist, empiricist, historicist and pragmatic views of language). In fact, Liddy's stage model is based on theoretical assumptions that may be in some conflict with, for example, social semiotics.

 

Kramsch (2002) outlines some principles of language from a social semiotic point of view, which she contrasts with "the communicative approach":

 

"The first principle, drawing on the work of V. N. Volosinov, Ragnar Rommetveit, and other linguists working in a phenomenological tradition (see Hanks 142–50 for a review), is that there is no such thing as language without historically situated language users or meaning makers in the local context of their communicative practices. Every word uttered or written is addressed by someone to someone about something and for someone’s benefit at a particular juncture in time. [. . .].

Because of each language user’s unique place in history, each word spoken or written bears the traces of its prior uses and of its uses in lexical collocations or co-occurrences. Thus the second principle of a social semiotic approach to language learning is intertextuality. The communicative approach still teaches dictionary meanings and sentence-based grammar. And yet, as Pierre Bourdieu says, “the all-purpose word in the dictionary [. . .] has no social existence” (39). For instance, we teach that Vater means “father,” but for a German of a certain generation the word Vater might be associated with other words, like Vaterland or Doktorvater. These terms have in turn distinct emotional and historical connotations that are different from their English equivalents, “country” or “dissertation adviser.”

The third principle of a social semiotic view of language is that language learning is a social, dialogic process of meaning construction. Whereas folk notions of language learning see it as an incremental accumulation of atomistic structures that moves the learner from word to sentence, from sentence to paragraph, and from paragraph to text, a social semiotic approach considers language as a holistic network of various signs in the environment, including gestures, silences, body postures, graphic and other visual and acoustic symbols, which shape a context of meaning and invite us to respond to it. The communicative approach has not done justice to contextual variation and change. As the biologist and anthropologist Gregory Bateson says, “contextual shaping is just another term for grammar”".

 


Literature:

 

Ammon, U. (1977). Indføring i sociolingvistik. Copenhagen: Gyldendal. Chapter 4, pp. 82-101.

 

Bach, K. [2000]. The Semantics-Pragmatics Distinction: What It Is and Why It Matters. http://userwww.sfsu.edu/~kbach/semprag.html

 

Bar-Hillel, Y. (1964). Language and Information. London: Addison-Wesley.

 

Beaugrande, R. de (1984). Text Production. Toward a Science of Composition. Greenwich, CT: Ablex Publishing. Available at: http://beaugrande.bizland.com/text_production.htm (Visited 2006-02-28) 
 

Blair, D. C. (1990). Language and Representation in Information Retrieval. Amsterdam: Elsevier.

 

Blair, D. C. (2006). Wittgenstein, Language and Information: "Back to the Rough Ground!" Berlin: Springer. (Information Science and Knowledge Management).

 

Borsley, R. D. (2002). A postmodern critique of linguistics. (A version of this note was presented at the Gregynog Linguistics Colloquium in April 1997). http://privatewww.essex.ac.uk/~rborsley/PMC.htm

Connexor (2003-2004). Machinese Semantics. http://www.connexor.com/software/semantics/

Crystal, D. (2006). Language and the Internet. Cambridge: Cambridge University Press.

 

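Engberg-Pedersen, E. (1979). Nogle sproglige og kognitive problemer i mødet mellem bruger og klassifikationssystem - eksemplificeret ved en interviewundersøgelse på Roskilde Universitetsbibliotek [Some linguistic and cognitive problems in the encounter between user and classification system, exemplified by an interview study at Roskilde University Library]. Master's thesis in linguistics, University of Copenhagen.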

 

Gärdenfors, P. (1999). Cognitive science: From computers to anthills as models of human thought. Human IT, 3(2), 9-36. www.hb.se/bhs/ith/2-99/pg.htm

 

Halton, E. (1992). Charles Morris. A Brief Outline of His Philosophy with relations to semiotics, pragmatics, and linguistics.   http://www.nd.edu/~ehalton/Morrisbio.htm

 

Heinrichs, J. (1996). Language theory for the computer: monodimensional semantics or multidimensional semiotics? Knowledge Organization, 23(3), 147-156.

 

Hjørland, B. (2007). Book review of D. C. Blair (2006). Wittgenstein, Language and Information: "Back to the Rough Ground!" Berlin: Springer. Journal of Documentation, 63(2), 281-286.

 

Hjørland, B. & Nissen Pedersen, K. (2005). A substantive theory of classification for information retrieval. Journal of Documentation, 61(5), 582-597. http://www.db.dk/bh/Core%20Concepts%20in%20LIS/Hjorland%20&%20Nissen.pdf

 

Hutchins, W. J. (1975). Languages of indexing and classification. A linguistic study of structures and functions. London: Peter Peregrinus.

 

Kramsch, C. (2002). Language and culture. A social semiotic perspective. ADFL Bulletin, 33(2), 8-15. [Association of Departments of Foreign Languages]. Available at:  http://www.adfl.org/adfl/bulletin/v33n2/332008.htm

 

Kuhlen, R. (1986). Informationslinguistik. Theoretische, experimentelle, curriculare und prognostische Aspekte einer informationswissenschaftlichen Teildisziplin. With contributions by Udo Hahn, Rainer Kuhlen & Ulrich Reimer. Tübingen: Max Niemeyer Verlag.

 

Liddy, E. D. (1998). Enhanced Text Retrieval Using Natural Language Processing. ASIS Bulletin, April/May. Also available at http://www.asis.org/bulletin/apr.98/liddy.html

 

Liddy, E. D. (2003). Natural Language Processing. In: Encyclopedia of Library and Information Science. New York: Marcel Dekker.

 

Lykke Nielsen, M. (2004). Sproglige problemstillinger i informationssøgning.   http://www.cst.dk/vid/public/20041202/VID-sem.PDF#search=%22%20association%20%22marianne%20lykke%20nielsen%22%22

 

Malmkjær, K. (1995a). Behaviourist linguistics. In K. Malmkjær (Ed.), The Linguistics Encyclopedia (pp. 53-57). London: Routledge.

 

Malmkjær, K. (1995b). Functionalist linguistics. In K. Malmkjær (Ed.), The Linguistics Encyclopedia (pp. 158-161). London: Routledge.

 

Malmkjær, K. (1995c). Rationalist linguistics. In K. Malmkjær (Ed.), The Linguistics Encyclopedia (pp. 375-379). London: Routledge.

 

Peregrin, J. (2004). Pragmatism and semantics. Manuscript in English; published in German in A. Fuhrmann & E. J. Olsson (Eds.), Pragmatisch denken (pp. 89-108). Frankfurt a. M.: Ontos, 2004. http://jarda.peregrin.cz/mybibl/PDFTxt/482.pdf

 

Schneider, J. & Borlund, P. (2004). Introduction to bibliometrics for construction and maintenance of thesauri: methodical considerations. Journal of Documentation, 60(5), 524-549.

 

Spang-Hanssen, H. (1974). Kunnskapsorganisasjon, informasjonsgjenfinning, automatisering og språk. In: Kunnskapsorganisasjon og informasjonsgjenfinning. Oslo: Riksbibliotektjenesten, pp. 11–61. Core Concepts in LIS/Spang_Hanssen_1974.pdf

 

Spang-Hanssen, H. (1976). Roles and links compared with grammatical relations in natural language. Lyngby: DTL.
 

Sparck Jones, K. & Kay, M. (1973). Linguistics and Information Science. New York & London: Academic Press. (F.I.D. Publ. no. 492)

Warner, A. J. (1991). Quantitative and qualitative assessments of the impact of linguistic theory on information science. Journal of the American Society for Information Science, 42(1), 64-71. (The author described her findings that there was no significant use of linguistics literature in the literature of information science. An occasional uncritical nod to the 'standard authorities' was found, but little more.)

 

Warner, J. (2007). Linguistics and information theory: Analytic advantages. Journal of the American Society for Information Science and Technology. Published online: 27 Nov 2006.

See also: Languages for special purposes (LSP): http://www.db.dk/jni/lifeboat/info.asp?subjectid=10 (Epistemological lifeboat); Natural language; Semantics; Henning Spang-Hanssen; Syntactical devices; Terminology (Epistemological lifeboat); Translation, machine (MT)


Birger Hjørland

Last edited: 26-03-2008


 

Questions:

 

Outline the implications of different understandings of the nature of language for work in library and information science. How are views of language related to approaches to knowledge organization?


to be edited:

Linguistic issues in information science are extremely diverse and wide-ranging. A substantial part of information science is concerned with mediating texts, i.e. linguistic representations. The tools that are used (e.g. retrieval systems) and the communication that takes place are predominantly linguistic in nature.

Information science has, for example, been interested in the following problems: translation of texts from one language to another; translation of user queries into system language; the use of natural language as a user interface to *IR systems; automatic indexing, condensation and subject analysis of texts by means of syntactic and semantic analyses; etc.

Besides strictly linguistic theories, the linguistic aspects of I&D (information and documentation) also include theories from the philosophy, psychology and sociology of language. Subfields that cut across linguistics, philosophy, psychology and sociology, among others, and that increasingly interest researchers in information science include semantics (the study of the meaning of words), semiology/*semiotics (the study of signs, including, for example, differences and similarities between text and images), *lexicography, etc.

Level of organization in the language process | Description | Relevance to BDI (library, documentation and information)
Phonology | Accounts for the sounds and correct pronunciation of languages | Speech interfaces and speech reproduction. See also *Audiobook; audio documentation
Morphology | Accounts for the words of languages and their forms | See also *truncation; *masking
Syntax | Accounts for the grammatical rules of languages | See also *Syntactical devices
Semantics | Accounts for the meaning of words and sentences and the rules that can be formulated for it | See also *Concept; *Semantics
Pragmatics/discourse | Deals with rules for the use of meaningful sentences in given communicative contexts | See also *"Discourse analysis"; *Languages for special purposes; *Metaphor

Despite the above, the actual connection between *information science and linguistics is weak in practice, as documented by Warner (1991).


Conceptions of language in fundamental theories of science
Empiricism and nominalism | Emphasizes the role of language as labels for concrete things or sense experiences. General concepts, or universals, are understood merely as classifications of individual things, not as something that really exists.
"Rationalism" | Emphasizes the universal structures of language. Main figure: Noam Chomsky. Formulates general rules that can be applied, for example, in machine translation. Objectivism. Sees meaning as something unambiguous, as rules held by language users. Goal: computers that can understand natural language.
Historicism and hermeneutics | Emphasizes context and the holistic (e.g. the hermeneutic circle). Emphasizes language as speech acts that contain implicit rules and commit the speaker. Sees meaning as something historically and socially developed. Main figures: Gadamer, Heidegger, the later Wittgenstein (theory of language games). Tends toward something very subjectivist and idealist. Its applied perspective is somewhat vague, but it is critical of artificial intelligence and instead emphasizes human beings with a horizon, both as users and as designers of computers and information systems.
Pragmatism, materialism, sociolinguistics, LSP research | Sociolinguistics is on a collision course with rationalism, as it emphasizes the diversity of languages. It studies the relation of language to concrete uses, including the significance of special-purpose language for qualification and communication. Main figures: M. M. Bakhtin, M. A. K. Halliday and R. Rommetveit. In terms of application, it does not aim, as rationalism does, at formalization and rationalization; it is less vague than hermeneutics and aims instead at optimizing linguistic (written) communication in educational and work-related contexts (including the normalization of special-purpose language and *standardization). As *materialism in a broad sense, it emphasizes the role of language in relation to the social division of labour, with an emphasis on basic production.