The Shorter Oxford English Dictionary defines a document as: "That which serves to show or prove something; evidence, proof. . . Something written, inscribed, etc., which furnishes evidence or information upon any subject, as a manuscript, title deed, coin, etc."

This broad definition corresponds well with the way the concept "document" is used in Library and Information Science (LIS): as a general term for texts, electronic files, pictures, and other carriers of information. Buckland (1991) uses the expression "information-as-thing".


Vickery & Vickery (1987, p. 36) define the term as follows: "a document is a physical medium modified so as to carry marks that are signs in some agreed code. The marks can be images that we recognize or accept as representing some visual aspect of the world; they can be recordings of natural or man-made sound, similarly recognizable; or they can be conventional signs that are accepted as symbols for any mental concept or its referent in the world. The conventional signs can be the letters and words of a natural language, and thus be related to its spoken form, or they can be a special-purpose code (for example, Morse code, Braille, codes for computers, chemical symbolism)".

Buckland (1991, pp. 46-48) describes the history of the concept of document in LIS. Early in the 20th century, the documentalists felt a need for a generic term for the objects of their documenting activities: not just texts, but also natural objects, artifacts, models, objects reflecting human activities, and objects of art and human ideas. The concept of document (or documenting unit) was used in a special sense that included informative physical objects. A wild antelope was not considered a document, but a captured specimen that was studied, described, and incorporated in a zoo for educational and research purposes was considered a document. If this sounds strange, Buckland draws attention to the fact that the word "document" comes from the Latin "docere", meaning to teach or inform, and the suffix "-ment", meaning a tool. Originally, then, the word "document" meant a tool for teaching or informing, whether through lecture, experience, or text. The narrowing of its meaning to objects carrying texts came into common use only later.

Hjerppe (1994) talks about "generalized documents". This concept is based on a concept of generalized texts, which is in turn based on a generalized concept of reading. Such a generalization becomes necessary when many kinds of knowledge and information are integrated in, for example, multimedia and hypertext/hypermedia systems.

In traditional (positivist) philosophy, things that can be sensed, such as documents, are seen as concrete (while, for example, society is seen as an abstraction). This is not the case in Hegelian philosophy, in which a thing is abstract if it cannot be understood in isolation. Technological development may provide new arguments for a Hegelian philosophy: when texts are integrated in, for example, hypertext systems, the single document becomes difficult to delimit. It consists of parts, which are combined with other documents (e.g. in cross-reference systems). Communication researchers like Nystrand & Wiemelt (1993) point to the need to understand texts (and documents) as phenomena in "discourse communities". Documents cannot be analyzed in isolation. The single document becomes an abstraction in a stream of communication.

Documents should also be seen in relation to the division of labor in society. The concept of "sources" is relevant for this purpose (cf. Kolding Nielsen, 1978). Antelopes are studied by zoologists, not by documentalists. Natural phenomena are (primary) sources of information for natural scientists, records are primary sources for historians, and laws are primary sources for legal scholars and lawyers. Books and publications may be secondary sources for scientists, but they are the primary object for library and information specialists. The point is that the concept of document is an abstraction: there exist different kinds of documentation practices with different kinds of documents. The concept of document cannot be properly understood without considering those kinds of practices.


In information retrieval, a document has been defined as follows:


"A unit of retrieval. It might be a paragraph, a section, a chapter, a Web page, an article, or a whole book." (Baeza-Yates & Ribeiro-Neto, 1999).

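The idea that the "unit of retrieval" is a matter of choice can be illustrated with a minimal sketch. It is not taken from Baeza-Yates & Ribeiro-Neto; the function names, the sample text, and the convention that a chapter begins with a line starting with "# " are all assumptions made for illustration. The same text is cut into units at three granularities, and a simple substring search returns whichever units contain the query.

```python
import re

# Hypothetical sample "book"; chapter headings are lines starting with "# "
# (an assumed convention, not part of the quoted definition).
SAMPLE = """# One
Antelopes live in the wild.

A captured antelope is kept in a zoo.

# Two
Books are kept in libraries.
"""


def split_into_units(text: str, granularity: str) -> list[str]:
    """Cut a text into retrieval units at the chosen granularity."""
    if granularity == "book":
        # The whole text is a single unit.
        return [text.strip()]
    if granularity == "chapter":
        # Split on heading lines; the headings themselves are dropped.
        parts = re.split(r"(?m)^# .*$", text)
    elif granularity == "paragraph":
        # Remove heading lines, then split on blank lines.
        body = re.sub(r"(?m)^# .*$", "", text)
        parts = re.split(r"\n\s*\n", body)
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return [p.strip() for p in parts if p.strip()]


def retrieve(text: str, query: str, granularity: str) -> list[str]:
    """Return every unit that contains the query (case-insensitive)."""
    units = split_into_units(text, granularity)
    return [u for u in units if query.lower() in u.lower()]


if __name__ == "__main__":
    for level in ("paragraph", "chapter", "book"):
        hits = retrieve(SAMPLE, "zoo", level)
        print(level, "->", len(hits), "unit(s),", len(hits[0]), "characters")
```

Under these assumptions, the same query returns a single sentence, a whole chapter, or the whole book, depending only on how the "document" was delimited beforehand.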




Birger Hjørland

Last edited: 05-12-2008