Subject (Subject matter)

In the ISO standard for topic maps, the concept of subject is defined this way:

"Anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever." (ISO 13250-1, here cited from draft)

This definition may work well within the closed system of concepts provided by the topic maps standard. In broader contexts, however, it is not fruitful, because it does not specify what to identify in a document or in a discourse when ascribing subject identification terms or symbols to it. If different methods of subject analysis yield different results, which of these results can be said to reflect the (true) subject? (This assumes that the expression "a true subject assignment" is meaningful at all, which is itself an important part of the problem.) Different persons may have different opinions about what the subject of a specific document is. How can a theoretical understanding of the term "subject" help in deciding principles of subject analysis?


In spite of the importance of the concept of "subject" (as well as its derivatives, such as subject data, subject representation, subject searching and subject scattering), and in spite of the important consequences that different understandings of this term have for the theory of LIS, remarkably few discussions of this concept have been published. Metcalfe (1973) provides an overview of the history of the concept in libraries over almost a hundred years and is one of the few exceptions to the claim that nobody has seriously considered this concept. Metcalfe concludes, however, that the concept is very ambiguous, which is why he finds it almost useless.


The subject of a document often seems so obvious that it is hard to imagine alternatives or to understand that deep theoretical problems should or could be involved. However, the most important thing to realize is probably that different persons may have good reasons to ascribe different subjects to the same document; that is, it is illusory to speak of the one true subject of a document without regard to the situation and the purpose of the describing activity. It is thus better to say that anything whatsoever may be ascribed a subject by somebody for some purpose (concerning non-documents, see realia). Considered this way, subjects are something ascribed to documents or to other objects, not something with an independent existence beyond this ascribing activity. But what is it that is being ascribed? What is a subject?


We start here with a short history of the use of the concept of subject from the late 19th century through the 20th century.


Miksa (1983a), here quoted from Frohmann (1994, pp. 112-113), discusses Charles A. Cutter's concept of subject. For Cutter, the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred . . . to those intellections . . . that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60); the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); and "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge" (Miksa, 1983a, p. 61).

Bernd Frohmann adds:

"The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts.
Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The DDC [Dewey Decimal Classification], by contrast, severs those connections. Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a purely semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic...
    The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation". (Frohmann, 1994, pp. 112-113)


Cutter's early view of the nature of subjects is probably wiser than most of the understandings that dominated the 20th century, including the understanding reflected in the ISO standard quoted above. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. That said, these statements are not particularly detailed or clear; they give only a vague idea of the social nature of subjects.


One system with an explicit theoretical foundation is Ranganathan's Colon Classification. As far as is known, Ranganathan is the only researcher who has previously given an explicit definition of the concept of "subject":

"Subject - an organized body of ideas, whose extension and intension are likely to fall coherently within the field of interests and comfortably within the intellectual competence and the field of inevitable specialization of a normal person". (Ranganathan, 1967, p. 82). 

Another definition is given by one of Ranganathan's students:

"A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several..." (Gopinath, 1976, p. 51).


Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The Colon system is based on combining single elements from facets into subject designations. This is why the combined nature of subjects is emphasized so strongly. It also leads to absurdities such as the claim that gold cannot be a subject (it is instead termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is expressed in harsh terms (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...".


It is unacceptable for Ranganathan to define the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether or not subjects are combined should be examined once a definition has been given; it should not be determined a priori, in the definition itself.


Besides emphasizing the combined, organizing and systematizing nature of subjects, Ranganathan's definition contains the pragmatic demand that a subject should be determined in a way that suits a normal person's competence or specialization. Again we see a strange kind of wishful thinking that mixes a general understanding of a concept with demands put by his own specific system. What the word subject means is one thing; how to provide subject descriptions that meet the requirements of a given information retrieval language (such as its specificity) and the demands put on the system (such as recall and precision) is quite another. If researchers define terms in ways that favor specific kinds of systems, such definitions cannot support more general theories about subjects, subject analysis and IR. Among other things, they make comparative studies of different kinds of systems difficult.


Based on these arguments (as well as additional arguments that have been used in the literature), we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO standard for topic maps, Ranganathan's definition may be useful within its own closed system. The purpose of a scientific and scholarly field, however, is to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For that purpose, another understanding of "subject" is necessary.


In psychology and philosophy, the concept "subject" (in the sense considered here) is used in older literature but is almost absent from recent literature. "Subject" has, for example, been used in the sense of "intentional object" in phenomenology. In Denmark, Tranekjær-Rasmussen (1956) proposed an "emnelære" (theory of subjects), but this has never been translated or discussed in an international context (cf. Hjørland, 1993, pp. 202-206). The concept is also used in linguistics. Togeby (1977, pp. 28-30) finds that in any transmission of information, something is presupposed while something else is new. The concept "subject" is related to what is already known. A subject (as a noun) can thus only be expressed using familiar nouns or general terms, not indefinite specific terms. Some information is turned into subjects; other information is placed in focus. The subject of a sentence is not identical with the content of that sentence. The sentence "The Danish flag is white and red" has as subjects, for example, Dannebrog (the Danish flag), colors, and the colors of Dannebrog. Its content, however, is that Dannebrog's colors are red and white. The subject is thus a categorical determination of a content (cf. category). Johansen (1994) describes how subjects are organized in discourses.


Psychology, philosophy and linguistics have not, however, had the same concrete and pressing need as has Library and Information Science (LIS) to specify the subject analysis of documents and thus the meaning of the term "subject".


The most serious analysis of the concept of subject in LIS is probably given by Patrick Wilson. Wilson (1968) examines, in particular by means of thought experiments, the suitability of different methods of determining the subject of a document. Among the methods are:

  1. To identify the author's purpose for writing the document.

  2. To weigh the relative dominance and subordination of different elements in the picture that reading imposes on the reader.

  3. To group or count the document's use of concepts and references.

  4. To construct a set of rules for selecting the elements that are necessary, as opposed to unnecessary, for the work as a whole.

Patrick Wilson shows convincingly that each of these methods is insufficient to determine the subject of a document and is led to conclude (p. 89): "The notion of the subject of a writing is indeterminate..." or, on p. 92 (about what users may expect to find at a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection with the last quote, Wilson has an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. This argument leads Wilson to conclude: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness".

Wilson's concept of subject was discussed by Hjørland (1992), who found it problematic to give up a precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position, which Hjørland found unacceptable and unnecessary. Concerning authors' use of ambiguous terms, the role of subject analysis is to determine which documents it would be fruitful for users to identify, regardless of whether the documents use one term or another, or whether a given term in a document is used in one meaning or another. Clear and relevant concepts and distinctions in classification systems and controlled vocabularies may be fruitful even if they are applied to documents with ambiguous terminology.


A number of researchers in LIS have tried to escape the difficulties of the concept of subject by using the concept "aboutness" as an alternative. A justification for this decision is given by Hutchins (1975, p. 115):

" 7.7 From this account of indexing one thing should now be clear, namely, that the notion of the "subject" of a document is peculiarly vague. We may mean the "extensional aboutness" or the "intensional aboutness", as given by the author in his title or as given by the abstractor or by the indexer; we may mean the NL [natural language; BH] phrase expressing the Topic or we may mean the DL [documentary language; BH] expression denoting the document content. There are clearly so many variables involved that whenever we talk of the "subject" of a document we ought always to say what kind of subject we are intending.
    These questions of definition are, of course, a quite separate issue from the point stressed at the beginning of this chapter, namely that we should never talk of the subject of a document. As we have seen, judgments of subject content (by authors, readers and indexers) are influenced by so many factors that any particular statement of a document's content should never be regarded as anything other than just one of many possible such statements. In other contexts and from other perspectives the same document may have other, quite different "subjects"".

The concept of aboutness was thus introduced in order to solve the problems of the concept of "subject". If a satisfactory definition of "subject" is possible (as claimed by Hjørland, 1992), then the argument for introducing "aboutness" becomes invalid. In addition, it appears that "aboutness" was introduced into the literature on problematic mentalist and subjectivist premises.

Hjørland (1992, 1997) found that any practice of subject determination, as well as any theory of subject analysis, is necessarily based on epistemological views. Those views, however, are seldom made explicit and are often unrecognized because of a lack of epistemological knowledge in LIS. Each approach to subject analysis and information retrieval is more or less based on specific epistemological assumptions. Facet analysis, IR approaches, user-oriented approaches, bibliometric approaches, etc. are basically related to different epistemological views, which imply different conceptions of what subjects are. Based on this analysis, Hjørland (1992) developed a new understanding of subjects as "informative potentials" (first formulated as "epistemological potentials").


The subjects of a document are its informative potentials


The basic idea is simple to explain. Rather than seeking the subject of a document in, for example, some inherent, "objective" facts about that document, the indexer should ask: "What is this document useful for?" In other words, subject assignment is seen as a human act which aims at supporting some activities of the users (compare: request-oriented indexing). The subject determination that is most successful in accomplishing this goal is the most correct one. Consequently, subject determinations are situational and context-dependent. The subject of a document is also theory-dependent. Just as one could not describe the potential of uranium as an energy source before the development of physical theories of radioactivity, the potentials of documents change when theories change. This is best understood by considering the citation patterns and reception history of documents. Although uranium could not be described as an energy source before the development of theories of radioactivity, uranium nonetheless contained this potential all the time. The same is the case with documents: their potentials may be unrecognized for a long time, but they exist nevertheless.


Hjørland's concept of subject has been compared (by Peter Bøgh Andersen, informal communication) with C. S. Peirce's concept of "the final interpretant". This latter concept is explained by Joseph Ransdell:

"The application of the distinction between the immediate, dynamical, and final interpretant can be illustrated by the role of precedent cases in law (that is, in legal semiosis), and the reader can perhaps extrapolate by analogy from this to its applications in other fields. There are occasions when the dynamical interpretant--that is, the actually occurring interpretant--of a sign which is the law is not definitely identifiable because the law is too vague in the relevant respect: the facts of the case may be clear enough but the meaning of the law is not, and the judge must, as we say, exercise real judgment in the matter (which is to say that the judge must recognize something as being the relevant dynamical interpretant without benefit of recourse to any ascertainable basis in the immediate interpretant that would justify that recognition). The conscientious judge makes a guess, in effect, at what the final interpretant includes when he or she recognizes something as being a dynamical interpretant of that law at that time relative to that case. But it is the course of future legal interpretation of that law (in courts of appeal, in future juridical practice, and so on) which will determine whether the judge was or was not right in his or her attempt to anticipate the relevant content of the final interpretant--or, as we would ordinarily say, in the attempt to set a precedent that will be honored." (Ransdell, 1994, pp. 681-683, here quoted from a former version available on the Internet.)

Hjørland's understanding of the concept "subject" is thus related to the semiotic theory of pragmatic philosophers. It has, however, been developed in particular with inspiration from activity theory, which is more closely related to social semiotics than to ordinary semiotics. The concept of meaning production in cultures and societies is an important background for a proper understanding of what subjects are. In this way we return to Cutter's view of subjects as something shaped by social processes.



Stam (2000) is critical of subjects as a basis for groupings. One may argue, however, that he is concerned only with the aspect of subject matter usually called topic or topicality:



"Subject matter is the weakest criterion for generic groupings because it fails to take into account how the subject is treated." (Stam, 2000, p. 14)


There is nothing to prevent the use of, for example, terms related to method and genre as a part of the subject description of a document.





Birger Hjørland

Last edited: 04-03-2007