Search term

Search terms, search phrases and search expressions are elements of information retrieval languages. The problems related to search terms in document retrieval include:

  1. The user has a problem that he or she cannot adequately conceptualize. In that case it is not possible to identify search terms. The user first has to learn about the problem and about ways to conceptualize it, which may involve learning about different "paradigms" in the field. See also Search goal redefinition.

  2. The user's initial conceptualization turns out to be problematic. In that case the given search terms must be reconsidered or replaced. Again it is important to learn about basic ways to conceptualize the problem.

  3. The user has a concept but he or she cannot express it in search terms. In that case some kinds of semantic tools and knowledge organizing systems may help to identify search terms. When the user sees different definitions and explanations he or she may be able to express the concept in (search) terms. (The danger is that the terms carry certain meanings that are not in accordance with the user's need.)

  4. The user has one or more search terms but cannot adequately disambiguate different word senses (homograph / polysemy problem) and cannot produce relevant synonyms. In that case traditional tools such as thesauri may be helpful. (The danger is again that the terms carry certain meanings that are not in accordance with the user's need.)

 

In most real-life situations these different problems are interrelated: when new terms are discovered, conceptualizations may change. When conceptualizations change, the user may choose to work with a problem that is more or less different, so that new search terms are needed, and so on. Also, when new ways of conceptualizing problems are developed, the understanding of given terms changes as well. It is important to have semantic tools (such as "Begriffsgeschichte") that link different understandings of terms to different conceptualizations. One of the few papers discussing this issue is by Marcia Bates:

 

"One of Gerard Salton's [1968] contributions to research in this area was the idea of iterative feedback to improve output.  He developed a system that would modify the query formulation based on user feedback to the first preliminary output set.   The formulation would be successively improved through the use of feedback on user document preferences until recall and precision were optimized.

       But Salton's iterative feedback is still well within the original classic model as presented in Figure 1-[omitted] because the presumption is that the information need leading to the query is the same, unchanged, throughout, no matter what the user might learn from the documents in the preliminary retrieved set.  In fact, if a user in a Salton experiment were to change the query after seeing some documents, it would be "unfair," a violation of the basic design of the experiment.  The point of the feedback is to improve the representation of a static need, not to provide information that enables a change in the need itself. 

       So throughout the process of information retrieval evaluation under the classic model, the query is treated as a single unitary, one-time conception of the problem.  Though this assumption is useful for simplifying IR system research, real-life searches frequently do not work this way." (Bates, 1989).
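The kind of iterative feedback Bates describes can be illustrated with a small sketch. The code below uses a Rocchio-style re-weighting of query terms, which is a common textbook formulation of relevance feedback and is assumed here for illustration only; it is not a description of Salton's 1968 system. The function name, the coefficient values and the example documents are all hypothetical. Note that the initial query stays the fixed reference point throughout, which is exactly the static-need assumption Bates criticizes.

```python
from collections import Counter, defaultdict

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Re-weight a query after one round of user feedback (illustrative only).

    query: dict mapping term -> weight.
    relevant / nonrelevant: lists of documents, each a list of terms,
    as judged by the user on the preliminary output set.
    The coefficients alpha, beta, gamma are assumed values.
    """
    updated = defaultdict(float)
    # Start from the original query, scaled by alpha.
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the average term counts of relevant documents and
    # subtract the average term counts of non-relevant ones.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, count in Counter(doc).items():
                updated[term] += coeff * count / len(docs)
    # Keep only terms that end up with a positive weight.
    return {term: w for term, w in updated.items() if w > 0}

# Hypothetical example: one relevant and one non-relevant document.
query = {"vaccination": 1.0}
relevant = [["vaccination", "immunization", "schedule"]]
nonrelevant = [["vaccination", "conspiracy"]]
new_query = rocchio_update(query, relevant, nonrelevant)
```

In this example the reformulated query gains the terms "immunization" and "schedule" from the relevant document, while "conspiracy" is dropped because its weight becomes negative. The user's underlying need is never revised; only its representation is.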

 

When people seek information they learn about different conceptualizations and associated terminology as well as other cues that may be used for redirecting the information search. Such other cues might be, for example, important authors, who may be used in citation searching, or important traditions, which may be linked with certain geographical areas or certain journals.

 

There exist automated methods for selecting or adding search terms (query expansion). Such techniques cannot, however, cope with the complex relations between conceptualizations and word meaning. They cannot guarantee that relevant documents are identified or that identified documents are relevant. The quality of such techniques depends on the quality of the semantic tools being used and on the nature of the domain being searched.
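A minimal sketch of thesaurus-based query expansion may make this limitation concrete. Everything in it is assumed for illustration: the toy thesaurus, the sense labels, and the function names `expand_query` and `to_boolean` do not come from any particular system. Each term is expanded with the synonyms the thesaurus offers; without knowledge of the intended sense, the expansion also adds terms that carry a meaning the user does not need.

```python
# Hypothetical toy thesaurus: term -> list of (synonym, sense label).
THESAURUS = {
    "vaccination": [("immunization", "medicine"), ("inoculation", "medicine")],
    "bank": [("financial institution", "finance"), ("riverbank", "geography")],
}

def expand_query(terms, thesaurus, sense=None):
    """Return one list of alternatives per original search term.

    If `sense` is given, only synonyms labelled with that sense are added;
    otherwise every synonym is added, whatever its meaning.
    """
    expanded = []
    for term in terms:
        alternatives = [term]
        for synonym, label in thesaurus.get(term, []):
            if sense is None or label == sense:
                alternatives.append(synonym)
        expanded.append(alternatives)
    return expanded

def to_boolean(expanded):
    """Combine alternatives with OR and the original terms with AND."""
    groups = ["(" + " OR ".join(f'"{t}"' for t in group) + ")" for group in expanded]
    return " AND ".join(groups)

# Naive expansion mixes both senses of "bank".
naive = expand_query(["bank", "erosion"], THESAURUS)
# Expansion restricted to an assumed sense label keeps only "riverbank".
filtered = expand_query(["bank", "erosion"], THESAURUS, sense="geography")
boolean_query = to_boolean(filtered)
```

The naive expansion adds "financial institution" to a query about erosion, whereas the filtered one depends entirely on someone having supplied the right sense label. The result is therefore only as good as the thesaurus and the disambiguation behind it.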

The problem is related to cognitive and linguistic problems concerning the relation between thinking and language. Do human beings think without language and then "translate" their thoughts into a language? Or is thinking itself influenced by language?

Ranganathan's differentiation between the "idea plane" and the "verbal plane" can be seen as representing the first view, while, for example, the psychosemiotic view of Vygotsky and "activity theory", adapted by, for example, Hjørland (2002), represents the second.

"Traditionally, the selection of search terms has been conceptualized and described as a translation process. In the translation model, a search request, provided by a client, is 'translated' into search terms. The search terms, ideally, represent search concepts or search topics that can be input to an information retrieval (IR) system to identify documents, i.e., information-bearing items such as books, articles, video, audiotapes, etc., relevant to the client’s information need (see e.g., International Organization for Standardization, 1985; Lancaster, 1972)." (Iivonen & Sonnenwald, 1998, p. 312).

"The translation process is, in practice, often operationalized as the replacement of one word with another, i.e. a client’s word is replaced one-for-one with a search term. However, a client’s information need or topic may be discussed in information sources, and represented in IR  systems, with a variety of words or phrases. The translation model does not encourage searchers—either professional searchers or end-users—to generate multiple search terms, or consider that a topic may be discussed and represented multiple ways in information sources and IR systems." (Iivonen & Sonnenwald, 1998, p. 312).

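Spink & Saracevic (1997) investigated the selection and effectiveness of search terms. They identified and classified five sources of search terms: question statements, interaction with the user, thesauri, term relevance feedback, and professional searchers.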

They evaluated the effectiveness of each source and its contribution to the search results. Question statements and interaction with the user were responsible for 38% and 23% of the selected terms respectively, with thesauri contributing an additional 19% of the terms. A further 11% of the search terms came from term relevance feedback, while professional searchers were responsible for the remaining 9%. In addition to supplying the largest proportion of terms, question statements were also the most productive in retrieving relevant items. User-interaction terms were slightly less effective, with around 50% resulting in relevant retrieved items. Terms derived from thesauri were less effective again, which led Spink and Saracevic to conclude that thesaurus terms prove most effective when combined in search statements with user terms.

 

 

 

Literature:

 

Atkins, T. V. & Ostrow, R. (Eds.). (1989). Cross-reference index: A guide to search terms. New York: R. R. Bowker.

 

Bates, M. J. (1989). The Design of Browsing and Berrypicking Techniques for the Online Search Interface. Online Review, 13(5), 407-424. Available at: http://www.gseis.ucla.edu/faculty/bates/berrypicking.html

Bates, M. J.; Wilde, D. N. & Siegfried, S. (1993). An analysis of search terminology used by humanities scholars - The Getty online searching project. Report 1. Library Quarterly, 63(1), 1-39.

Harter, S. P. (1990). Search term combinations and retrieval overlap - A proposed methodology and case-study. Journal of the American Society for Information Science, 41(2), 132-146.  

 

Hembrooke, H. A.; Granka, L. A.; Gay, G. K. & Liddy, E. D. (2005). The effects of expertise and feedback on search term selection and subsequent learning. Journal of the American Society for Information Science and Technology, 56(8), 861-871.

 

Hjørland, B. (2002). Principia Informatica. Foundational Theory of Information and Principles of Information Services. In: Emerging Frameworks and Methods. Proceedings of the Fourth International Conference on Conceptions of Library and Information Science (CoLIS4). Ed. by Harry Bruce, Raya Fidel, Peter Ingwersen, and Pertti Vakkari. Greenwood Village, Colorado, USA: Libraries Unlimited. (Pp. 109-121).

Iivonen, M. (1995). Consistency in the selection of search concepts and search terms. Information Processing & Management, 31(2), 173-190.  

Iivonen, M. & Sonnenwald, D. H. (1998). From translation to navigation of different discourses: A model of   search term selection during the pre-online stage of the search process. Journal of the American Society for Information Science, 49(4), 312-326.  

Knapp, S. D. (2000). Contemporary Thesaurus of Search Terms and Synonyms: A Guide for Natural Language Computer Searching. Second Edition. Phoenix, AZ: Oryx.

 

Salton, G. (1968). Automatic Information Organization and Retrieval.  New York: McGraw-Hill.

 

Shiri, A. A.; Revie, C. & Chowdhury, G. (2002). Thesaurus-assisted search term selection and query expansion: A review of user-centered studies. Knowledge Organization, 29(1), 1-19. (Correction 29(2), p. 64).

Available at: https://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_323.pdf

 

Smeaton, A. F. (1984). Relevance feedback and a fuzzy set of search terms in an information retrieval system. Information Technology-Research Development Applications, 3(1), 15-23.

 

Sparck Jones, K. (1980). Search term relevance weighting - Some recent results. Journal of Information Science, 1(6), 325-332.

 

Sparck Jones, K. & Tait, J. I. (1984). Automatic search term variant generation. Journal of Documentation, 40(1), 50-66.

 

Spink, A. & Saracevic, T. (1997). Interaction in information retrieval: Selection and effectiveness of search terms. Journal of the American Society for Information Science, 48(8), 741-761.  

 

Vakkari, P.; Pennanen, M. & Serola, S. (2003). Changes of search terms and tactics while writing a research proposal - A longitudinal case study. Information Processing & Management, 39(3), 445-463.

Van Rijsbergen, C. J.; Harper, D. J. & Porter, M. F. (1981). The selection of good search terms. Information Processing & Management, 17(2), 77-91.

 

Wilczynski, N. L.; Walker, C. J.; McKibbon, K. A. & Haynes, R. B. (1994). Quantitative comparison of pre-explosions and subheadings with methodological search terms in MEDLINE. Journal of the American Medical Informatics Association, S, 905-909.

 

Wolfe, R. M. & Sharp, L. K. (2005). Vaccination or immunization? The impact of search terms on the Internet. Journal of Health Communication, 10(6), 537-551.

 

 

See also: Iterative searching;  Language for Special Purposes (Epistemological Lifeboat); Query Expansion; Search languages; Subject access points (Lifeboat for KO)

 

 

 

Birger Hjørland

Last edited: 02-03-2007
