Full-text retrieval (& full-text searching)

Full-text retrieval is free-text searching in full-text databases, i.e., databases in which the full content of documents (e.g., newspapers or scientific journals) is represented. In common bibliographical databases, only titles, descriptors, identifiers, abstracts and references are searchable (as "subject access points"). In full-text databases, the whole text of articles or books is searchable as well. Each word in the text is searchable (with the possible exception of certain stop words). The search may be sequential, scanning the text itself at query time, or based on inverted files, i.e., indexes that list for each word the documents in which it occurs.
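To illustrate the difference between sequential search and search based on an inverted file, here is a minimal Python sketch. It assumes a tiny hypothetical collection, a naive tokenizer and an invented stop-word list. Real retrieval systems add stemming, phrase and proximity operators, and far larger indexes. The sequential search scans every document's text at query time. The inverted file is built once and then answers a query with a single lookup. Stop words are left out of the inverted file, so they cannot be searched there.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list (an assumption for illustration only).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "on", "for"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sequential_search(docs, word):
    """Scan the full text of every document at query time (no index)."""
    word = word.lower()
    return [doc_id for doc_id, text in docs.items() if word in tokenize(text)]


def build_inverted_file(docs):
    """Map each word, except stop words, to the sorted ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in STOP_WORDS:
                postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def inverted_search(index, word):
    """Answer a one-word query with a single lookup in the inverted file."""
    return index.get(word.lower(), [])


# Toy full-text collection (invented for this example).
docs = {
    1: "Inflation rises in the euro area",
    2: "Central bank raises interest rates",
    3: "Interest in full text databases grows",
}
index = build_inverted_file(docs)

print(sequential_search(docs, "interest"))  # [2, 3]
print(inverted_search(index, "interest"))   # [2, 3]
print(sequential_search(docs, "the"))       # [1]
print(inverted_search(index, "the"))        # [] (stop word not indexed)
```

Both strategies return the same documents for ordinary words. The inverted file trades indexing work and storage for fast lookups, and it cannot find words that were excluded as stop words.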

 

Common sense as well as experiments show that full-text searching dramatically increases recall (although not to 100%) and that precision tends to decrease. Bauer & Schneider (1990), for example, compared full-text retrieval with information retrieval based on titles and abstracts. Their results show considerably lower recall values when the searchable document is restricted in size (titles and abstracts only).
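Recall and precision are used here without definition. Under the usual definitions, recall is the share of all relevant documents that a search retrieves. Precision is the share of retrieved documents that are relevant. The Python sketch below computes both for two hypothetical searches of the same collection. The document ids and counts are invented and are not data from Bauer & Schneider (1990). They only show the typical pattern: a full-text search that matches more words retrieves more of the relevant documents, but also more irrelevant ones.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    recall    = |retrieved & relevant| / |relevant|
    precision = |retrieved & relevant| / |retrieved|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection in which documents 1-10 are relevant to the query.
relevant = set(range(1, 11))

# Search in titles and abstracts: 8 hits, 4 of them relevant (invented numbers).
abstract_hits = {1, 2, 3, 4, 21, 22, 23, 24}

# Search in the full text: 30 hits, 9 of them relevant (invented numbers).
full_text_hits = set(range(1, 10)) | set(range(21, 42))

print(recall_precision(abstract_hits, relevant))   # (0.4, 0.5)
print(recall_precision(full_text_hits, relevant))  # (0.9, 0.3)
```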
 

Kristensen & Järvelin (1990) describe the problems that synonyms, antonyms, quasi-synonyms and homonyms in natural language cause in free-text searching in full-text databases. Thesauri may solve such problems. The authors construct a small thesaurus for searching Finnish newspaper articles on economic subjects and document an improvement in search effectiveness: average recall increased from 45% to 82%. At the same time, however, there was a small drop in precision.
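Kristensen & Järvelin's own thesaurus and query formulation are not reproduced here. The Python sketch below only illustrates the general mechanism behind such gains: each query term is ORed with its synonyms and quasi-synonyms from a searching thesaurus, so documents that use a different word for the same concept are also retrieved. The thesaurus entries and newspaper headlines are invented for the example. The same mechanism explains the drop in precision, since every added equivalent can also match documents that are not relevant.

```python
import re

# Hypothetical searching thesaurus: entry term -> synonyms / quasi-synonyms.
THESAURUS = {
    "unemployment": {"joblessness", "layoffs"},
    "bankruptcy": {"insolvency"},
}


def words(text):
    """Return the set of lower-cased words in a text."""
    return set(re.findall(r"\w+", text.lower()))


def expand(term):
    """Return the query term together with its thesaurus equivalents."""
    term = term.lower()
    return {term} | THESAURUS.get(term, set())


def search(docs, term, use_thesaurus=False):
    """Return ids of documents containing the term (or, if expanded, any equivalent)."""
    terms = expand(term) if use_thesaurus else {term.lower()}
    return [doc_id for doc_id, text in docs.items() if words(text) & terms]


# Toy newspaper collection (invented headlines).
docs = {
    1: "Unemployment rose sharply in the spring",
    2: "Joblessness hits the paper industry",
    3: "Layoffs expected at the shipyard",
    4: "Insolvency of a major retailer",
}

print(search(docs, "unemployment"))                      # [1]
print(search(docs, "unemployment", use_thesaurus=True))  # [1, 2, 3]
print(search(docs, "bankruptcy", use_thesaurus=True))    # [4]
```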

 

 

 



Literature:

 

Bauer, G. & Schneider, C. (1990). PADOK-II: Untersuchungen zur Volltextproblematik und zur interpretativen Analyse der Retrievalprotokolle. Nachrichten für Dokumentation, 41(1), 21-26.
 

Blair, D. C. & Maron, M. E. (1990). Full-Text Information-Retrieval - Further Analysis and Clarifications. Information Processing & Management, 26(3), 437-447.
 

Consales, J. & Henzel, M. (1989). End user access to full text medical information: an examination of searching behavior (pp. 101-104 in: National Online Meeting. Proceedings 1989, New York, 9-11 May 1989. Edited by Carol Nixon and Lauree Padgett. Medford, New Jersey: Learned Information, Inc.).
 

Greengrass, A. R. (1990). The New York Times: finding your way through the full text thicket (pp. 145-148 in: National Online Meeting 1990. Proceedings of the 11th National Online Meeting, New York, 1-3 May 1990. Edited by Martha E. Williams. Medford, New Jersey: Learned Information, Inc.).
 

Kristensen, J. & Järvelin, K. (1990). The effectiveness of a searching thesaurus in free-text searching in a full-text database. International Classification, 17(2), 77-84.
 

McKinin, E. J. & Sievert, M. E. (1989). A comparison of full-text and abstracts for information retrieval in clinical medicine (pp. 295-303 in: National Online Meeting. Proceedings 1989, New York, 9-11 May 1989. Edited by Carol Nixon and Lauree Padgett. Medford, New Jersey: Learned Information, Inc.).
 

Sievert, M. E. & McKinin, E. J. (1989). Why full-text misses some relevant documents: an analysis of documents not retrieved by CCML or MEDIS (pp. 34-39 in: ASIS'89. Managing Information and Technology. Proceedings of the 52nd Annual Meeting of the American Society for Information Science, Volume 26. Edited by Jeffrey Katzer and Gregory B. Newby. Medford, New Jersey: Learned Information, Inc., for American Society for Information Science).
 

Tauchert, W.; Hospodarsky, J.; Krause, J.; Schneider, C. & Womser-Hacker, C. (1991). Effects of Linguistic Functions on Information-Retrieval in a German-Language Full-Text Database - Comparison Between Retrieval in Abstracts and Full Text. Online Review, 15(2), 77-86.
 

See also: Digital library, TREC

Birger Hjørland

Last edited: 01-06-2006

to be edited:

Greengrass (1990) describes The New York Times full-text database, which contains more than 900,000 articles published since 1980. The database is searchable on Mead Data's NEXIS. Searching a large newspaper database raises many problems because of its wide range of subjects and formats. Many measures have been taken to improve user access, including improved bibliographic description, extensive indexing and the creation of a special biographical *file.

In the medical domain, McKinin & Sievert (1989) and Sievert & McKinin (1989) compare searches carried out in three files: MEDLINE, Mead Data Corporation's "MEDIS journal file" and the journal component of BRS's "Comprehensive Core Medical Library (CCML)". Full-text searching retrieved significantly more relevant documents than searching based on index terms, but at the same time showed a considerable loss of precision. An analysis shows that 204 relevant documents were identified only by searching the indexed MEDLINE database. There were two reasons why these 204 references were not identified by searching the full-text databases: partly search strategies that were formulated too restrictively, and partly problems caused by the ambiguity of *natural language.

Consales & Henzel (1989) interview users of medical full-text databases and highlight common pitfalls as well as the large differences in how professional *intermediaries and end users approach the tasks.