Full-text retrieval (& full-text searching)
Full-text retrieval is free-text searching in full-text databases, i.e., databases in which the full content of documents (e.g. newspapers or scientific journals) is represented. In common bibliographical databases, only titles, descriptors, identifiers, abstracts and references are searchable (as "subject access points"). In full-text databases, the whole text of articles or books is searchable as well. Each word in the text is searchable (with the possible exception of certain stop-words). The search may be sequential (scanning the texts themselves at search time) or based on inverted files (indexes that list, for each word, the documents in which it occurs).
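The difference between sequential searching and searching via an inverted file can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular retrieval system. The stop-word list, the tokenizer and the function names (`tokenize`, `sequential_search`, `build_inverted_index`, `index_search`) are hypothetical choices made for the example.

```python
import re
from collections import defaultdict

# Hypothetical, minimal stop-word list for illustration only
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "on", "to", "is"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def sequential_search(docs, term):
    """Scan every document's full text at query time (no index)."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in tokenize(text)}

def build_inverted_index(docs):
    """Map each non-stop word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:  # stop-words are not indexed
                index[word].add(doc_id)
    return index

def index_search(index, term):
    """Answer a one-word query with a single lookup in the inverted file."""
    return set(index.get(term.lower(), set()))

# Hypothetical sample documents
docs = {
    1: "Inflation rose in the euro area.",
    2: "The central bank raised interest rates to curb inflation.",
    3: "Finnish exports of paper grew.",
}
index = build_inverted_index(docs)
print(sorted(index_search(index, "inflation")))      # [1, 2]
print(sorted(sequential_search(docs, "inflation")))  # [1, 2]
print(index_search(index, "the"))                    # set()
```

A sequential scan reads every text for every query, whereas the inverted file is built once and then answers each query by lookup. In this sketch, stop-words are simply left out of the index and therefore cannot be searched through it, although a sequential scan would still find them.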
Common sense as well as experiments show that full-text searching dramatically increases recall (although not to 100%) and that precision tends to decrease.
Bauer & Schneider (1990), for example, compared full-text retrieval with information retrieval based on titles and abstracts. Their results show considerably lower recall values when the searchable part of the document is restricted in size.
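Recall is the proportion of the relevant documents that a search retrieves, and precision is the proportion of the retrieved documents that are relevant. The sketch below computes both for two hypothetical result sets. The document numbers are invented toy data, not results from Bauer & Schneider (1990) or any other study cited here. Because a full-text search also matches words outside titles and abstracts, it retrieves a superset of the restricted search, which in this example raises recall while lowering precision.

```python
def recall(retrieved, relevant):
    """Share of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

# Hypothetical toy data, not taken from any of the cited studies
relevant = set(range(1, 11))                 # documents 1-10 are relevant
title_abstract_hits = {1, 2, 3, 4, 11}       # search restricted to titles/abstracts
full_text_hits = set(range(1, 9)) | set(range(11, 19))  # superset: 1-8 and 11-18

for label, hits in [("titles/abstracts", title_abstract_hits),
                    ("full text", full_text_hits)]:
    print(f"{label}: recall={recall(hits, relevant):.2f}, "
          f"precision={precision(hits, relevant):.2f}")
# titles/abstracts: recall=0.40, precision=0.80
# full text: recall=0.80, precision=0.50
```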
Kristensen & Jaervelin (1990) describe the problems that synonyms, antonyms, quasi-synonyms and homonyms in natural language cause for free-text searching in full-text databases. Thesauri may solve such problems. The authors construct a small thesaurus for searching Finnish newspaper articles on economic subjects and document an improvement in search effectiveness: average recall increased from 45% to 82%. At the same time, however, there was a small drop in precision.
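In its simplest form, using a searching thesaurus in free-text searching amounts to query expansion: each search term is OR-combined with its synonyms and quasi-synonyms from the thesaurus. The following Python sketch illustrates that general idea only. It is not the thesaurus or search system built by Kristensen & Jaervelin (1990), and the thesaurus entries, documents and function names are hypothetical.

```python
import re

# Hypothetical mini-thesaurus: entry term -> synonyms and quasi-synonyms
THESAURUS = {
    "inflation": ["price increase", "rising prices"],
    "unemployment": ["joblessness", "jobless rate"],
}

def expand_query(term):
    """Return the term plus its thesaurus equivalents (to be OR-combined)."""
    term = term.lower()
    return [term] + THESAURUS.get(term, [])

def matches(text, phrase):
    """True if the phrase occurs in the text as whole words."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text.lower()) is not None

def search(docs, phrases):
    """Retrieve documents containing at least one of the phrases (Boolean OR)."""
    return {doc_id for doc_id, text in docs.items()
            if any(matches(text, p) for p in phrases)}

# Hypothetical sample documents
docs = {
    1: "Inflation reached four percent last year.",
    2: "Consumers complain about rising prices for food.",
    3: "The jobless rate fell in the spring.",
}
print(sorted(search(docs, ["inflation"])))              # [1]
print(sorted(search(docs, expand_query("inflation"))))  # [1, 2]
```

Expansion can also pull in documents in which a synonym is used in an unrelated sense, which is one way precision can fall as recall rises. A plain synonym list like this one does nothing about homonyms.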
Literature:
Bauer, G. & Schneider, C. (1990). PADOK-II: Untersuchungen zur
Volltextproblematik und zur interpretativen Analyse der Retrievalprotokolle.
Nachrichten für Dokumentation, 41(1), 21-26.
Blair, D. C. & Maron, M. E. (1990). Full-Text Information-Retrieval - Further Analysis and
Clarifications. Information Processing & Management, 26(3), 437-447.
Consales, J. & Henzel, M. (1989). End user access to full text medical
information: an examination of searching behavior (Pp. 101-104 In: National
Online Meeting. Proceedings 1989, New York, May 9-11 1989. Edited by Carol Nixon
and Lauree Padgett. Medford, New Jersey: Learned Information, Inc.).
Greengrass, A. R. (1990). The New York Times: finding your way through the full text
thicket (Pp. 145-148 In: National Online Meeting 1990. Proceedings of the 11th
National Online Meeting, New York, 1-3 May 1990. Edited by Martha E. Williams.
Medford, New Jersey: Learned Information, Inc.).
Kristensen, J. & Jaervelin, K. (1990). The effectiveness of a searching
thesaurus in free-text searching in a full-text database. International Classification, 17(2), 77-84.
McKinin, E. J. & Sievert, M. E. (1989). A comparison of full-text and abstracts
for information retrieval in clinical medicine (Pp. 295-303 In: National Online
Meeting. Proceedings 1989, New York, 9-11 May 1989. Edited by Carol Nixon and
Lauree Padgett. Medford, New Jersey: Learned Information, Inc.).
Sievert, M. E. & McKinin, E. J. (1989). Why full-text misses some relevant
documents: an analysis of documents not retrieved by CCML or MEDIS (Pp. 34-39
In: ASIS'89. Managing Information and Technology. Proceedings of the 52nd Annual
Meeting of the American Society for Information Science, Volume 26. Edited by Jeffrey
Katzer and Gregory B. Newby. Medford, New Jersey: Learned Information, Inc., for
American Society for Information Science).
Tauchert, W.; Hospodarsky, J.; Krause, J.; Schneider, C. & Womser-Hacker,
C. (1991). Effects of Linguistic Functions on Information-Retrieval in a German-Language
Full-Text Database: Comparison Between Retrieval in Abstracts and Full Text.
Online Review, 15(2), 77-86.
See also: Digital library; TREC
Birger Hjørland
Last edited: 01-06-2006
to be edited:
Greengrass (1990) describes The New York Times full-text database, which
contains more than 900,000 articles published since 1980. The database is
searchable on Mead Data's NEXIS. Searching a large newspaper database raises
many problems because of its wide spread of subjects and formats. Many measures
have been taken to improve user access, including improved bibliographic
description, extensive indexing and the creation of a special biographical *file.
In the medical domain, McKinin & Sievert (1989) and Sievert & McKinin (1989)
compare searches carried out in three files: MEDLINE, Mead Data Central's
"MEDIS journal file" and the journal component of BRS's "Comprehensive Core
Medical Library (CCML)". Full-text searching retrieved significantly more
relevant documents than searching based on index terms, but at the same time
showed a considerable loss of precision. A further analysis shows that 204
relevant documents were identified only by searching the indexed MEDLINE
database. There were two reasons why these 204 references were not identified
by searching the full-text databases: partly search strategies that were
formulated too restrictively, and partly the ambiguity of *natural language.
Consales & Henzel (1989) question users of medical full-text databases,
highlight common pitfalls, and show how differently professional
*intermediaries and end users approach the search tasks.