(term frequency–inverse document frequency)
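tf–idf is a term weight often used in information retrieval: it rises
with how often a term occurs in a document and falls with the number of
documents in which the term occurs.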
"In 1972, Karen Spärck Jones
published in the Journal of Documentation the paper which
defined the term weighting scheme now known as inverse document
frequency [...] exchange in 1972 was part of the stimulus for the development
(via a short paper in 1974) for the Robertson/Spärck Jones
relevance weighting model of 1976. However, the circle was
not fully closed until the Croft/Harper paper of 1979 which
showed IDF as an approximation to RSJ relevance weighting,
together with a much later paper which clarified the
difference between the Croft/Harper approximation and the
original formula. A short technical report [...]
text retrieval methods developed in this framework, and a
comprehensive paper covers the combination of IDF weighting
with other weighting factors and reports extensive experimental
results." (Robertson, 2005)
F. Sebastiani writes:
"One popular class of statistical
term weighting functions is tf * idf (see e.g. Salton &
Buckley, 1988) where two intuitions are at play:
- the more frequently t_k occurs in d_j, the more important for d_j
  is it (the term frequency intuition);
- the more documents t_k occurs in, the less discriminating is it,
  i.e. the smaller its contribution is in characterizing the semantics
  of a document in which it occurs (the inverse document frequency
  intuition)." (Sebastiani, 2003).
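The two intuitions quoted above can be illustrated with a minimal sketch. The quoted sources do not fix a single formula, so the choices here are assumptions: tf is taken as the raw count of a term in a document, idf as log(N / n), where N is the number of documents and n the number of documents containing the term, and the helper name tf_idf and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw-count tf multiplied by idf = log(N / n).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy collection, split on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "the cat chased the dog".split(),
]
w = tf_idf(docs)
```

In this toy collection "the" occurs in every document, so its idf is log(3/3) = 0 and its weight vanishes however often it is repeated, while "mat", which occurs in only one document, receives the largest weight in the first document. Practical systems often use smoothed or normalized variants rather than this plain form.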
The idf measure is also known as statistical
Robertson, S. (2005).
Inverse Document Frequency. The Spärck Jones / Robertson IDF page. (Revised
Salton, G. & Buckley, C. (1988). Term-weighting approaches in
automatic text retrieval. Information Processing and Management, 24(5),
Sebastiani, F. (2003). Research in automated text
classification: Trends and perspectives. Manuscript. 4th
International Colloquium on Library and Information Science, Salamanca, 5-7 May
2003. (Invited speech).
Wikipedia. The free encyclopedia. (2006). Tf-idf.
See also: Weighting