Parser (and parsing)
A concept from computational linguistics. A parser is a program that uses
a grammar to analyze a text into its syntactic components. The text may be
written in a natural language or in a programming language.
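As a minimal sketch of what "using a grammar to analyze a text into its syntactic components" can mean for programming-language input, the following Python fragment parses arithmetic expressions by recursive descent. The three-rule grammar, the function names, and the tuple-based tree format are assumptions made for illustration; they are not taken from any of the cited sources.

```python
import re

# Hypothetical toy grammar (an assumption for illustration):
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'

def tokenize(text):
    """Return numbers, '+', '*', and parentheses; other characters are skipped."""
    return re.findall(r"\d+|[+*()]", text)

def parse(text):
    """Return a nested tuple describing the syntactic structure of text."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token is not None and token.isdigit():
            return ("num", int(advance()))
        raise SyntaxError(f"unexpected token: {token!r}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("mul", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("add", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token: {peek()!r}")
    return tree

print(parse("2 + 3 * (4 + 5)"))
# ('add', ('num', 2), ('mul', ('num', 3), ('add', ('num', 4), ('num', 5))))
```

Each grammar rule becomes one function, so the shape of the returned tree mirrors the grammar: multiplication binds tighter than addition because `term` is called from inside `expr`.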
Human sentences are not easily parsed by programs, as there is substantial
ambiguity in the structure of human language.
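The ambiguity can be made concrete with a small sketch. The Python fragment below counts how many distinct parse trees a toy grammar assigns to a sentence, using a CYK-style chart. The grammar, the lexicon, and the example sentence are hypothetical and chosen only to show a classic prepositional-phrase attachment ambiguity; a realistic grammar of English would be far larger and would typically yield many more readings.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption for illustration).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}

def count_parses(words, start="S"):
    """Count the distinct parse trees for words that are rooted in start."""
    n = len(words)
    # chart[(i, j)][symbol] = number of trees covering words[i:j] with that root
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[(i, i + 1)][symbol] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]

print(count_parses("I saw the man".split()))                     # 1
print(count_parses("I saw the man with the telescope".split()))  # 2
```

The second sentence receives two analyses because "with the telescope" can attach either to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the man has the telescope). The grammar alone cannot decide between them.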
Harter (1986, p. 87ff.) writes:
"When a database is loaded onto a search
system for the first time or is being updated, an algorithm called a parsing
rule is used to prepare the inverted index. A parsing rule is specific to a
particular field of a given database. It refers to a set of separating and
sorting operations performed by the search service on that data field. The
parsing rule is applied when the inverted index is created from the linear file
for that database.
The simplest parsing rule is just to make an entry in the inverted index for
every word in the field. There are at least two reasons why some modification of
this rule for some fields may be useful. First, there are many common, function
words in natural language that occur frequently and that would not be useful as
search terms (for example, "and"). Such terms are often eliminated from the
inverted index.
Second, there may be reasons for wanting to preserve phrases in certain fields,
so that *false drops in these fields can be minimized at the time of search. The
descriptor field is an obvious example of this. Clearly, little is accomplished
by indexing documents with phrases such as "chemical bonding" and then destroying
this *pre-coordinated phrase by parsing the descriptor field on a word by word
basis...."
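The two kinds of parsing rule that Harter describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any actual search service: the field names, the stop-word list, the semicolon as descriptor separator, and the two sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stop-word list; real search services define their own.
STOP_WORDS = {"and", "of", "the", "in", "a", "an"}

def parse_by_word(value):
    """Word-by-word rule: one index entry per word, with stop words removed."""
    return [w for w in value.lower().split() if w not in STOP_WORDS]

def parse_by_phrase(value):
    """Phrase rule: keep each semicolon-separated descriptor intact."""
    return [p.strip().lower() for p in value.split(";") if p.strip()]

# One parsing rule per field; the field names are assumptions.
PARSING_RULES = {"title": parse_by_word, "descriptors": parse_by_phrase}

def build_inverted_index(linear_file):
    """Map (field, term) to the sorted list of record ids that contain it."""
    index = defaultdict(set)
    for record_id, record in linear_file.items():
        for field, value in record.items():
            rule = PARSING_RULES[field]
            for term in rule(value):
                index[(field, term)].add(record_id)
    return {key: sorted(ids) for key, ids in index.items()}

# Hypothetical linear file with two records.
linear_file = {
    1: {"title": "Bonding and structure in chemical compounds",
        "descriptors": "Chemical bonding; Molecular structure"},
    2: {"title": "The chemical industry and employee bonding",
        "descriptors": "Chemical industry; Team building"},
}
index = build_inverted_index(linear_file)

print(index[("title", "chemical")])                # [1, 2]
print(index[("title", "bonding")])                 # [1, 2]
print(index[("descriptors", "chemical bonding")])  # [1]
```

In the word-parsed title field, a search for "chemical" and "bonding" retrieves both records, although record 2 is not about chemical bonding (a false drop). In the phrase-parsed descriptor field, the pre-coordinated phrase "chemical bonding" retrieves only record 1.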
Literature:
Fischler, M. A. & Firschein, O. (1987). Intelligence: The Eye, the Brain, and the
Computer. Reading, Massachusetts: Addison-Wesley, pp. 175-186.
Harter, Stephen P. (1986). Online Information Retrieval. Concepts, Principles, and
Techniques. London: Academic Press.
Wikipedia. The free encyclopedia. (2006). Parsing. http://en.wikipedia.org/wiki/Parsing
See also: Natural Language Processing
Birger Hjørland
Last edited: 31-05-2006
Appendix:
Example of using the commercial parser "Connexor" http://193.185.105.50/cgi-bin/parser-demo.pl on the following text:
Information retrieval is an important but generally neglected part of the research method in psychology. On the basis of a case study, which consists of an examination of the search strategy in a Swedish dissertation, the problems of searcing are overviewed, with regard to both the selection of sources, and the construction of the search profile. Attention is given to subject faceting in psychology. A model used by Psychological Abstracts in building on the concepts of experimental variables is replaced by a facet model developed on the basis of the Bliss Classification System. This model is illustrated using the above-mentioned dissertation as an example, and it is shown that the model can help in formulating search questions in psychology. Also discussed are problems that concern the use of abstracts or full texts in the selection of documents. In addition, attention is given to the question of types of research in psychology that can benefit from computer-based retrieval methods.
| Text | Baseform | Phrase syntax and part-of-speech |
| Information | information | premodifier, noun, noun phrase begins |
| retrieval | retrieval | nominal head, noun, noun phrase ends |
| is | be | main verb, indicative present |
| an | an | premodifier, determiner |
| important | important | premodifier, adjective, noun phrase begins |
| but | but | coordination marker, noun phrase continues |
| generally | generally | premodifier, adverb, noun phrase continues |
| neglected | neglected | premodifier, adjective, noun phrase continues |
| part | part | nominal head, noun, noun phrase continues |
| of | of | postmodifier, preposition, noun phrase continues |
| the | the | premodifier, determiner, noun phrase continues |
| research | research | premodifier, noun, noun phrase continues |
| method | method | nominal head, noun, noun phrase ends |
| in | in | preposed marker, preposition |
| psychology | psychology | nominal head, noun, single-word noun phrase |
| . | . | sentence boundary |
| On | on | preposed marker, preposition |
| the | the | premodifier, determiner |
| basis | basis | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| a | a | premodifier, determiner, noun phrase continues |
| case | case | premodifier, noun, noun phrase continues |
| study | study | nominal head, noun, noun phrase ends |
| , | , | |
| which | which | nominal head, pro-nominal |
| consists | consist | main verb, indicative present |
| of | of | preposed marker, preposition |
| an | an | premodifier, determiner |
| examination | examination | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| the | the | premodifier, determiner, noun phrase continues |
| search | search | premodifier, noun, noun phrase continues |
| strategy | strategy | nominal head, noun, noun phrase ends |
| in | in | preposed marker, preposition |
| a | a | premodifier, determiner |
| Swedish | Swedish | premodifier, adjective, noun phrase begins |
| dissertation | dissertation | nominal head, noun, noun phrase ends |
| , | , | |
| the | the | premodifier, determiner |
| problems | problem | nominal head, plural noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| searcing | searcing | nominal head, noun, noun phrase ends |
| are | be | auxiliary verb, indicative present |
| overviewed | overview | main verb, participle perfect |
| , | , | |
| with | with | preposed marker, preposition |
| regard | regard | nominal head, noun, single-word noun phrase |
| to | to | postmodifier, preposition |
| both | both | nominal head, pro-nominal |
| the | the | premodifier, determiner |
| selection | selection | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| sources | source | nominal head, plural noun, noun phrase ends |
| , | , | |
| and | and | coordination marker |
| the | the | premodifier, determiner |
| construction | construction | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| the | the | premodifier, determiner, noun phrase continues |
| search | search | premodifier, noun, noun phrase continues |
| profile | profile | nominal head, noun, noun phrase ends |
| . | . | sentence boundary |
| Attention | attention | nominal head, noun, single-word noun phrase |
| is | be | auxiliary verb, indicative present |
| given | give | main verb, participle perfect |
| to | to | preposed marker, preposition |
| subject | subject | main verb, infinitive |
| faceting | facet | main verb, participle progressive |
| in | in | preposed marker, preposition |
| psychology | psychology | nominal head, noun, single-word noun phrase |
| . | . | sentence boundary |
| A | a | premodifier, determiner |
| model | model | nominal head, noun, single-word noun phrase |
| used | use | main verb, participle perfect |
| by | by | preposed marker, preposition |
| Psychological | Psychological | premodifier, proper noun, noun phrase begins |
| Abstracts | Abstract | nominal head, plural proper noun, noun phrase ends |
| in | in | preposed marker, preposition |
| building | building | nominal head, noun, single-word noun phrase |
| on | on | preposed marker, preposition |
| the | the | premodifier, determiner |
| concepts | concept | nominal head, plural noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| experimental | experimental | premodifier, adjective, noun phrase continues |
| variables | variable | nominal head, plural noun, noun phrase ends |
| is | be | auxiliary verb, indicative present |
| replaced | replace | main verb, participle perfect |
| by | by | preposed marker, preposition |
| a | a | premodifier, determiner |
| facet | facet | premodifier, noun, noun phrase begins |
| model | model | nominal head, noun, noun phrase ends |
| developed | develop | main verb, indicative past |
| on | on | preposed marker, preposition |
| the | the | premodifier, determiner |
| basis | basis | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| the | the | premodifier, determiner, noun phrase continues |
| Bliss | Bliss | premodifier, proper noun, noun phrase continues |
| Classification | Classification | premodifier, proper noun, noun phrase continues |
| System | System | nominal head, proper noun, noun phrase ends |
| . | . | sentence boundary |
| This | this | premodifier, pro-nominal |
| model | model | nominal head, noun, single-word noun phrase |
| is | be | auxiliary verb, indicative present |
| illustrated | illustrate | main verb, participle perfect |
| using | use | main verb, participle progressive |
| the | the | premodifier, determiner |
| above-mentioned | above mentioned | premodifier, adjective, noun phrase begins |
| dissertation | dissertation | nominal head, noun, noun phrase ends |
| as | as | preposed marker, preposition |
| an | an | premodifier, determiner |
| example | example | nominal head, noun, single-word noun phrase |
| , | , | |
| and | and | coordination marker |
| it | it | nominal head, pro-nominal |
| is | be | auxiliary verb, indicative present |
| shown | show | main verb, participle perfect |
| that | that | preposed marker, clause marker |
| the | the | premodifier, determiner |
| model | model | nominal head, noun, single-word noun phrase |
| can | can | auxiliary verb, indicative present |
| help | help | main verb, infinitive |
| in | in | preposed marker, preposition |
| formulating | formulate | main verb, participle progressive |
| search | search | premodifier, noun, noun phrase begins |
| questions | question | nominal head, plural noun, noun phrase ends |
| in | in | preposed marker, preposition |
| psychology | psychology | nominal head, noun, single-word noun phrase |
| . | . | sentence boundary |
| Also | also | adverbial head, adverb |
| discussed | discuss | main verb, participle perfect |
| are | be | main verb, indicative present |
| problems | problem | nominal head, plural noun, single-word noun phrase |
| that | that | nominal head, pro-nominal |
| concern | concern | main verb, indicative present |
| the | the | premodifier, determiner |
| use | use | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| abstracts | abstract | nominal head, plural noun, noun phrase ends |
| or | or | coordination marker |
| full | full | premodifier, adjective, noun phrase begins |
| texts | text | nominal head, plural noun, noun phrase ends |
| in | in | preposed marker, preposition |
| the | the | premodifier, determiner |
| selection | selection | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| documents | document | nominal head, plural noun, noun phrase ends |
| . | . | sentence boundary |
| In | in | preposed marker, preposition |
| addition | addition | nominal head, noun, single-word noun phrase |
| , | , | |
| attention | attention | nominal head, noun, single-word noun phrase |
| is | be | auxiliary verb, indicative present |
| given | give | main verb, participle perfect |
| to | to | preposed marker, preposition |
| the | the | premodifier, determiner |
| question | question | nominal head, noun, noun phrase begins |
| of | of | postmodifier, preposition, noun phrase continues |
| types | type | nominal head, plural noun, noun phrase continues |
| of | of | postmodifier, preposition, noun phrase continues |
| research | research | nominal head, noun, noun phrase continues |
| in | in | postmodifier, preposition, noun phrase continues |
| psychology | psychology | nominal head, noun, noun phrase ends |
| that | that | nominal head, pro-nominal |
| can | can | auxiliary verb, indicative present |
| benefit | benefit | main verb, infinitive |
| from | from | preposed marker, preposition |
| computer-based | computer based | premodifier, adjective, noun phrase begins |
| retrieval | retrieval | premodifier, noun, noun phrase continues |
| methods | method | nominal head, plural noun, noun phrase ends |
| . | . | sentence boundary |
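One way to read the phrase tags in the analysis above is to group the tokens that run from "noun phrase begins" to "noun phrase ends". The Python sketch below does this for the first sentence of the sample text. The function name and the representation of each row as a (text, baseform, tags) tuple are assumptions made for illustration; this is not Connexor's own output format or API.

```python
def noun_phrases(rows):
    """Group tokens into noun phrases using the begins/continues/ends tags."""
    phrases, current = [], []
    for text, _baseform, tags in rows:
        if "single-word noun phrase" in tags:
            phrases.append(text)
        elif "noun phrase begins" in tags:
            current = [text]
        elif "noun phrase continues" in tags and current:
            current.append(text)
        elif "noun phrase ends" in tags and current:
            current.append(text)
            phrases.append(" ".join(current))
            current = []
    return phrases

# Rows copied from the first sentence of the analysis above.
rows = [
    ("Information", "information", "premodifier, noun, noun phrase begins"),
    ("retrieval", "retrieval", "nominal head, noun, noun phrase ends"),
    ("is", "be", "main verb, indicative present"),
    ("an", "an", "premodifier, determiner"),
    ("important", "important", "premodifier, adjective, noun phrase begins"),
    ("but", "but", "coordination marker, noun phrase continues"),
    ("generally", "generally", "premodifier, adverb, noun phrase continues"),
    ("neglected", "neglected", "premodifier, adjective, noun phrase continues"),
    ("part", "part", "nominal head, noun, noun phrase continues"),
    ("of", "of", "postmodifier, preposition, noun phrase continues"),
    ("the", "the", "premodifier, determiner, noun phrase continues"),
    ("research", "research", "premodifier, noun, noun phrase continues"),
    ("method", "method", "nominal head, noun, noun phrase ends"),
    ("in", "in", "preposed marker, preposition"),
    ("psychology", "psychology", "nominal head, noun, single-word noun phrase"),
]

for phrase in noun_phrases(rows):
    print(phrase)
# Information retrieval
# important but generally neglected part of the research method
# psychology
```

The sketch follows the tags exactly as given, so the determiner "an", which the parser leaves outside the phrase boundaries, is not included in the second phrase.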