EUROVOC is used for indexing:
The purposes of the Eurovoc Project were:
1. Application of EUROVOC in the Library System TINLIB
Almost immediately after the translation was completed, we began implementing EUROVOC in TINLIB, the library system used by our library. In cooperation with the Information Technologies Department of the Czech Parliament, we are currently addressing the re-indexing of records indexed with the previous version of the thesaurus and the creation of programming tools for partially automatic re-indexing.
In the previous version of our thesaurus, German, English and French terminology acted as non-descriptors vis-a-vis Czech descriptors. The new version will make full-value searches in any of these languages possible for foreign users of our library database.
2. Application of EUROVOC in the Fulltext of the Czech Parliament
Another possibility of exploiting EUROVOC within the Information System of the Czech Parliament is currently being tested in collaboration with the Information Technologies Department. EUROVOC (in 9 languages plus Czech, to be exact) has been experimentally incorporated into full-text searching of Parliamentary documents. Here the multilingual thesaurus should serve as a query-formulation support tool for the databases of Chamber of Deputies materials, which are available to the public via the Internet. The pertinent documents, however, have not yet been indexed with thesaurus terms, so EUROVOC has only a limited, auxiliary role. Indexing of all Parliamentary materials is planned for the future.
An example of this thesaurus application:
If a user wants to search in the "Documents of the Chamber of Deputies" database, he starts at the appropriate page:
There is a choice of full-text search forms: simple, simple with file searching, or advanced. Clicking the advanced search form gives the user the opportunity to formulate a complicated query. He may select a source database and enter his query, or he may use EUROVOC. Users who speak foreign languages in particular can consult Multilingual EUROVOC.
When searching in EUROVOC, the user may select one of 10 languages and type an appropriate term or string in the search window. The result is a list of all the terms (descriptors and non-descriptors) that match the search string. Clicking any of the listed terms displays the corresponding descriptor paragraph. The user may mark any terms of interest and send them to a full-text query with the "generate query" command. In this process, foreign-language terms are automatically translated into Czech. The query in the query window may then be adjusted further following the common rules.
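The lookup and "generate query" steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Descriptor` class, the toy records, the function names and the `OR` operator joining the Czech terms are all hypothetical and do not reproduce the Parliament's actual system or real EUROVOC data.

```python
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    """One thesaurus concept (toy data, not real EUROVOC records)."""
    labels: dict                                          # language code -> descriptor
    non_descriptors: dict = field(default_factory=dict)   # language code -> entry terms

THESAURUS = [
    Descriptor(
        labels={"cs": "parlament", "en": "parliament", "fr": "parlement"},
        non_descriptors={"en": ["legislative assembly"]},
    ),
    Descriptor(
        labels={"cs": "parlamentní volby", "en": "parliamentary election"},
    ),
]

def search_terms(string, lang):
    """Return (term, descriptor) pairs whose term in `lang` contains the string."""
    needle = string.lower()
    hits = []
    for d in THESAURUS:
        terms = [d.labels[lang]] if lang in d.labels else []
        terms += d.non_descriptors.get(lang, [])
        for t in terms:
            if needle in t.lower():
                hits.append((t, d))
    return hits

def generate_query(marked, target_lang="cs"):
    """Translate marked descriptors into the target language and join them.

    Joining with OR is an assumption made for this sketch.
    """
    translated = []
    for d in marked:
        term = d.labels[target_lang]
        if term not in translated:
            translated.append(term)
    return " OR ".join(f'"{t}"' for t in translated)

# An English-speaking user searches for "parliament" and marks every hit.
hits = search_terms("parliament", "en")
query = generate_query([d for _, d in hits])
print(query)
```

The resulting string would then be placed in the query window, where the user can still edit it by hand.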
To provide deeper insight into the problem, let us deal with the types of thesauri and the possibilities of full-text searching separately.
Firstly, a conventional thesaurus, as defined in the ISO 5964 standard, is a controlled, structured vocabulary used for indexing and retrieving documents. With the spread of full-text databases, the searching (or search-aid) thesaurus came into use. Thesauri of this type are not used for indexing but only for retrieving information. The most primitive form can be called the "advice-giving" level: searching with thesaurus term support. In this case the user simply picks a term from the thesaurus and puts it into the query statement. A more advanced possibility is semi-automatic or automatic consultation of the thesaurus. In the former, the thesaurus automatically suggests appropriate terms for the query and the user must approve them; in the latter (automatic consultation), the thesaurus, on the basis of some inference mechanism, supports the user by expanding the query fully automatically.
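The difference between semi-automatic and automatic consultation can be illustrated with a minimal sketch. The `RELATED` table, the function names and the `approve` callback are hypothetical, and a plain table lookup stands in here for whatever inference mechanism a real search-aid thesaurus would use.

```python
# Toy search-aid thesaurus: term -> related terms (hypothetical entries).
RELATED = {
    "election": ["vote", "ballot"],
    "budget": ["public finance"],
}

def suggest(query_terms):
    """Return thesaurus terms related to the query terms, without duplicates."""
    suggestions = []
    for term in query_terms:
        for candidate in RELATED.get(term, []):
            if candidate not in query_terms and candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions

def expand(query_terms, approve=None):
    """Expand a query with thesaurus terms.

    approve=None     -> automatic consultation: every suggestion is added.
    approve=callable -> semi-automatic: only suggestions the user approves are added.
    """
    accepted = [s for s in suggest(query_terms) if approve is None or approve(s)]
    return list(query_terms) + accepted

automatic = expand(["election"])
semi_automatic = expand(["election"], approve=lambda term: term == "vote")
print(automatic)
print(semi_automatic)
```

In an interactive system the `approve` callback would be replaced by the user ticking or rejecting each suggested term.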
When full-text searching is needed, we can consider two levels: either we work with the full text directly, with the help of automatic text analysis at the statistical, syntactic or semantic level, or we perform some intellectual compression of the underlying full text. Knowledge bases and semantic networks are appropriate ways to handle full-text search, but only within a certain range: in limited-size databases devoted to special purposes. Moreover, developments in these fields are taking place more on a theoretical than on a methodological or practical level. Thus the good, familiar thesauri came into consideration for full-text searching.
From a semantic point of view, we can identify two components of a text: the rheme, that is, what the concrete text predicates (what the text is saying), and the topic, that is, what the text predicates about (what the text is talking about). In free-text searching without any controlled vocabulary and intellectual analysis, we have no possibility of searching by topic. We can only talk about quasi-topic searching (for example, in the case of the TOPIC system).
In conclusion, indexing provides topic searching, and a searching thesaurus improves full-text searching significantly, above all in recall and less significantly in precision. In any case, a searching thesaurus improves overall retrieval system performance.
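For readers less familiar with the two measures, recall is the share of all relevant documents that a query retrieves, and precision is the share of retrieved documents that are relevant. The sketch below only illustrates these definitions; the document numbers are invented for the example and are not measurements from the Parliament's databases.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Invented toy data: documents 1-4 are relevant to the user's need.
relevant = {1, 2, 3, 4}
plain_query = {1, 2}            # documents found by the user's own words
expanded_query = {1, 2, 3, 5}   # documents found after thesaurus expansion

print(recall_precision(plain_query, relevant))
print(recall_precision(expanded_query, relevant))
```

In this toy case the expanded query finds one more relevant document but also one irrelevant one, so recall rises while precision does not.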
In talking about full text and thesauri, we cannot neglect the problem of automatic indexing. As in the field of automatic translation, development tends toward semi-automatic rather than fully automatic indexing. Therefore machine-aided indexing (MAI) came into consideration. MAI techniques provide a basic morphological and syntactic analysis of the full text (or abstracts) as a basis for the indexer's decision making and intellectual (human) indexing.
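The idea of machine-aided indexing can be sketched as a program that proposes candidate descriptors and leaves the decision to the indexer. The sketch below is a deliberately crude assumption: the `ENTRY_TERMS` table is hypothetical, and stripping an English plural "s" stands in for real morphological analysis, which for Czech would have to be far more elaborate.

```python
import re
from collections import Counter

# Hypothetical entry terms mapped to the descriptors they lead to.
ENTRY_TERMS = {
    "election": "election",
    "vote": "election",
    "budget": "budget",
}

def normalise(token):
    """Crude stand-in for morphological analysis: lowercase, drop a plural 's'."""
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def suggest_descriptors(text):
    """Return (descriptor, count) candidates found in the text, most frequent first."""
    counts = Counter()
    for token in re.findall(r"[A-Za-z]+", text):
        descriptor = ENTRY_TERMS.get(normalise(token))
        if descriptor:
            counts[descriptor] += 1
    return counts.most_common()

text = "The votes in the election decided the budget. Elections matter."
candidates = suggest_descriptors(text)
print(candidates)
```

The ranked candidates are only a proposal; the human indexer accepts, rejects or supplements them.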
Lastly, this presentation has offered only basic considerations on using a thesaurus with full text. There are many other problems we have to solve, for example thorough document analysis, software maintenance and project organization. However, we hope that the successful application of the Eurovoc thesaurus will be useful in at least three respects: an integrated information retrieval language will be applied in the databases; there will be compatibility with those databases of the European Community in which the Eurovoc thesaurus is used; and finally, multilingual access to the databases of the Czech Parliament will be provided.
References:
1. FIDEL, R. Searchers' selection of search keys : II. Controlled vocabulary or free-text searching. Journal of the American Society for Information Science, 1991, Vol. 42, No. 7, pp. 501-514.
2. JONES, S. et al. Interactive thesaurus navigation : intelligence rules OK? Journal of the American Society for Information Science, 1995, Vol. 46, No. 1, pp. 52-59.
3. KRISTENSEN, J. Expanding end-users' query statements for free text searching with a search-aid thesaurus. Information Processing and Management, 1993, Vol. 29, No. 6, pp. 733-744.
4. KRISTENSEN, J., JARVELIN, K. The effectiveness of a searching thesaurus in free-text searching in a full-text database. International Classification, 1990, Vol. 17, No. 2, pp. 77-84.
5. MILSTEAD, J. L. Invisible thesauri : the year 2000. Online & CD-ROM Review, 1995, Vol. 19, No. 2, pp. 93-94.