Tuesday, February 27, 2007

Good Snowboard Movies

Natural language processing in the case of Spanish: rules and knowledge base

Author: Comeche Martínez, Juan Antonio (*)
Source: ANABAD Bulletin, 2005, 55 (1-2) : 87-96.

(*) Faculty of Information Sciences, Department of Library and Information Science (Biblioteconomía y Documentación), Universidad Complutense de Madrid (UCM)

Summary: In information retrieval, one technique that can increase the accuracy of the results a system returns in response to a user's request is to subject the natural language texts of the collection to a process that unifies the linguistic variants of the words of the language in question. These variants can be either inflectional or derivational. The unification of inflectional variants seeks to bring together under a single index term words that differ only in gender (the masculine and feminine forms of "cat"), number (cat, cats), or verbal inflection (love, loves, loved, had loved ...). The unification of derivational variants, for its part, seeks to bring together under one term all the words that belong to the same semantic field even though they have different endings (e.g., distress, distressed, distressing ...).
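As a rough illustration of what unifying inflectional variants means in practice, here is a minimal sketch in Python (not the Prolog of the original work). The suffix list, the `min_stem` threshold, and the function names are assumptions made for this example only; a real Spanish stemmer needs a far richer rule set and a knowledge base of exceptions.

```python
# Toy conflation of Spanish inflectional variants by suffix stripping.
# The suffix list is a hypothetical illustration, not the author's rule base.
INFLECTIONAL_SUFFIXES = ["as", "os", "es", "a", "o", "s"]  # longer suffixes first


def conflate(word: str, min_stem: int = 3) -> str:
    """Strip at most one gender/number suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in INFLECTIONAL_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def group_variants(words):
    """Group words under the index term produced by conflate()."""
    groups = {}
    for w in words:
        groups.setdefault(conflate(w), []).append(w)
    return groups


if __name__ == "__main__":
    # gato, gata, gatos and gatas all end up under the same index term.
    print(group_variants(["gato", "gata", "gatos", "gatas", "sol"]))
```

The sketch only covers gender and number. Verbal inflection and derivational variants need additional rules, and naive stripping like this will wrongly merge or split some words.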

By Martínez Comeche, see also, in relation to the previous document:

SREC-I: prototype of an intelligent retrieval system [PDF]
In Documentación de las Ciencias de la Información, Volume 28, 2005

Summary: Description of a prototype intelligent retrieval system called SREC-I, developed in Prolog. Its general characteristics, initially motivated by a teaching purpose, are explained. The main component modules are presented, together with the Prolog code of two of them.

[Beginning of the text; we have omitted the footnotes of the original]

Information retrieval, as a field of study, has more than forty years of history behind it. Since its inception its primary goal has not changed: satisfying users' information needs, typically by showing them the documents that contain the information sought, and automating the process with maximum efficiency and effectiveness, remains the fundamental challenge for the many researchers working in this field.

During these decades many different approaches have been tried, from the traditional Boolean, vector, and probabilistic models to those that can be grouped under the common heading of artificial intelligence techniques, which include neural networks and genetic algorithms.

Since the first TREC (Text REtrieval Conference), held in Gaithersburg, Maryland, from 4 to 6 November 1992, particular attention has been paid to the evaluation of systems, and specifically to the need for systems on which to test results and for test collections with which to compare the improvements achieved by the techniques under study. Among the various freely available information retrieval systems used for this purpose, SMART and ZPRISE stand out.

Having been designed specifically for research and evaluation, however, reduces their usefulness from an educational point of view. The test collections that serve as input to these systems, for example, gather all the documents in a single text file, reserving special characters to mark the beginning and end of each one. By contrast, in real information retrieval systems the collection is not fixed; documents are constantly being added and withdrawn. So that students could also observe these processes of adding and removing holdings, SREC-I was designed so that each document is stored in a separate file.
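The difference between the two storage layouts can be sketched as follows, in Python rather than the Prolog of SREC-I. The `.I` start marker, the `.txt` file naming, and both function names are assumptions chosen for the example; they are not taken from SREC-I, and real test collections may use other delimiters.

```python
from pathlib import Path


def split_collection(text: str, start_marker: str = ".I") -> dict[str, str]:
    """Split a single-file collection into {doc_id: body}.

    Assumes (hypothetically) that each document begins with a line of the
    form '<start_marker> <id>' and runs until the next such line.
    """
    docs: dict[str, str] = {}
    current_id = None
    body: list[str] = []
    for line in text.splitlines():
        if line.startswith(start_marker + " "):
            if current_id is not None:
                docs[current_id] = "\n".join(body).strip()
            current_id = line[len(start_marker):].strip()
            body = []
        elif current_id is not None:
            body.append(line)
    if current_id is not None:
        docs[current_id] = "\n".join(body).strip()
    return docs


def write_one_file_per_document(docs: dict[str, str], directory: str) -> list[Path]:
    """Store each document in its own file, so single documents can be added or removed."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for doc_id, body in docs.items():
        path = target / f"{doc_id}.txt"
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths
```

With one file per document, adding or withdrawing an item is a matter of creating or deleting a file, whereas the single-file layout requires rewriting or re-parsing the whole collection.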

On another important note, these systems usually require that decisions on operating parameters (the method of calculating term weights, for example) be taken before the program runs, so that during execution users/students can neither intervene nor be informed of the specific techniques employed or the parameter values used by the system at that moment. Thanks to this, computing times can be compared and evaluated, although the teaching value is seriously impaired. Since the primary focus in our case was initially teaching, SREC-I was designed to be interactive: the system warns the student of many of the mistakes made while running it, explains their nature, and helps to solve them without forcing a restart. In addition, it refers explicitly to each of the techniques and parameters that can be selected at any given moment, which forces the student to be much more aware of the internal workings of the information retrieval system.
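To make the idea of a selectable weighting method concrete, the following Python sketch exposes the scheme as an explicit parameter. It is an illustration only: the two schemes shown (raw term frequency and a basic tf-idf with a natural logarithm), the whitespace tokenization, and the function name are assumptions, not a description of the options actually offered by SREC-I.

```python
import math
from collections import Counter


def term_weights(docs: dict[str, str], scheme: str = "tfidf") -> dict[str, dict[str, float]]:
    """Compute per-document term weights under the chosen scheme.

    scheme="tf":    weight = number of occurrences of the term in the document
    scheme="tfidf": weight = tf * ln(N / df), with N documents and df the
                    number of documents containing the term
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    weights: dict[str, dict[str, float]] = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        if scheme == "tf":
            weights[doc_id] = {t: float(f) for t, f in tf.items()}
        elif scheme == "tfidf":
            weights[doc_id] = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        else:
            # Report the problem instead of silently falling back to a default.
            raise ValueError(f"unknown weighting scheme: {scheme!r}")
    return weights
```

Passing the scheme explicitly, and rejecting unknown values with a clear message, mirrors in a small way the interactive approach described above: the person running the system always knows which technique is in effect.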

To these two characteristics we should add a third, no less important: SREC-I was not designed around one specific information retrieval model or approach. On the contrary, documents are loaded in such a way that any of the three so-called classical retrieval models (Boolean, vector, and probabilistic) can in principle be applied to the collection, although the probabilistic model is not yet implemented in SREC-I. This was decided at the time with the idea that students would thus become aware of the potential of each model when, in the future, they have to make decisions about the systems and collections in their care.
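How a single loading step can serve more than one retrieval model can be sketched as below, again in Python and not as a reconstruction of SREC-I. The index layout (term to per-document frequency), the raw-frequency cosine measure, and all function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Load the collection once: term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index


def boolean_and(index, terms: list[str]) -> set[str]:
    """Boolean model: documents that contain every query term."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def vector_rank(index, terms: list[str]) -> list[tuple[str, float]]:
    """Vector model: rank documents by cosine similarity on raw frequencies."""
    # Document vector lengths, computed from the same index.
    norm_sq: dict[str, float] = defaultdict(float)
    for postings in index.values():
        for doc_id, freq in postings.items():
            norm_sq[doc_id] += freq * freq

    query = Counter(terms)
    query_norm = math.sqrt(sum(q * q for q in query.values()))

    dot: dict[str, float] = defaultdict(float)
    for term, q in query.items():
        for doc_id, freq in index.get(term, {}).items():
            dot[doc_id] += q * freq

    scores = {d: s / (math.sqrt(norm_sq[d]) * query_norm) for d, s in dot.items()}
    # Highest score first; ties broken by document identifier.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Both query functions read the same index, so the choice of model is made at query time rather than at loading time. A probabilistic model could be added as a third function over the same structure.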

Moreover, from the outset the plan was to keep adding new modules to the system to extend its operating options, mainly from the field of Artificial Intelligence and, more specifically, from the area of Natural Language Processing. That is why it was given the label "intelligent" (the I in SREC-I), and why it has been developed entirely in Prolog, given the specific capabilities of this language for developing programs related to natural language handling and, in general, to Artificial Intelligence.

The first such extension is at an advanced stage of development. It is a stemmer for Spanish (Castilian) that will hopefully see the light during the next year. It has also been programmed in Prolog, and there is the possibility that SREC-I may be used to evaluate it. Its use for this purpose depends mainly on implementing a loading algorithm that supports the existing test collections stored in a single file.

SREC-I has already been tested during the past academic year with groups from the third year of the Diploma in Library and Information Science (Biblioteconomía y Documentación), in the elective course Advanced Systems for Information Processing and Retrieval, and from the second year of the Bachelor's degree in Documentation, in the core subject Advanced Techniques of Information Retrieval. It was well received and the results were generally satisfactory. This encourages me to continue with the project and to correct the deficiencies observed.

My intention is to make it available to all teachers who wish to use it through a web portal whose launch is planned for next year. Until it is possible to download the program in its entirety, the interested reader may at least consult, in the following pages, the salient features of the current modules of SREC-I and the Prolog code of two of them. I believe these are the most difficult to develop, and although the existing bibliography on Prolog is extensive and sometimes very valuable, to my knowledge none of it specifically addresses writing the code of an information retrieval system in that language.

[Continue]