Saturday, February 23, 2008

Un Processed Cocoa Powder

The new generation of search engines based on natural language processing

Author: José Ramón Pérez Agüera (*)
Source: IweTel (17/05/2007) / Thinkepi
Document URL: http://www.thinkepi.net / ...

(*) Dept. of Software Engineering and Artificial Intelligence, School of Computer Science, Complutense University of Madrid

Over the past year we have seen a whole range of new search engines flourish whose common characteristic is the integration of natural language processing techniques into the search process.

The two standard-bearers of this new trend are Powerset [1] and Hakia [2], which have brought together the crème de la crème of natural language processing in pursuit of a new leap in quality in the evolution of web search engines.

The idea of integrating linguistic knowledge into search engines is not new at all. Since the 1990s, if not earlier, there have been repeated attempts to build search engines that go beyond more or less sophisticated counting of word frequencies. The most resounding failure in this regard was undoubtedly Ellen Voorhees's attempt, back in 1993, to use WordNet, a huge database of semantic information, to expand user queries.
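To make the idea of query expansion concrete, here is a minimal sketch. It does not use WordNet itself: the `TOY_SYNONYMS` dictionary and the `expand_query` function are hypothetical stand-ins invented for illustration, and a real system would also have to decide which sense of each word the user meant.

```python
# Hypothetical miniature thesaurus standing in for a resource like WordNet.
TOY_SYNONYMS = {
    "car": ["automobile", "auto"],
    "film": ["movie"],
}


def expand_query(query, synonyms=TOY_SYNONYMS):
    """Return the query terms plus their listed synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the original term first, then append any synonyms we know.
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("cheap car rental"))
```

Adding every synonym blindly, as this sketch does, is one plausible way for expansion to hurt precision, since a synonym of the wrong word sense pulls in unrelated documents.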

The results of this experiment, as can be seen in her paper [3], are quite discouraging. Since then, apart from isolated studies whose results have been inconclusive, the use of natural language processing in information retrieval has been relegated to the rather trivial application of techniques such as stemming and stop-word removal.
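For readers unfamiliar with these two techniques, the following is a rough sketch of what they do. The stop-word list, the suffix list, and the function names are assumptions made up for this example; production systems normally rely on a proper stemmer, such as Porter's algorithm, rather than this naive suffix stripping.

```python
# Assumed, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for"}

# Assumed suffixes for a very naive stemmer; this is not the Porter algorithm.
SUFFIXES = ("ing", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase the text, drop stop words, and stem what remains."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("Searching the engines for pages"))
```

Both steps work on the surface form of words only, which is why the article calls them a rather trivial use of linguistic knowledge.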

The resurgence of natural language processing in the search engine world is partly due to a natural cycle, typical of any scientific discipline, in which old ideas are tested again using new approaches. But it is also a matter of marketing: new search engines are trying to enter the market by selling the idea that they have a revolutionary new technology that far surpasses the current approach of the major search engines.

From a scientific point of view, the lion is not as fierce as it is painted. Just as Powerset and Hakia have put leading researchers in natural language processing to work, Google, Yahoo and Microsoft have also been working in this direction.

The conclusion we can draw is that, although bringing natural language processing into search engines is without doubt one of the lines of future work that will improve not only the quality of search results but also the engines' capabilities and their interaction with users, much remains to be done, and it is unlikely that any new search engine will unseat Google simply because it uses natural language processing techniques.

In this regard, we must keep in mind that Google's big break in 1998 had more to do with entering a virtually untapped market, backed by strong financial investment, than with a decisive technological advantage. Without underestimating the importance of PageRank, it is worth remembering that Google was not the only one using a link analysis algorithm.

Despite all this, it is worth following the progress made in this area, both the advances that come from overseas and those developed here in Spain, for example by companies like Bitext, lest one day we find ourselves amazed by the wonders that American search engines can perform with language without knowing that right here there is a Spanish company that makes such wonders possible.

[1] http://www.powerset.com

[2] http://www.hakia.com

[3] Voorhees, E. M. 1993. Using WordNet to disambiguate word senses for text retrieval. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Pittsburgh, Pennsylvania, United States, June 27 - July 01, 1993). R. Korfhage, E. Rasmussen, and P. Willett, Eds. SIGIR '93. ACM Press, New York, NY, 171-180. DOI = http://doi.acm.org/10.1145/160688.160715
