Thursday, February 21, 2008


Tutorial on Ontology

Authors: Jesus Contreras, Juan Antonio Martínez Comeche
Source: SEDIC
Document URL: http://blog.sedic.es/ ...

Following the workshop on Ontologies and Information Retrieval, organized by SEDIC's Normaweb group and held at the School of Information Science of the Universidad Complutense de Madrid in September, Jesus Contreras and Juan Antonio Martínez Comeche have written a tutorial that summarizes the main points covered:

  • Ontologies, the Semantic Web and information retrieval
  • Concept and classification
  • Main advantages and disadvantages; terminology
  • Ontology development methodologies
  • Editing tools
  • Installing and running Protégé
  • Steps for developing ontologies using Competency Questions

[Start tutorial]

Purpose of the Ontology: Semantic Web

The current web is essentially a huge set of pages containing unstructured text, i.e. text whose content nobody has bothered to characterize. We have largely limited ourselves to specifying how that content should be displayed, as the nature of HTML tags shows. This simplicity has undoubtedly contributed to the success of the current web and explains its enormous growth in pages and users, but it also makes managing and retrieving such a huge amount of information difficult.

Among the millions of pages on the web, human beings cannot keep track of the information that might satisfy a particular information need at a given moment, especially when that content changes at breakneck speed. In fact, an estimated 40% of the web changes every month. For this reason, search engines were developed to help us decide which pages may contain information relevant to a given problem. But since the textual information on the web is not structured, in the sense that it is not described or characterized in any way, search engine algorithms can only rely on the occurrence of words taken in isolation.

This leads, of course, to results that lack precision and completeness. They lack precision because the search engine returns pages that have nothing to do with our information need. This happens, for example, when words have several meanings. If you search for the Spanish word "banco", you will get pages about banks, but also about benches, a type of seat. Similarly, a lack of completeness can be caused, among other reasons, by a page using a synonym instead of the word used in the query. In that case the page will not be retrieved, because it does not contain the exact word entered in the search.
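Both failure modes can be reproduced with a minimal sketch of exact-word matching. The three page texts and the function name `keyword_search` are invented for illustration, and real search engines are far more elaborate; the Spanish word "banco" is used because it means both a financial bank and a bench.

```python
import re

# Made-up pages; "banco" is Spanish for both a financial bank and a bench.
PAGES = {
    "finance": "el banco aprobó el préstamo",
    "furniture": "un banco de madera para el jardín",
    "cars": "venta de automóvil usado",
}

def keyword_search(query: str, pages: dict[str, str]) -> list[str]:
    """Return the ids of pages whose text contains the exact query word."""
    word = query.lower()
    return sorted(
        page_id
        for page_id, text in pages.items()
        if word in re.findall(r"\w+", text.lower())
    )

# Polysemy hurts precision: both senses of "banco" come back.
ambiguous = keyword_search("banco", PAGES)

# Synonymy hurts completeness: "coche" (car) misses the page that says "automóvil".
missed = keyword_search("coche", PAGES)
```

Here `ambiguous` contains both the finance page and the furniture page, while `missed` is empty even though one page is clearly about cars.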

In addition, search engines provide links to documents that might be useful to the user, but in many cases they cannot provide the specific answer the user is looking for. If a person is looking for the cheapest cars among the dealers in a particular geographical area, today that person must spend many hours comparing the information from the various dealers that a search engine has returned.

Another problem with the web today is the unreliability of sources. The user has no evidence of the accuracy and reliability of the data contained in the websites retrieved.

The evolution of the web envisioned by Tim Berners-Lee aims to solve the problems raised in the preceding paragraphs. Imagine a web whose content is characterized and described in such a way that it can discern the different meanings of words and infer synonymy relationships between words in a given thematic context, so that it can retrieve pages useful for the user's information need even when the words entered in the query do not appear in them explicitly. Imagine, too, that it can compare data and information from various sources and draw inferences and logical deductions from them, in order to show directly the information we were looking for (the nearest dealer with the cheapest cars, for example). It might even be capable of judging the reliability of the data on the various sites and consider only the most trustworthy answers, discarding the least reliable.

Tim Berners-Lee has given the name Semantic Web to this web, in which applications will be able to process information much more deeply. It is characterized by programs that can "understand" the content of web pages and can therefore relate and process information that today sits in isolation, single out the most reliable information at any given time, and even deduce or infer information not previously recorded, making decisions with a degree of autonomy.

For these more "intelligent" applications and services to be possible, the information on web pages must be structured, that is, described and classified well enough that its exact meaning is available to machines. Computers can then handle and process the information properly. Hence the name Semantic Web.

The way devised to encode the meaning of the information contained in web pages is the use of labels that specify the semantic value, or correct interpretation, of the content. Thus a number may indicate, depending on the circumstances, a price, a length or a year. Its precise meaning is specified in each case by the presence of a label.
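As a rough illustration of how a label fixes the interpretation of a bare number, consider the following sketch. The class `Labeled`, the label names and the units are hypothetical choices made for this example, not part of any Semantic Web standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Labeled:
    """A raw value together with a label that fixes its interpretation."""
    value: float
    label: str      # hypothetical label names: "price", "length", "year"
    unit: str = ""  # optional unit, also an assumption of this sketch

def describe(item: Labeled) -> str:
    """Render the same kind of number differently depending on its label."""
    if item.label == "price":
        return f"costs {item.value:.2f} {item.unit}"
    if item.label == "length":
        return f"measures {item.value:g} {item.unit}"
    if item.label == "year":
        return f"dates from {int(item.value)}"
    raise ValueError(f"unknown label: {item.label}")

# The same number, 2008, under three different labels.
readings = [
    Labeled(2008, "price", "EUR"),
    Labeled(2008, "length", "mm"),
    Labeled(2008, "year"),
]
descriptions = [describe(r) for r in readings]
```

Without the label, a program sees only the number 2008; with it, the program can choose the right formatting, comparison or unit conversion.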

The marking and annotation of web content must follow rules and formats; otherwise computers could not manipulate the information effectively. First, consistent markup requires structuring the represented domain beforehand: describing the main entities that compose it, their nature, their hierarchy and the relations between them. Second, care must be taken that all users employ accepted formats, because if several sets of tags exist and no method ensures that they work together, all the effort would be futile.

Developing consistent labeling of web content in compliance with such standards requires creating ontologies for the domain or area of knowledge that we wish to represent semantically. Ontologies are therefore the primary means of achieving the goal of the Semantic Web, since they make possible the formal definition of the entities and concepts of different domains, the hierarchy that organizes them and the relationships that bind them together. They thus guarantee a formal, machine-readable representation, based on a common language (XML), that can be shared and used automatically by any system.
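The three ingredients named above (entities, hierarchy and relationships) can be sketched as a small set of subject-predicate-object triples. All class and relation names below are invented for illustration, and the plain Python set stands in for what would in practice be written in an XML-based ontology language and edited with a tool such as Protégé.

```python
# A toy ontology as (subject, predicate, object) triples.
# Every name here is a made-up example, not taken from a real ontology.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("Car", "is_a", "Vehicle"),        # hierarchy
    ("Vehicle", "is_a", "Product"),    # hierarchy
    ("Dealer", "is_a", "Business"),    # hierarchy
    ("Dealer", "sells", "Car"),        # relationship between concepts
    ("Dealer", "located_in", "City"),  # relationship between concepts
}

def superclasses(cls: str, triples: set[Triple]) -> set[str]:
    """Follow is_a links upward to collect every ancestor of cls."""
    found: set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

def relations_of(cls: str, triples: set[Triple]) -> set[tuple[str, str]]:
    """Return the non-hierarchical relations declared for cls."""
    return {(p, o) for s, p, o in triples if s == cls and p != "is_a"}

car_ancestors = superclasses("Car", TRIPLES)
dealer_relations = relations_of("Dealer", TRIPLES)
```

Because the hierarchy is explicit, a program can work out that a `Car` is also a `Product` without that fact being stated directly.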

No less important than the technological challenges and the formalism is the challenge of exploiting and using the Semantic Web. By analogy with the current web, which reached its peak as new business models took shape, we outline here some possibilities and visions for the types of Semantic Web applications.

Semantic Web technology makes it possible to build complete, formal semantic models of content, based on consensus. The existence of these models allows such systems to offer functionality covering, among others, the following applications:

  • Information retrieval by semantic search engines. Semantic search, unlike traditional keyword-based search, works with the meaning of words according to the underlying model, ensuring 100% precision in searches. The result presented to the user is the requested information, expressed as concepts of the model, instead of a list of possibly related documents, as current search engines return.
  • Publication of information according to the model. Navigation and presentation can be organized by content, so that users can view the model's concepts and query them independently of the documents in the system.
  • Intelligent interfaces. The presence of the model allows the incorporation of intelligent interfaces, such as those based on natural language. The ability to formulate queries in something close to natural language helps ensure the usability of the final system.
  • Inference and completion of information. Based on the axioms of the model, automated inference systems can validate and augment the information.
  • Exchange of information with application-specific formats. The ability to translate the information into the formats of other applications, such as educational applications, increases the return on the coding effort itself. Currently, the cost of making heterogeneous systems compatible accounts for 30% of all industry spending on information technology.
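The item on inference in the list above can be made concrete with a minimal sketch of forward chaining. The facts, the two rules and the single axiom are all assumptions made up for this example; real inference engines work from axioms declared in the ontology itself.

```python
Fact = tuple[str, str, str]

# Facts about individuals; all names are invented for illustration.
FACTS: set[Fact] = {
    ("dealer_a", "sells", "car_1"),
    ("dealer_a", "located_in", "madrid"),
    ("madrid", "located_in", "spain"),
}

def infer(facts: set[Fact]) -> set[Fact]:
    """Forward chaining: apply the rules until no new fact appears."""
    known = set(facts)
    while True:
        derived: set[Fact] = set()
        for s1, p1, o1 in known:
            for s2, p2, o2 in known:
                # Rule 1 (transitivity): X located_in Y and Y located_in Z
                # imply X located_in Z.
                if p1 == p2 == "located_in" and o1 == s2:
                    derived.add((s1, "located_in", o2))
                # Rule 2: D sells C and D located_in P imply C available_in P.
                if p1 == "sells" and p2 == "located_in" and s1 == s2:
                    derived.add((o1, "available_in", o2))
        if derived <= known:
            return known
        known |= derived

def violations(facts: set[Fact]) -> set[Fact]:
    """Axiom check (validation): nothing may be located in itself."""
    return {f for f in facts if f[1] == "located_in" and f[0] == f[2]}

closure = infer(FACTS)
problems = violations(closure)
```

The closure contains facts that were never recorded explicitly, such as `car_1` being available in `spain`, which is the sense in which inference augments the information; the axiom check shows the validation side.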

[Continue tutorial] (PDF, 172 KB)
