
Professional Seminar report for Semantic Web

Research on Semantic Web Mining

Seminar Report

Submitted in partial fulfillment of the requirements for the second semester course










March, 2011


I hereby declare that the Report of the P.G. Seminar Work entitled “Research on Semantic Web Mining”, which is being submitted to the National Institute of Technology Karnataka Surathkal in partial fulfillment of the requirements for the second semester PROFESSIONAL PRACTICES/SEMINAR course of the Master of Technology Degree in Information Technology in the Department of Information Technology, is a bona fide report of the study carried out by me. The material contained in this report has not been submitted to any University or Institution for the award of any degree.


(Register Number, Name and Signature of Student)

Department of Information Technology




This is to certify that the P.G. Seminar Report entitled “Research on Semantic Web Mining”, submitted by CHITTAMPALLY VASANTH RAJA (Register Number: 10IT05F) as the record of the work carried out by him, is accepted as the P.G. Seminar Report submission in partial fulfillment of the requirements of the second semester PROFESSIONAL PRACTICES/SEMINAR course of the Master of Technology Degree in Information Technology in the Department of Information Technology, National Institute of Technology Karnataka, Surathkal.

Dr. Prakash Raghavendra

Assistant Professor

Department of Information Technology

NITK Surathkal

Mr. Biju R Mohan

Assistant Professor

Department of Information Technology

NITK Surathkal


I take this opportunity to express my deepest gratitude and appreciation to all those who have helped me directly or indirectly towards the successful completion of this project.

Foremost, I would like to express my sincere gratitude to my guides Dr. Prakash Raghavendra and Mr. Biju R Mohan, Department of Information Technology, NITK Surathkal. Their advice, constant support, encouragement and valuable suggestions throughout the course of my work helped me successfully complete the seminar. Without their continuous support and interest, this report would not have been the same as presented here.

I am thankful to Dr. Ram Mohan Reddy, Head, Department of Information Technology for his co-operation and for providing necessary facilities throughout the course.

Besides my guides, I would like to thank the entire teaching and non-teaching staff of the Department of Information Technology, NITK, for all their help during my tenure at NITK. Kudos to all my friends at NITK for thought-provoking discussions and for making my stay very pleasant.

Last but not least, I am thankful to my parents to whom I am greatly indebted for their support and encouragement to pursue my interests.



Following the rapid development and wide application of the Internet, the Web has become an effective tool for exchanging and sharing information and for collaborative work. People’s attention to the Web and their frequent use of it drive the development of this technology. They also cause Web information resources to grow rapidly. However, the flood of information resources distributed across the Web is convenient for people, yet it also makes in-depth use of the network very difficult. A given user is concerned with only a small part of the information on the Web and is not interested in the rest. Content mining is used to extract text, images, and other information and knowledge from Web content.

Many researchers have proposed semantic-based Web mining as a way to improve the level of Web services and to address the lack of semantics in existing Web services. Semantic-based Web data mining combines the Semantic Web and Web mining. Web mining results help to build the Semantic Web. In turn, Semantic Web knowledge makes Web mining easier to carry out and also improves its effectiveness.

1. Introduction

Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: an increasing number of researchers is working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself.

The Semantic Web is the second-generation WWW, enriched by machine-processable information that supports users in their tasks. Given the enormous size of even today’s Web, it is impossible to manually enrich all of these resources. Therefore, automated schemes for learning the relevant information are increasingly being used. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data being mined, the discovery of meaning is impossible based on these data alone. Therefore, formalizations of the semantics of Web sites and navigation behavior are becoming more and more common. Furthermore, mining the Semantic Web itself is another emerging application. We can argue that the two areas, Web Mining and the Semantic Web, need each other to fulfill their goals, but that the full potential of this convergence is not yet realized.

1.1 Web Mining and Semantic Web-Related Knowledge

Web mining can be broadly defined as the extraction of interesting, useful patterns and implicit information from WWW resources and user behavior. In general, Web mining can be divided into three categories: Web content mining, Web structure mining, and Web usage mining. Figure 1 shows this classification of Web mining.

Web content mining is used to extract text, images, and other information and knowledge from the content of Web pages. Typical questions are: Which sites sell cars? Which pages are in Chinese? Which pages are about music, or about news? Search engines, intelligent agents, and some recommender systems use content mining to help users find the content they need in the vast space of the network. Web content mining follows two strategies. The first mines the page text directly. The second further processes the results of search-engine queries to obtain more accurate and useful information.

Web structure mining is used to extract network topology information, that is, information about the links between pages. It mines knowledge from the organization and links of the WWW. Which pages are linked from other pages? Which pages point to other pages? Which collection of pages constitutes an independent entity? Structure mining can be used to rank pages and to find the important ones.

Web usage mining is used to extract information about how customers use the browser and follow page links. It extracts interesting patterns from Web access records. For example, which pages does a client access? How long is spent on each page? What is clicked next? What are the entry and exit routes? Each WWW server keeps a Web access log that records users’ accesses and interactions. Analyzing these data can help us understand user behavior. That understanding can then be used to improve the structure of the site or to provide users with personalized services.
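As a rough illustration of the usage-mining questions above (pages visited, time spent per page, next click, entry and exit pages), the following Python sketch processes a tiny in-memory access log. The record layout (user id, Unix timestamp, URL), the 30-minute session timeout, and the names `LOG`, `sessionize`, and `usage_stats` are assumptions made for this example. They are not a real server log format or the report's method.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified access-log records: (user_id, unix_timestamp, url).
LOG = [
    ("u1", 0, "/home"),
    ("u1", 30, "/cars"),
    ("u1", 90, "/cars/sedan"),
    ("u2", 10, "/home"),
    ("u2", 50, "/news"),
    ("u1", 4000, "/music"),
]

SESSION_TIMEOUT = 1800  # assumed: a gap longer than 30 minutes starts a new session


def sessionize(log, timeout=SESSION_TIMEOUT):
    """Split each user's requests, in time order, into sessions of (timestamp, url)."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(log, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))
    sessions = []
    for hits in by_user.values():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                sessions.append(current)
                current = []
            current.append(hit)
        sessions.append(current)
    return sessions


def usage_stats(sessions):
    """Count page views, dwell time, next clicks, and entry/exit pages."""
    views = Counter()
    dwell = defaultdict(int)  # seconds until the next request in the same session
    next_click = Counter()    # (from_page, to_page) transitions
    entries, exits = Counter(), Counter()
    for session in sessions:
        entries[session[0][1]] += 1
        exits[session[-1][1]] += 1
        views.update(url for _, url in session)
        for (t1, page), (t2, nxt) in zip(session, session[1:]):
            dwell[page] += t2 - t1
            next_click[(page, nxt)] += 1
    return views, dict(dwell), next_click, entries, exits
```

The last page of each session receives no dwell time, because the log contains no later request to measure against. This is a general limitation of estimating time-on-page from server logs alone.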

2. Semantic Web

The basic idea of the Semantic Web is to embed machine-readable markup, representing certain types of knowledge, in Web content. Data on the Web is then not only displayed but also understood by machines. This enhances the quality of information services and makes a variety of new, intelligent information services possible. Suppose the knowledge that links data and applications is embedded in many different information sources in a way that is transparent to the user. Web pages, databases, and programs will then be able to connect through agents and collaborate with one another. According to Berners-Lee’s vision, the Semantic Web is a layered architecture of seven levels, as shown in Table 1.

The first layer, URI and Unicode, is the basis of the entire architecture. Unicode handles the encoding of resources, and URIs handle resource identification, which makes precise retrieval of information possible. The second layer, XML + NS (Namespaces) + XML Schema, represents the content and structure of data. It separates them from the presentation format by expressing network information in a standard markup language. The third layer, RDF + RDF Schema, provides a semantic model for describing information on the Web and its types. The fourth layer, the ontology vocabulary, defines shared knowledge and describes the semantic relationships between various kinds of information. In doing so, it reveals the semantics of the information itself and of the relations between pieces of information. The fifth layer, logic, provides axioms and inference rules as the basis for intelligent services. The sixth layer (Proof) and the seventh layer (Trust) provide authentication and trust mechanisms. Digital signatures and encryption, used to detect changes to documents, are a means of enhancing Web security. Each layer builds on and extends the functionality of the layers below it. XML, RDF(S), and ontologies are the core of the Semantic Web architecture, and these three core technologies form its technical support system. They support the semantic description of network information and knowledge. They also play a central role in achieving knowledge sharing and reuse at the semantic level.

The Semantic Web is sometimes called Web 3.0. It is based on the Resource Description Framework (RDF), expressed in XML syntax, which integrates a variety of applications and uses Uniform Resource Identifiers (URIs) as its naming mechanism. The Semantic Web is an extension of the current Web, not a new Web. The research focus is how to move information from a form that a computer can merely read to a form that a computer can understand and process, that is, a form with semantics. Computers and people can then work together. Annotating Web resources (such as Web pages and Web services) with ontology terms is an important prerequisite for achieving the Semantic Web.

In the seven-layer Semantic Web architecture proposed by Tim Berners-Lee, ontology occupies the fourth layer. An ontology aims to capture the knowledge of a domain and to provide a common understanding of that domain through a jointly agreed vocabulary. It gives explicit definitions of the terms and of their interrelationships, describing the semantics of each concept through its relationships to other concepts. Ontology-based semantic annotation uses ontologies defined by experts to help content creators add semantic metadata to Web pages, so that the content can be understood by both people and machines. Compared with tagging by the general public, this is a top-down classification. The Semantic Web can be seen as a new generation of information infrastructure: a distributed, intelligent network platform based on semantic information processing.
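To make the idea of ontology-based annotation concrete, here is a minimal Python sketch. It uses a hypothetical page marked up with RDFa-style `typeof`, `resource`, and `property` attributes and extracts (subject, property, value) triples using only the standard library's `html.parser`. The `ex:` vocabulary, the example.org URL, and the `AnnotationExtractor` class are assumptions for illustration. This is a deliberately simplified reader, not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# A hypothetical page annotated with RDFa-style attributes. A browser only
# renders the text, while a program can read the attributes as metadata.
PAGE = """
<div typeof="ex:Car" resource="http://example.org/cars/sedan-x">
  <h1 property="ex:name">Sedan X</h1>
  <p>Price: <span property="ex:price">20000</span> EUR</p>
</div>
"""


class AnnotationExtractor(HTMLParser):
    """Collect (subject, property, value) triples from typeof/property attributes.

    Simplified: handles one typed subject at a time and ignores the rest of
    the RDFa processing rules (vocab, prefix, nested subjects, and so on).
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None
        self.pending = None  # property whose text value is about to be read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.subject = attrs.get("resource")
            self.triples.append((self.subject, "rdf:type", attrs["typeof"]))
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


parser = AnnotationExtractor()
parser.feed(PAGE)
```

After `feed`, `parser.triples` holds three statements about the page's car: its type, its name, and its price. The human-readable text is unchanged.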

2.1 Resource Description Framework (RDF)

RDF documents consist of three types of entities: resources, properties, and statements. Resources may be Web pages, parts or collections of Web pages, or any (real-world) objects that are not directly part of the WWW. In RDF, resources are always addressed by URIs. Properties are specific attributes, characteristics, or relations describing resources. A resource together with a property having a value for that resource forms an RDF statement. A value is a literal, a resource, or another statement. Statements can thus be considered as object–attribute–value triples. The data model underlying RDF is basically a directed labeled graph. RDF Schema defines a simple modeling language on top of RDF. This language includes classes, is-a relationships between classes and between properties, and domain/range restrictions for properties. RDF and RDF Schema can be written in XML syntax (RDF/XML), but they do not employ the tree semantics of XML.
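The triple model and the RDF Schema constructs described above can be illustrated with plain Python tuples. This sketch makes several assumptions. The `ex:` names are hypothetical. Prefixes are written as abbreviated strings rather than full URIs. Only three RDFS entailment rules are implemented: transitivity of `rdfs:subClassOf`, propagation of `rdf:type` up `rdfs:subClassOf`, and typing of a subject via `rdfs:domain`. The full RDFS semantics has more rules, and a real system would normally use an RDF library.

```python
# Each statement is a (subject, property, value) triple; prefixes are abbreviated.
TRIPLES = {
    ("ex:page1", "rdf:type", "ex:ProductPage"),
    ("ex:page1", "ex:describes", "ex:sedanX"),
    ("ex:page1", "dc:title", '"Sedan X review"'),  # literal value
    ("ex:ProductPage", "rdfs:subClassOf", "ex:WebPage"),
    ("ex:WebPage", "rdfs:subClassOf", "ex:Document"),
    ("ex:describes", "rdfs:domain", "ex:WebPage"),
}


def rdfs_closure(triples):
    """Apply a few RDFS entailment rules until no new triples appear.

    Implemented: rdfs:subClassOf is transitive, rdf:type propagates up
    rdfs:subClassOf, and rdfs:domain types the subject of a property.
    """
    closure = set(triples)
    while True:
        new = set()
        sub = {(s, o) for s, p, o in closure if p == "rdfs:subClassOf"}
        domain = {(s, o) for s, p, o in closure if p == "rdfs:domain"}
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        for s, p, o in closure:
            if p == "rdf:type":
                for c, d in sub:
                    if o == c:
                        new.add((s, "rdf:type", d))
            for prop, cls in domain:
                if p == prop:
                    new.add((s, "rdf:type", cls))
        if new <= closure:
            return closure
        closure |= new


def outgoing_edges(triples, node):
    """View the triples as a directed labeled graph: edges leaving a node."""
    return sorted((p, o) for s, p, o in triples if s == node)
```

For example, `rdfs_closure(TRIPLES)` derives that `ex:page1` is also an `ex:WebPage` and an `ex:Document`, although neither fact was stated explicitly.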

XML and XML Schema were designed to describe the structure of text documents, such as HTML, Word, StarOffice, or LaTeX documents. It is possible to define tags in XML to carry metadata. However, these tags have no formally defined semantics, so their meaning is not well-defined. It is also difficult to convert one XML document into another without additionally specifying the semantics of the tags used. The purpose of XML is to group the objects of content, not to describe the content. Thus, XML helps organize documents by providing a formal syntax.
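The point about tag semantics can be seen in a small Python example using the standard library's `xml.etree.ElementTree`. The two documents and the hand-written `TAG_MAP` are hypothetical. Both documents are well-formed XML describing the same car, yet nothing in XML itself says that `<price>` and `<cost>` mean the same thing. Converting one into the other therefore needs externally supplied semantics. Here a plain dictionary stands in for what an ontology or RDF Schema would provide.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, well-formed documents describing the same car.
DOC_A = "<car><name>Sedan X</name><price currency='EUR'>20000</price></car>"
DOC_B = "<vehicle><model>Sedan X</model><cost>20000</cost></vehicle>"

# Hypothetical mapping from DOC_B's vocabulary to DOC_A's. XML itself does not
# supply this: nothing in the markup says that <cost> corresponds to <price>.
TAG_MAP = {"vehicle": "car", "model": "name", "cost": "price"}


def fields(xml_text):
    """Return the root tag and a {tag: text} dict of the root's children."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


def translate(xml_text, tag_map):
    """Rename every tag in a document using an externally supplied mapping."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")
```

`fields(DOC_A)` and `fields(DOC_B)` return different keys for the same facts. Only after `translate(DOC_B, TAG_MAP)` do the two vocabularies line up.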

2.2 Web Mining Based on Semantic Network

Semantic mining is a semantic retrieval mechanism that uses advanced intelligent theories and technologies to carry out a series of semantic analyses of information resources and users’ queries. By mining their deep semantics, it expresses knowledge resources and user needs fully and accurately. It then searches various distributed, heterogeneous databases, data warehouses, and knowledge bases. Finally, it processes the retrieved information intelligently to return the most relevant results. Semantic-based Web data mining combines semantics with Web mining, whether those semantics are extracted from existing Web data or taken from existing semantic structures. Web mining results help to build the Semantic Web. In turn, Semantic Web knowledge makes Web mining easier to carry out and improves its effectiveness. Mirroring Web mining, semantic-based Web mining can be divided into three categories: semantic Web content mining, semantic Web structure mining, and semantic Web usage mining.

Semantic Web content and structure mining. In the Semantic Web, content and structure are tightly intertwined, so the difference between content mining and structure mining almost vanishes. We therefore treat them together as semantic Web content and structure mining. Because Semantic Web data are essentially relational (sets of RDF triples), traditional techniques for relational data mining can easily be transferred to semantic Web content and structure mining.
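As one hedged illustration of running a traditional mining technique directly on triples, the sketch below treats each subject's set of (property, value) pairs as a transaction. It then counts frequent single features and feature pairs, which are the first two levels of Apriori-style frequent itemset mining. This is a swapped-in example technique, not the specific method of the works this report draws on. The triples and the minimum support are made-up values.

```python
from collections import Counter
from itertools import combinations

# Made-up triples: (subject, property, value).
TRIPLES = [
    ("ex:p1", "rdf:type", "ex:ProductPage"), ("ex:p1", "ex:topic", "ex:Car"),
    ("ex:p2", "rdf:type", "ex:ProductPage"), ("ex:p2", "ex:topic", "ex:Car"),
    ("ex:p3", "rdf:type", "ex:ProductPage"), ("ex:p3", "ex:topic", "ex:Music"),
    ("ex:p4", "rdf:type", "ex:NewsPage"), ("ex:p4", "ex:topic", "ex:Car"),
]


def transactions(triples):
    """Turn triples into one itemset of (property, value) features per subject."""
    items = {}
    for s, p, o in triples:
        items.setdefault(s, set()).add((p, o))
    return list(items.values())


def frequent_itemsets(baskets, min_support):
    """Return itemsets of size 1 and 2 that occur in at least min_support subjects."""
    singles = Counter(item for basket in baskets for item in basket)
    frequent1 = {i for i, n in singles.items() if n >= min_support}
    result = {frozenset([i]): singles[i] for i in frequent1}
    pairs = Counter()
    for basket in baskets:
        # Apriori pruning: a pair can only be frequent if both members are.
        for pair in combinations(sorted(basket & frequent1), 2):
            pairs[frozenset(pair)] += 1
    result.update({p: n for p, n in pairs.items() if n >= min_support})
    return result
```

With `min_support=2`, the pattern "product page whose topic is Car" emerges as a frequent pair shared by two subjects.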

2.3 Semantic Web usage mining

In the Semantic Web environment, user behavior recorded in log files can be given explicit semantics by relating it to ontology knowledge. On this basis, mining has been shown to be effective in identifying groups of users with the same interests. This makes it possible to provide users with ontology-based personalized views and improves the results of Web usage mining.

An agent is an intelligent software entity that can perform a specific function autonomously and, under certain circumstances, communicate with other agents. Agents are usually autonomous and social. They can act proactively as well as respond reactively, and they are adaptive and mobile. Intelligent agents can carry out reasoning tasks based on the semantic information on the Web and can improve the accuracy of information retrieval. For this reason, agent technology is now widely used to build intelligent systems.

Semantic Web Mining Model under the framework of Agent. Based on the above, we can create a Semantic Web Mining Model under an agent framework to better understand how Semantic Web and Web mining techniques combine. The model completes the whole process in five steps. The Semantic Web Mining Model under the framework of Agent is shown in Figure 2.

The first step: an initial ontology must be built. To build it, we first need to obtain the relevant set of atomic concepts. A clustering algorithm is applied to documents obtained from the Web, and a hierarchy over these concepts is then derived in one of several ways. One way is to generate it with knowledge acquisition methods such as ONTEX (Ontology Exploration). ONTEX takes a set of concepts as input, applies knowledge acquisition techniques that explore their properties, and outputs a hierarchy over that set of concepts. Another way is to reuse one of the many ontology models that ontology researchers have already developed. These include both general-knowledge ontologies and domain-specific ontologies. Combining such an ontology model with the knowledge of domain experts yields a concept hierarchy (the initial ontology). This hierarchy is stored in the ontology library system to support the next phase of work.
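The clustering part of the first step can be sketched as follows. The report does not name a specific algorithm, so this Python example assumes a simple single-link agglomerative scheme. Documents are represented as keyword sets. Two documents are linked when their Jaccard similarity reaches a threshold, and each connected group becomes a candidate atomic concept labeled by its most frequent keyword. The documents, the 0.3 threshold, and the labeling rule are illustrative assumptions. Building the concept hierarchy on top of these clusters (for example with ONTEX) is not shown.

```python
from collections import Counter

# Hypothetical documents reduced to (non-empty) keyword sets.
DOCS = {
    "d1": {"car", "engine", "sedan"},
    "d2": {"car", "sedan", "price"},
    "d3": {"music", "album", "band"},
    "d4": {"music", "band", "concert"},
    "d5": {"music", "concert", "tour"},
}


def jaccard(a, b):
    """Similarity of two non-empty keyword sets."""
    return len(a & b) / len(a | b)


def single_link_clusters(docs, threshold):
    """Merge any two documents whose similarity >= threshold (union-find)."""
    parent = {d: d for d in docs}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    names = sorted(docs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(docs[a], docs[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for d in names:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


def label(cluster, docs):
    """Name a cluster by its most frequent keyword (ties broken alphabetically)."""
    counts = Counter(w for d in cluster for w in docs[d])
    return min(counts, key=lambda w: (-counts[w], w))
```

Note the single-link chaining. `d3` and `d5` are not similar enough to be linked directly, but both are linked to `d4`, so all three end up in one "music" concept.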

The second step: the resource acquisition module receives task instructions from the ontology agent and collects the task-related data sets from the Web for mining. This step is usually essential. Data on the Web are highly scattered, dynamic, and often inconsistent, so the quality of the collected data directly affects the results of Web mining.
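Because collected Web data are scattered and inconsistent, some cleaning usually precedes mining. The Python sketch below shows one such pass under stated assumptions. The record layout (`url`, `title`) and the `RAW`, `normalize_url`, and `clean` names are hypothetical. The normalization rules (lower-case scheme and host, drop the fragment and any trailing slash) are a simple choice for illustration, not a complete URL canonicalization scheme.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical raw records gathered from several sources for one mining task.
RAW = [
    {"url": "HTTP://Example.org/cars/", "title": " Sedan X "},
    {"url": "http://example.org/cars", "title": "Sedan X"},
    {"url": "http://example.org/music#top", "title": "Albums"},
    {"url": "http://example.org/news", "title": ""},
]


def normalize_url(url):
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def clean(records):
    """Normalize records, drop ones without a title, and keep one record per URL."""
    seen = {}
    for rec in records:
        title = rec["title"].strip()
        if not title:
            continue  # incomplete record: nothing to mine
        url = normalize_url(rec["url"])
        seen.setdefault(url, {"url": url, "title": title})
    return list(seen.values())
```

Here the two spellings of the cars page collapse into one record, and the untitled news record is dropped.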

The third step: the RDF clustering module performs ontology-based clustering on the data collected by the resource acquisition module. Resource nodes with the most similar characteristics are grouped together in the RDF data repository.
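The third step can be pictured with the Python sketch below. It rests on three assumptions made for illustration only. Each resource node is described by its (property, value) pairs. Each declared type is expanded with all of its superclasses from a small hypothetical ontology, so that a Sedan and a Hatchback share the feature Car. A greedy leader-clustering pass then puts each node into the first cluster whose leader is similar enough. The report does not specify the clustering algorithm.

```python
# Hypothetical class hierarchy used to enrich resource descriptions.
SUPERCLASS = {"Sedan": "Car", "Hatchback": "Car", "Car": "Vehicle", "Album": "Music"}

# Hypothetical resource nodes: node -> set of (property, value) pairs.
NODES = {
    "ex:r1": {("type", "Sedan"), ("maker", "ex:acme")},
    "ex:r2": {("type", "Hatchback"), ("maker", "ex:acme")},
    "ex:r3": {("type", "Album"), ("artist", "ex:band1")},
}


def features(pairs):
    """Add ('type', C) for every superclass C of each declared type."""
    result = set(pairs)
    for prop, value in pairs:
        if prop == "type":
            while value in SUPERCLASS:
                value = SUPERCLASS[value]
                result.add(("type", value))
    return result


def similarity(a, b):
    """Jaccard similarity of two feature sets."""
    return len(a & b) / len(a | b)


def leader_clusters(nodes, threshold):
    """Greedy leader clustering, visiting nodes in name order."""
    clusters = []  # list of (leader_features, member_names)
    for name in sorted(nodes):
        feats = features(nodes[name])
        for leader, members in clusters:
            if similarity(feats, leader) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((feats, [name]))
    return [members for _, members in clusters]
```

Without the ontology, `ex:r1` and `ex:r2` share only their maker (similarity 1/3). With superclasses added, their similarity rises to 0.6, so at a threshold of 0.5 they fall into the same cluster.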

The fourth step: the Semantic Web Mining module mines the data stored in the RDF data repository and provides the mining results to the ontology agent.

The fifth step: the ontology agent performs semantic filtering and clustering on the results produced by the Semantic Web Mining module to improve the relevance of the returned information. Ontology learning can also use the Semantic Web Mining module to extend and revise the ontology knowledge.
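The fifth step's semantic filtering and ontology extension might look roughly like the following Python sketch. Everything here is a hypothetical illustration, since the report does not specify how the ontology agent performs these operations. Results are assumed to be (resource, concept, score) tuples. Filtering keeps only results whose concept falls under the query concept in the ontology, and the surviving results are grouped by concept and sorted by score. A concept proposed by the mining step is added under an existing parent.

```python
from collections import defaultdict

# Hypothetical ontology: concept -> parent concept (None for the root).
ontology = {"Thing": None, "Vehicle": "Thing", "Car": "Vehicle", "Music": "Thing"}

# Hypothetical mining results: (resource, concept, relevance score).
RESULTS = [
    ("ex:a", "Car", 0.9),
    ("ex:b", "Music", 0.8),
    ("ex:c", "Vehicle", 0.4),
    ("ex:d", "Car", 0.95),
]


def subsumed_by(concept, query, onto):
    """True if concept is query or lies below it in the is-a hierarchy."""
    while concept is not None:
        if concept == query:
            return True
        concept = onto[concept]
    return False


def semantic_filter(results, query, onto):
    """Keep results under the query concept, grouped by concept, best score first."""
    groups = defaultdict(list)
    for resource, concept, score in results:
        if subsumed_by(concept, query, onto):
            groups[concept].append((score, resource))
    return {c: [r for _, r in sorted(hits, reverse=True)] for c, hits in groups.items()}


def extend_ontology(onto, new_concept, parent):
    """Add a mined concept under an existing parent concept."""
    if parent not in onto:
        raise KeyError(f"unknown parent concept: {parent}")
    onto.setdefault(new_concept, parent)
    return onto
```

For a query about vehicles, `semantic_filter(RESULTS, "Vehicle", ontology)` drops the music result and returns the car results before the generic vehicle one. After `extend_ontology(ontology, "ElectricCar", "Car")`, later filtering also recognizes electric cars as vehicles.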


