

Web-TREC 9 and Link Popularity
Source: trec.nist.gov


Report on the TREC-9 Experiment: Link-Based Retrieval and Distributed Collections

Jacques Savoy, Yves Rasolofo
Institut interfacultaire d'informatique
Université de Neuchâtel (Switzerland)
E-mail: {Jacques.Savoy, Yves.Rasolofo}@unine.ch
Web page: http://www.unine.ch/info/

Summary

The web and its search engines have created a new paradigm. This paradigm poses new challenges for the IR community, and those challenges are in turn attracting growing interest from around the world. NIST's decision to build a new and larger test collection based on web pages is a very attractive initiative. This motivated us at TREC-9 to support and participate in the creation of this new corpus, to address the underlying problems of managing large text collections, and to evaluate the retrieval effectiveness of hyperlinks. In this paper, we describe the results of our investigations. They show that simple raw-score merging can achieve interesting retrieval performance, while the hyperlinks used in different search strategies did not improve retrieval effectiveness.

Introduction

Because of the huge number of pages and links, browsing cannot be considered an adequate search process, even with tables of contents or other classifying lists (e.g., Yahoo!). As a result, effective query-based mechanisms for accessing information will always be needed. Search engines currently available on the web cannot adequately access all available information [Lawrence 99], because they suffer from many drawbacks [Hawking 99].

In the first chapter, we describe our experiments on the web track. For these experiments, a large web text collection was divided into four sub-collections to keep inverted file size below the 2 GB limit. The second chapter examines whether hyperlinks improve retrieval effectiveness, using four different link-based search models.

To evaluate our hypothesis, we used the SMART system as a test bed for implementing the OKAPI probabilistic model [Robertson 95]. This year our experiments ran on an Intel Pentium III/600 (memory: 1 GB, swap: 2 GB, disk: 6 x 35 GB), and all experiments were fully automated.

1. Distributed collections

To evaluate the retrieval effectiveness of various merging strategies, we formed four separate sub-collections (see Appendix 1). In this study, we assumed that each sub-collection used the same indexing schemes and retrieval procedures. This kind of distributed context reflects local area networks or search engines on the Internet more closely than it reflects meta search engines, where different search engines may collaborate to answer a given user request [Le Calvé 00], [Selberg 99].

The following characteristics define our approach more precisely. A query was sent to all four text databases, with no selection procedure applied. Our search system then merged the four resulting ranked lists into a single result list, which was presented to the user (Section 1.2). Before we describe the collection merging approaches, Section 1.1 reports the retrieval effectiveness of various search models on the whole collection and on each of our four sub-collections.

1.1. Performance of sub-collections

From the original web pages, we retained only the following logical sections: <TITLE>, <H1>, <CENTER>, and <BIG>. The most common tags, <P> (or <p>, together with </P> and </p>), were removed. Text delimited by the tags <DOCHDR> and </DOCHDR> was also removed. For long requests, we also removed various insignificant keywords (such as "Pertinent documents should include ..."). Moreover, search keywords appearing in the Title part of the topics were assigned a term frequency of 3 (this feature has no impact on short requests).
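The page-cleaning rules of Section 1.1 can be sketched in Python as follows. This is a minimal illustration only. It assumes each document is a raw HTML string in the TREC web format. The function name `extract_indexed_text` and the regular-expression parsing are hypothetical choices, not the authors' SMART code. Text inside the retained sections is collected tag by tag rather than in page order. Any markup left inside those sections, including <P>/<p> tags, is stripped.

```python
import re

# Logical sections whose text is kept for indexing (per Section 1.1).
KEPT_TAGS = ("TITLE", "H1", "CENTER", "BIG")


def extract_indexed_text(raw: str) -> str:
    """Hypothetical sketch: keep only the text inside the retained sections."""
    # Drop everything between <DOCHDR> and </DOCHDR> (the HTTP header block).
    raw = re.sub(r"<DOCHDR>.*?</DOCHDR>", " ", raw,
                 flags=re.DOTALL | re.IGNORECASE)
    pieces = []
    for tag in KEPT_TAGS:
        # Non-greedy match of <TAG ...>content</TAG>, case-insensitive.
        pattern = rf"<{tag}\b[^>]*>(.*?)</{tag}>"
        for match in re.finditer(pattern, raw, flags=re.DOTALL | re.IGNORECASE):
            pieces.append(match.group(1))
    # Strip remaining markup (e.g. <p>, <i>) and normalize whitespace.
    text = re.sub(r"<[^>]+>", " ", " ".join(pieces))
    return " ".join(text.split())
```

A real indexer would probably use a tolerant HTML parser instead, because regular expressions handle malformed or nested markup poorly.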
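The query-side rules of Section 1.1 might look like the sketch below. Only the weight of 3 for title keywords comes from the text. The `BOILERPLATE` phrase list, the tokenizer, and `build_query` are hypothetical. The sketch also assumes that a title keyword receives a query term frequency of exactly 3, regardless of how often it occurs elsewhere in the topic.

```python
import re
from collections import Counter

# Hypothetical list of topic phrases with no retrieval value
# (the single entry is the example quoted in Section 1.1).
BOILERPLATE = ("pertinent documents should include",)

TITLE_TF = 3  # term frequency assigned to title keywords, as stated in the text


def tokenize(text: str) -> list[str]:
    # Simple lowercase word tokenizer (an assumption; SMART has its own).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(title: str, description: str = "") -> Counter:
    """Return query term frequencies for one topic (illustrative sketch)."""
    desc = description.lower()
    for phrase in BOILERPLATE:
        desc = desc.replace(phrase, " ")
    qtf = Counter(tokenize(desc))
    # Title keywords are assumed to get qtf = 3, overriding other counts.
    for term in set(tokenize(title)):
        qtf[term] = TITLE_TF
    return qtf
```

In a title-only (short) request, every term receives the same weight of 3. A scoring function that is linear in the query term frequency then ranks documents exactly as it would with weight 1. This agrees with the remark that the feature has no impact on short requests.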
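The OKAPI model that the authors implemented in SMART [Robertson 95] ranks documents with BM25 weighting. The sketch below uses the standard textbook form of BM25 with a multiplicative query term frequency. The parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here, not settings reported in this excerpt. SMART's own implementation may differ in detail.

```python
import math
from collections import Counter


def bm25_scores(docs: dict[str, list[str]], query: Counter,
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score each tokenized document against a query with textbook BM25.

    k1 and b are common defaults, assumed rather than taken from the paper.
    """
    n_docs = len(docs)
    avdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency of each query term in this collection.
    df = {t: sum(1 for toks in docs.values() if t in toks) for t in query}
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        # Length normalization: K = k1 * ((1 - b) + b * dl / avdl).
        norm = k1 * ((1 - b) + b * len(toks) / avdl)
        score = 0.0
        for term, qtf in query.items():
            if tf[term] == 0:
                continue
            # Robertson-Sparck Jones idf; negative if term is in over half the docs.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += qtf * idf * (k1 + 1) * tf[term] / (norm + tf[term])
        scores[doc_id] = score
    return scores
```

Each sub-collection would compute `df`, `n_docs`, and `avdl` from its own documents. This is one reason scores from different sub-collections are not perfectly comparable.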
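Raw-score merging, which the Summary reports as showing interesting retrieval performance, can be sketched as follows. Every sub-collection uses the same indexing scheme and retrieval model (Section 1), so the sketch treats their scores as directly comparable. It combines the four ranked lists and re-sorts them by score. The function name `raw_score_merge` and the default cut-off of 1,000 documents are assumptions for illustration. The paper describes its exact procedure in Section 1.2, which is not part of this excerpt.

```python
import heapq


def raw_score_merge(result_lists: list[list[tuple[str, float]]],
                    k: int = 1000) -> list[tuple[str, float]]:
    """Merge per-sub-collection rankings by their raw retrieval scores.

    Each input list holds (doc_id, score) pairs. Doc ids are assumed unique
    across lists because the sub-collections are separate. The cut-off k
    is a hypothetical parameter.
    """
    all_hits = (hit for hits in result_lists for hit in hits)
    # Keep the k highest-scoring documents over all lists, best first.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])
```

The design is deliberately simple: it needs no training data or per-collection statistics. The trade-off is that it relies on the sub-collections producing scores on a common scale.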

