ISSN (Online): 2456-0774

Email : ijasret@gmail.com


Smart Crawler: A Two-Stage Crawler for Efficiently Harvesting Deep Web Interfaces

Abstract

As the web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. The deep web, i.e., content hidden behind HTML forms, has long been recognized as a notable gap in search engine coverage. Because it represents a substantial portion of the structured data on the web, accessing deep-web content has been a long-standing challenge for the database community [1]. The rapid growth of the World Wide Web also poses unprecedented scaling difficulties for general-purpose crawlers and web search engines. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving both wide coverage and high efficiency is a challenging problem. We propose a two-stage framework, namely Smart Crawler, for efficiently harvesting deep-web interfaces; each stage performs a different procedure [2]. In the first stage, Smart Crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, Smart Crawler ranks websites so that the ones most relevant to the user's given topic are prioritized. In the second stage, Smart Crawler achieves fast in-site searching by extracting the most relevant links with adaptive link ranking [3]. To eliminate bias toward visiting some highly relevant links in hidden web directories, we design a link tree data structure that achieves wider coverage of a website or a given URL.
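The link tree idea can be illustrated with a minimal Python sketch. It assumes that links are grouped by the first directory segment of their URL path and then picked round-robin across groups, so that no single directory dominates the crawl. The function names `build_link_tree` and `select_balanced`, the one-level grouping, and the example URLs are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def build_link_tree(urls):
    """Group URLs by their first path segment (a one-level link tree).

    Assumption: directory structure is approximated by the first path
    segment; links directly under the site root share the "" branch.
    """
    tree = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        branch = segments[0] if len(segments) > 1 else ""
        tree[branch].append(url)
    return tree


def select_balanced(urls, budget):
    """Pick up to `budget` links, taking one link per branch in turn."""
    tree = build_link_tree(urls)
    # Sort branches by name so the selection order is deterministic.
    queues = deque(deque(links) for _, links in sorted(tree.items()))
    picked = []
    while queues and len(picked) < budget:
        branch_queue = queues.popleft()
        picked.append(branch_queue.popleft())
        if branch_queue:
            # The branch still has links, so it rejoins the rotation.
            queues.append(branch_queue)
    return picked


if __name__ == "__main__":
    example_urls = [
        "http://example.com/books/a",
        "http://example.com/books/b",
        "http://example.com/books/c",
        "http://example.com/music/x",
        "http://example.com/about",
    ]
    for link in select_balanced(example_urls, 3):
        print(link)
```

With a budget of three, this sketch returns one link from each of the three branches rather than three links from the `books` directory, which is the kind of coverage balancing the link tree is meant to provide.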

Our results on a set of representative domains show the agility and accuracy of the proposed crawler framework. Smart Crawler efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.

Keywords: Smart crawler, site locating, in-site exploring, classification, ranking.
