Wednesday 25 September 2019

Design and Development of Feature Based Similarity Measure Crawling Algorithm: An Approach to Text Mining

Volume 12 Issue 3 January - March 2018

Research Paper

P. Dahiwale*, Sanjay Mate**, M.M. Raghuwanshi***
* Lecturer, Department of Computer Engineering, Government Polytechnic Daman, UT of Daman and Diu, India.
** Lecturer, Department of Information Technology, Government Polytechnic Daman, UT of Daman and Diu, India.
*** Professor, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur, Maharashtra, India.
Dahiwale, P., Mate, S., and Raghuwanshi, M. M. (2018). Design and Development of Feature Based Similarity Measure Crawling Algorithm: An Approach to Text Mining. i-manager's Journal on Software Engineering, 12(3), 1-7. https://doi.org/10.26634/jse.12.3.14554

Abstract

The World Wide Web (WWW) has grown from a small number of web pages to an enormous volume of web information, and this growth steadily complicates web crawling for search engines. A search engine handles queries from all parts of the world, and how well it answers them depends on the knowledge it gathers through crawling. Sharing information is one of society's most common habits, and it is done by publishing structured, semi-structured, and unstructured resources on the web (Nandy et al., 2012). This social practice leads to exponential growth of web resources, so continuous crawling is needed to keep web knowledge up to date and to track changes in existing sources. This paper proposes a feature-based crawling algorithm for lightweight and efficient crawling. A scaling technique is used to evaluate the performance of the proposed method against a standard crawler. A substantial gain in speed is observed after scaling, and the extraction of relevant web resources at that speed is examined.
