Extraction of relationship between web pages and files in access logs

ISSN: 1743-8187

Source: International Journal of Business Intelligence and Data Mining, Vol. 7, Iss. 3, 2012-10, pp. 152-171

Abstract

As the Internet has matured, the information available on the Web grows richer every day, and Web pages have become increasingly useful in daily life. As a result, it is now very common to refer to information on the Web, particularly when writing documents or programs. If we later want to revisit the same Web pages in order to modify part of a file, it can be very hard to track down the pages originally consulted. In this paper, we propose methods for extracting relationships between files and Web pages based on the co-occurrence of data in Web-access logs and file-access logs. These relationships are very useful for revisiting the Web pages related to a target file. Analysing co-occurrence across these two types of access logs requires merging them, and there are two approaches to doing so, which involve a trade-off between accuracy and execution time. We call them the Pre-Merge and Post-Merge methods. We have evaluated both methods using actual access logs.
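The abstract does not say how co-occurrence is measured, so the following is only a minimal sketch of the general idea, not the Pre-Merge or Post-Merge method itself. It assumes that each log entry is a (timestamp in seconds, resource) pair and that a file and a Web page co-occur when they are accessed within a fixed time window of each other. The function names, the entry format, and the 300-second window are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log entry format: (timestamp in seconds, resource identifier).
LogEntry = Tuple[float, str]


def cooccurrence_counts(
    web_log: Iterable[LogEntry],
    file_log: Iterable[LogEntry],
    window: float = 300.0,  # assumed window size, not taken from the paper
) -> Counter:
    """Count how often each (file, URL) pair is accessed within `window` seconds."""
    web_entries = list(web_log)
    counts: Counter = Counter()
    for file_time, path in file_log:
        # Collect each URL at most once per file access, so a page that is
        # reloaded many times does not dominate the count.
        nearby_urls = {
            url
            for web_time, url in web_entries
            if abs(web_time - file_time) <= window
        }
        for url in nearby_urls:
            counts[(path, url)] += 1
    return counts


def related_pages(counts: Counter, path: str, top_n: int = 3) -> List[str]:
    """Return the URLs that co-occur most often with `path`, best first."""
    ranked = sorted(
        ((url, n) for (p, url), n in counts.items() if p == path),
        key=lambda item: (-item[1], item[0]),  # by count, then URL for stable ties
    )
    return [url for url, _ in ranked[:top_n]]
```

Given such counts, calling `related_pages(counts, "report.txt")` would list the pages most often open around the times that `report.txt` was accessed, which is the kind of file-to-page relationship the abstract describes. This sketch compares every file access with every Web access, so it favours simplicity over speed.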