Market Basket Analysis algorithms with MapReduce

Publisher: John Wiley & Sons Inc

E-ISSN: 1942-4795

ISSN: 1942-4787

Source: WILEY INTERDISCIPLINARY REVIEWS: DATA MINING AND KNOWLEDGE DISCOVERY (ELECTRONIC), Vol. 3, Iss. 6, 2013-11, pp. 445-452

Abstract

The MapReduce approach has been popular for processing large-scale data since Google implemented its platform on the Google File System (GFS), followed by Amazon Web Services (AWS) offering the Apache Hadoop platform on inexpensive computing nodes. MapReduce is a restricted form of parallel programming, which motivates redesigning and converting existing sequential algorithms to fit it. This paper therefore proposes Market Basket Analysis algorithms for MapReduce, including one based on the Apriori property. Two algorithms are proposed: one adapts the existing Apriori algorithm, and the other is a simple algorithm that sorts the data sets and converts them to (key, value) pairs to fit MapReduce. The algorithms are executed on the Amazon EC2 MapReduce platform. The experimental results show that the Apriori algorithm does not perform as well as the simple algorithm. With the simple algorithm, the MapReduce code gains performance as nodes are added, but beyond a certain point a bottleneck prevents further gains. The bottleneck is believed to be caused by the operations of distributing, aggregating, and reducing data in MapReduce.

WIREs Data Mining Knowl Discov 2013, 3:445–452. doi: 10.1002/widm.1107

This article is categorized under:
Algorithmic Development > Association Rules
Application Areas > Business and Industry
Fundamental Concepts of Data and Knowledge > Big Data Mining
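
The simple algorithm mentioned in the abstract (sort each data set, then convert it to (key, value) pairs) can be illustrated with a minimal single-process sketch. This is not the authors' code: the function names, the choice of item pairs as keys, and the toy baskets are all assumptions made for illustration, and the map, shuffle, and reduce phases are simulated in plain Python rather than run on Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def map_transaction(items):
    """Map phase (assumed form): sort one basket, emit ((item_a, item_b), 1) per pair.

    Sorting first guarantees that the same two items always yield the same key,
    regardless of the order in which they appeared in the basket.
    """
    unique_sorted = sorted(set(items))
    for pair in combinations(unique_sorted, 2):
        yield pair, 1


def reduce_counts(mapped_pairs):
    """Shuffle and reduce phases, simulated: group by key and sum the values."""
    totals = defaultdict(int)
    for key, value in mapped_pairs:
        totals[key] += value
    return dict(totals)


def count_item_pairs(transactions):
    """Run the simulated map and reduce phases over all baskets."""
    mapped = (kv for basket in transactions for kv in map_transaction(basket))
    return reduce_counts(mapped)


# Hypothetical toy data, not taken from the paper.
baskets = [
    ["milk", "bread", "butter"],
    ["bread", "milk"],
    ["butter", "bread"],
]
pair_counts = count_item_pairs(baskets)
for pair, count in sorted(pair_counts.items()):
    print(pair, count)
```

On a real cluster the grouping step in `reduce_counts` would be performed by the framework's shuffle between mappers and reducers, which is one of the data-movement operations the abstract points to as a likely source of the bottleneck.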