Performing a Market Basket Analysis (Grocery Store without ItemCount)

Grocery Store Scenario

Here we have an Excel-based dataset containing information about customers who have shopped in a grocery store. Using market basket analysis, we try to detect relationships or associations between specific items in a large "catalog" of objects. A simple example would be the occurrence of beer and diapers in the same sales transaction. However, the real value of association rule analysis lies in finding connections between items that are not intuitively related: it would indeed be surprising (and valuable) to the retailer to find a strong association between beer and diapers.

In this simple scenario there are three customers (John, Mark and Alex), who purchased six different items between them. Each customer’s purchasing history is included in the dataset, as shown in Table 1. To produce the market basket analysis results, we use the RapidMiner software (www.rapidminer.com). RapidMiner supports many different data mining techniques, but we focus only on market basket analysis here. Please note that TID and ITEM should be written in upper case.

Table 1 A data set for Grocery Store

Here are the steps to be completed.

Data Files needed: Grocery.xlsx

Before you start the project, download this file to a location such as your jump drive or hard drive, from which it can be imported into your process. The file is available on elearn.tnstate.edu.

Complete the following steps:

1. Start the RapidMiner software. After it loads, the welcome page opens as in Fig 1-a. Close the “Choose a template to start from” window.

RapidMiner – Market Basket Analysis (Grocery Store), Page-1

Fig 1-a Welcome Page

2. On the left-hand side of the screen, click the Repositories tab, then expand (by clicking the plus (+) sign) Samples > processes > 02_Preprocessing. Double-click the 22_Transactional2Basket process (Fig 2-a). The Main Process window loads the skeleton of the Market Basket Analysis process, as shown in Fig 2-b.

Fig 2

3. Our first modification is to replace the “Retrieve” operator with the “Read Excel” operator (to input the data). Right-click the Retrieve operator, choose the “Replace Operator” option, then select Data Access > Files > Read > Read Excel, as shown in Fig 3. The model in the process area is then updated with the Read Excel operator, and we can now input the data set for further processing.

Fig 3

4. Click the “Read Excel” operator; “Read Excel” is displayed in the Parameters area, as shown in Fig 4-a. We will use the import wizard of the Read Excel operator to locate and read the market basket file (file name: Grocery.xlsx). Click “Import Configuration Wizard”.

5. Save your work regularly so that you can retrieve it later without repeating the entire process. If you haven’t already saved the process, do so by clicking File -> “Save Process As” (Fig 5-a), then enter the name of the process in the “Repository Browser” window, as shown in Fig 5-b. Click OK when the name of the process has been entered.

6. Now we are ready to produce the “support” information about the items (individual items and items together) purchased by the three customers. Click the FPGrowth operator and make sure that the “positive value” field is set to ‘true’, as shown in Fig 6-a. Note that if this field is not available, you should click ‘… hidden expert parameters’, as shown in Fig 6-b.

Fig 6-a    Fig 6-b

7. A breakpoint is set by default on the Example2PivotingAttribute operator, which stops process execution at that point. It can be removed by right-clicking the operator and clicking (to unselect) “Breakpoint After”, as shown in Fig 7. Click the green “Play” (Run) button to start the process; the result is produced and displayed in the Result View.

Fig 7 (callouts: breakpoint symbol; enable or disable “Breakpoint After”)

8. The Example2PivotingAttribute operator transforms an example set by grouping multiple examples of a single unit into a single example. By clicking Data View, as in Fig 8-a, we see that the data has been grouped by transaction. In the resulting table, a question mark means that the item was not purchased in that transaction. In the AttributeSubsetPreprocessing operator, we substitute the question marks with the word “false” (Fig 8-b).

Fig 8-a

Fig 8-b
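The pivoting and missing-value replacement described in step 8 can be sketched outside RapidMiner as well. The Python snippet below is only an illustration: the (TID, ITEM) rows are hypothetical values chosen to be consistent with the support figures quoted in this tutorial, not the actual contents of Grocery.xlsx, and the function name pivot_to_baskets is made up for this example.

```python
# Hypothetical (TID, ITEM) rows; NOT the real contents of Grocery.xlsx.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 5), (3, 6)]


def pivot_to_baskets(rows):
    """Group rows by TID into one record per transaction, one flag per item."""
    items = sorted({item for _, item in rows})
    baskets = {}
    for tid, item in rows:
        # Every item starts as None, mirroring the "?" shown after pivoting.
        record = baskets.setdefault(tid, {f"id_{i}": None for i in items})
        record[f"id_{item}"] = True
    # Replace the missing values with False, as done for the question marks.
    for record in baskets.values():
        for key, value in record.items():
            if value is None:
                record[key] = False
    return baskets


baskets = pivot_to_baskets(rows)
for tid, record in sorted(baskets.items()):
    print(tid, record)
```

Each printed record corresponds to one row of the pivoted table: one transaction, with a true/false flag for every item.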

9. By clicking the Play button again (Fig 8-b), we reach the final result of the process. NOTE: Make sure to choose “No” if asked to “Close old results before starting process.”

10. The results screen shows the support of each individual item and the support of items occurring together. As seen in Fig 9, the support for item 1 (id_1.0) is 0.667, or 66.7%, meaning that item 1 is found in 66.7% of the transactions. In other words, if 100 transactions were made today, item 1 would be included in about 67 of them. Items 2 and 3 (and items 4, 5 and 6) each occur only once, so their support is only 33.3%.

Fig 9
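To make the support figures concrete, here is a minimal Python sketch that computes the support of every item set by brute-force enumeration. This is not the FP-Growth algorithm that RapidMiner uses, only a simple way to reproduce the same kind of numbers on a tiny data set. The three transactions are hypothetical, chosen so that item 1 has support 2/3 and every other item has support 1/3, as reported above.

```python
from itertools import combinations

# Hypothetical transactions (one set of item ids per customer).
transactions = [{1, 2, 3}, {1, 4}, {5, 6}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Enumerate all item sets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


# A threshold of 0.3 keeps every item set that occurs at least once here.
for itemset, s in frequent_itemsets(transactions, min_support=0.3).items():
    print(itemset, f"{s:.3f}")
```

Brute-force enumeration grows exponentially with the number of items, which is why tools such as RapidMiner rely on FP-Growth for realistic data sets.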

11. While the previous result shows the support of items occurring together, we need an additional operator to create and obtain association rules between the items. As shown in Fig 10-a and Fig 10-b, we add a “Create Association Rules” operator between the FPGrowth operator and the “res” (result) port by clicking “Create Association Rules” (available under Modeling -> Association and Item Set Mining in the Operator area). However, if we run the process at this point, an error is reported, as shown in Fig 10-c. Therefore, we need to reconnect the ports between the operators. First, reconnect ‘exa’ (example set) from FPGrowth directly to ‘res’, then connect ‘fre’ (frequent sets) in FPGrowth to ‘ite’ (item sets) in Create Association Rules. Finally, connect ‘rul’ (rules) from Create Association Rules to ‘res’, as shown in Fig 10-d; the lights turn yellow.

Fig 10-a

Fig 10-b

Fig 10-c

Fig 10-d

12. Change the min confidence field from 0.8 to 0.5 (for this example) and click the “Play” button. You may need to click “Run” again if the intermediate result is displayed, as shown in Fig 11-a. Depending on the size of the data, this may take a long time, but it should not take long for this example. The result will look like Fig 11-b.

Fig 11-a

Fig 11-b

13. The results (Table View) show the association rules in the form Premises => Conclusion. Row no. 1 can be read as: the purchase of item 1 implies the purchase of item 3. These items are bought together in 33% of the transactions, and in 50% of the cases where item 1 is bought, item 3 is bought as well (50% confidence). If we look at row no. 4, we see that the purchase of item 3 implies the purchase of item 1. Again, this combination occurs in 33% of the transactions, but every time item 3 is bought, item 1 is bought as well (100% confidence). Finally, if we look at row no. 15, we see that the purchase of item 1 and item 3 together implies the purchase of item 2. This combination occurs in 33% of the transactions, and every time item 1 and item 3 are bought, item 2 is also bought (100% confidence). The text view (AssociationRules) is illustrated in Fig 11-d.

Fig 11-c    Fig 11-d
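The support and confidence values read off the rule table can be checked with a short Python sketch. It assumes the same hypothetical three transactions as before (they are not taken from Grocery.xlsx) and derives rules by brute force. The order of the generated rules is not meant to match the row numbers in the RapidMiner table.

```python
from itertools import combinations

# Hypothetical transactions, consistent with the figures quoted in the text.
transactions = [{1, 2, 3}, {1, 4}, {5, 6}]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


def association_rules(transactions, min_confidence):
    """Return (premise, conclusion, support, confidence) tuples."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            together = count(itemset, transactions)
            if together == 0:
                continue  # the items never occur together
            for k in range(1, size):
                for premise in combinations(itemset, k):
                    conclusion = tuple(i for i in itemset if i not in premise)
                    # confidence = count(premise and conclusion) / count(premise)
                    confidence = together / count(premise, transactions)
                    if confidence >= min_confidence:
                        support = together / len(transactions)
                        rules.append((premise, conclusion, support, confidence))
    return rules


rules = association_rules(transactions, min_confidence=0.5)
for premise, conclusion, support, confidence in rules:
    print(f"{premise} => {conclusion}  support={support:.2f}  confidence={confidence:.2f}")
```

Raising min_confidence back to 0.8 would drop the rule from item 1 to item 3 (confidence 0.5) while keeping the rule from item 3 to item 1 (confidence 1.0), which is why the tutorial lowers the threshold for this small example.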

End Note: If we want to change the values of TID and ITEM from numbers to customers’ names and items’ names, the information entered in step 4 should be changed as follows: select the “Grocery Data (Name)” worksheet in Step 2 of the “Data import wizard”, as shown in Fig 12-a through Fig 12-d.

Fig 12-a    Fig 12-b

Fig 12-c    Fig 12-d    Fig 13-a

The results (Table View) with the names of customers and items are shown in Fig 13-a and Fig 13-b. We can now re-read row no. 1 as: the purchase of Beers implies the purchase of Diapers. These items are bought together in 33% of the transactions, and in 50% of the cases where Beers is bought, Diapers is bought as well (50% confidence).

Further, if we look at row no. 4, we see that the purchase of Diapers implies the purchase of Beers. Again, this combination occurs in 33% of the transactions, but every time Diapers is bought, Beers is bought as well (100% confidence).

Fig 13-b
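Re-reading a numeric rule with item names amounts to a simple lookup. The Python sketch below illustrates this for the two names that the text gives (item 1 as Beers, item 3 as Diapers). The names of the remaining items are not stated here, so they are left as numeric ids, and the helper names label and describe are made up for this example.

```python
# Only these two names are given in the text; other items keep their ids.
item_names = {1: "Beers", 3: "Diapers"}


def label(itemset):
    """Render a set of item ids using names where they are known."""
    return ", ".join(item_names.get(i, f"item {i}") for i in sorted(itemset))


def describe(premise, conclusion, support, confidence):
    """Format one rule in the Premises => Conclusion form."""
    return (f"{label(premise)} => {label(conclusion)} "
            f"(support {support:.0%}, confidence {confidence:.0%})")


print(describe({1}, {3}, 1 / 3, 0.5))
print(describe({3}, {1}, 1 / 3, 1.0))
```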
