Thursday, June 9, 2016

Market Basket Analysis (Association Rule Learning) with Power BI (DAX) and R

Introduction

In this post I will show how to run an R script from Power BI which will execute an Association rule learning script to perform market basket analysis.

In this example we will not look at products sold, but products sharing shelf space.

The dataset

Our basic dataset looks like this.

 Our products:

The distribution / presence of products on the shelf of a customer:

The Power BI building blocks

The data model

As for the DAX part we will start with this post of Marco Russo and Alberto Ferrari.

So the data model in Power BI looks like this:



The R visualization

We will look at the DAX part later on. First we add an R component with a script that will return the AR rules it found.


The table contains the basic output that is to be expected from AR. We will try to build these measures in DAX later on.

The R script

As for the R script it looks like this:


   
save(dataset, file="C:/TFS/dataset.rda")

library(arules, lib.loc="C:/TFS/Rlib/a/" , logical.return = FALSE,
warn.conflicts = F, quietly = T,verbose = F)
library(plotrix, lib.loc="C:/TFS/Rlib/p/" , logical.return = FALSE,
warn.conflicts = F, quietly = T,verbose = F)

dataset = cbind(dataset, 1)
colnames(dataset) = c("ProductID", "CustomerID", "Waarde")
reports = xtabs(Waarde~CustomerID+ProductID, data=dataset)
reports[is.na(reports)] <- 0
rules <- apriori(as.matrix(as.data.frame.matrix(reports)),parameter = list(supp = 0.03, conf = 0.5, target = "rules"))
t = inspect(head(sort(rules, by ="support"),15))

par(mar = c(0,0,0,0))
plot(c(0, 0), c(0, 0))
if (is.null(t)) {
t = data.frame("no rules found")
text(x = 0.5, y = 0.5, paste("No Rules found"),
cex = 1.6, col = "black")
} else {
addtable2plot(-1, -1, t, bty = "n", display.rownames = F, hlines = F,
vlines = F)
}

Unfortunately Power BI initializes a new R sessions each time the R visualization is run / cross filtered.  Therefore I tried to use a much base R as possible. As for the libraries that need to be loaded. I put these in a separate folder on my local drive and specified the folder name in the library command.

Building it in DAX

Support

The output of the arules R script can be built in DAX whenever it concerns single item combinations, so X -> Y. So not A, B -> Y.  The 'support' measure is basically the '[Orders with Both Products %]' described by Russo and Ferrari. Just to show how its implemented on our dataset.
  
Customers with Both Products % =
IF (
NOT ( [SameProductSelection] );
DIVIDE ( [Customers with Both Products]; [Unique Customers All] )
)
The building blocks of this formula:
Same product selection, since this is useless.
  
SameProductSelection =
IF (
HASONEVALUE ( Products[ID] )
&& HASONEVALUE ( 'Filter Products'[ID] );
IF (
VALUES ( Products[ID] )
= VALUES ( 'Filter Products'[ID] );
TRUE
)
)
Customers with both products:
   
Customers with Both Products =
CALCULATE (
DISTINCTCOUNT ( Distribution[Customer ID] );
CALCULATETABLE (
SUMMARIZE ( Distribution; Distribution[Customer ID] );
ALL ( Products );
USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
)
)
Number of customers in total:
Unique Customers All = 
CALCULATE (
DISTINCTCOUNT ( Distribution[Customer ID] );
ALL ( Products )
)

Confidence

   
Confidence = [Customers with Both Products] / [Unique Customers LHS]
Unique Customers LHS:
   
Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])

Lift



Lift = [Confidence] / [Proportion Product RHS]

Proportion product RHS:

Proportion Product RHS = Distribution[Unique Customers RHS] / [Unique Customers All]

Unique customer RHS:
 
Unique Customers RHS =
CALCULATE (
DISTINCTCOUNT ( Distribution[Customer ID] );
CALCULATETABLE (
SUMMARIZE ( Distribution; Distribution[Customer ID] );
ALL ( Products );
USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
); ALL(Products)
)

You can download the Power BI file here.

In this video you see the Power BI file in use:

9 comments:

  1. Hi Stev

    We need the XL sheet to upload it into the Power Bi file you publish it on this site.

    ReplyDelete
    Replies
    1. Yes please publish the data !! how you do example without data.

      Delete
  2. We cannot practices your example in Power Bi cause we have no data , so please try to publish the source data as soon as possible we need it.

    ReplyDelete
  3. Thank You Dear , But it is less useful without data , In addition it's not dynamic enough to be fit with other type of data e.g.: Big Order details.

    ReplyDelete
  4. How we can compare multiple product, {Product A, Product B} => Product C
    how to get shift, lift & confidence for it?

    ReplyDelete
  5. Unfortunately its not possible to do this analysis with multiple products in DAX. It is of course possible in R.

    ReplyDelete
  6. Number of customers in total: Unique Customers All =CALCULATE ( DISTINCTCOUNT ( Distribution[Customer ID] ); ALL ( Products ))
    Unique Customers LHS: Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])
    What is difference between both above DAX please explain?

    ReplyDelete