Introduction
In this post I will show how to run an R script from Power BI which will execute an Association rule learning script to perform market basket analysis.In this example we will not look at products sold, but products sharing shelf space.
The dataset
Our basic dataset looks like this.Our products:
The distribution / presence of products on the shelf of a customer:
The Power BI building blocks
The data model
As for the DAX part we will start with this post of Marco Russo and Alberto Ferrari.So the data model in Power BI looks like this:
The R visualization
We will look at the DAX part later on. First we add an R component with a script that will return the AR rules it found.The table contains the basic output that is to be expected from AR. We will try to build these measures in DAX later on.
The R script
As for the R script it looks like this:Unfortunately Power BI initializes a new R sessions each time the R visualization is run / cross filtered. Therefore I tried to use a much base R as possible. As for the libraries that need to be loaded. I put these in a separate folder on my local drive and specified the folder name in the library command.
save(dataset, file="C:/TFS/dataset.rda")
library(arules, lib.loc="C:/TFS/Rlib/a/" , logical.return = FALSE,
warn.conflicts = F, quietly = T,verbose = F)
library(plotrix, lib.loc="C:/TFS/Rlib/p/" , logical.return = FALSE,
warn.conflicts = F, quietly = T,verbose = F)
dataset = cbind(dataset, 1)
colnames(dataset) = c("ProductID", "CustomerID", "Waarde")
reports = xtabs(Waarde~CustomerID+ProductID, data=dataset)
reports[is.na(reports)] <- 0
rules <- apriori(as.matrix(as.data.frame.matrix(reports)),parameter = list(supp = 0.03, conf = 0.5, target = "rules"))
t = inspect(head(sort(rules, by ="support"),15))
par(mar = c(0,0,0,0))
plot(c(0, 0), c(0, 0))
if (is.null(t)) {
t = data.frame("no rules found")
text(x = 0.5, y = 0.5, paste("No Rules found"),
cex = 1.6, col = "black")
} else {
addtable2plot(-1, -1, t, bty = "n", display.rownames = F, hlines = F,
vlines = F)
}
Building it in DAX
Support
The output of the arules R script can be built in DAX whenever it concerns single item combinations, so X -> Y. So not A, B -> Y. The 'support' measure is basically the '[Orders with Both Products %]' described by Russo and Ferrari. Just to show how its implemented on our dataset.The building blocks of this formula:
Customers with Both Products % =
IF (
NOT ( [SameProductSelection] );
DIVIDE ( [Customers with Both Products]; [Unique Customers All] )
)
Same product selection, since this is useless.
| |
Customers with both products:
| |
Number of customers in total: Unique Customers All = |
Confidence
Unique Customers LHS:
Confidence = [Customers with Both Products] / [Unique Customers LHS]
Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])
Lift
Lift = [Confidence] / [Proportion Product RHS]
Proportion product RHS:
Proportion Product RHS = Distribution[Unique Customers RHS] / [Unique Customers All]
Unique customer RHS:
Unique Customers RHS =
CALCULATE (
DISTINCTCOUNT ( Distribution[Customer ID] );
CALCULATETABLE (
SUMMARIZE ( Distribution; Distribution[Customer ID] );
ALL ( Products );
USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
); ALL(Products)
)
You can download the Power BI file here.
In this video you see the Power BI file in use:
Hi Stev
ReplyDeleteWe need the XL sheet to upload it into the Power Bi file you publish it on this site.
Yes please publish the data !! how you do example without data.
DeleteWe cannot practices your example in Power Bi cause we have no data , so please try to publish the source data as soon as possible we need it.
ReplyDeleteThank You Dear , But it is less useful without data , In addition it's not dynamic enough to be fit with other type of data e.g.: Big Order details.
ReplyDeleteHow we can compare multiple product, {Product A, Product B} => Product C
ReplyDeletehow to get shift, lift & confidence for it?
implementation of above using DAX
ReplyDeleteUnfortunately its not possible to do this analysis with multiple products in DAX. It is of course possible in R.
ReplyDeleteThanks for your reply
DeleteNumber of customers in total: Unique Customers All =CALCULATE ( DISTINCTCOUNT ( Distribution[Customer ID] ); ALL ( Products ))
ReplyDeleteUnique Customers LHS: Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])
What is difference between both above DAX please explain?