Sunday, August 20, 2017

Plug and Play formula for correlation in DAX

As you might have seen, I wrote a blog article to calculate correlation in DAX. This was done by defining multiple measures that were used in the final correlation formula. The drawback of this approach was that it was quite time consuming to create a new correlation formula with other columns, the number of calculated measures increased and performance was not optimal.

I tried to tacle the above drawbacks and this resulted in the following formula:


Correlation = 
var cv = VALUES ( Data[Perioden] )
var cas = ALLSELECTED ( Data[Perioden] )
var x_sd = CALCULATE (STDEVX.P ( cv, [x]), cas )
var y_sd = CALCULATE (STDEVX.P ( cv, [y]), cas )
var SD_Product = CALCULATE ( x_sd * y_sd )
var x_mean = CALCULATE ( AVERAGEX ( cv, [x] ), cas )
var y_mean = CALCULATE (AVERAGEX ( cv, [y] ), cas )
var vDiff_Mean_Product =
CALCULATE ( AVERAGEX (
cv,
( [y] - y_mean ) * ([x] - x_mean )
),
cas
)
return
CALCULATE ( vDiff_Mean_Product / SD_Product)

All you need to do is replace X and Y with the (calculated) measures for which you would like to know the correlation.

The first two variables cs and cas represent the dimension that define the array of values used to calculate the correlation. So your slicers could result in subset for a certain product, geo location, etc. Then with that selection made you would like to know the correlation over a selected period. In that case period is respresented by 'Data[Perioded]'.

By using variables, performance might be improved because the number of times that specific calculation is done will be limited.

I hope you find this calculation usefull. Suggestions and feedback are much appreciated.

No comments:

Post a Comment