BI and Analytics: Power BI


Sunday, September 3, 2017

Income analysis (part 3)

This is the final part of the series on income analysis. Click here for part 1, and here for part 2. Below are the dashboards for each research question.
Which municipalities show a high correlation between income growth and growth in inequality?

High growth and correlation between growth in average income and inequality.

Saturday, September 2, 2017

Income development (part 2)

This is part 2 of the series on income analysis in the Netherlands. In this series we answer a number of research questions. Below is a dashboard for each research question.
Where was growth the greatest?

In which municipalities did income inequality increase?

Click here to go to part 3 of the series.

Sunday, August 20, 2017

Income development by municipality

Introduction

Recently I shared some dashboards for analyzing the development of interest rates. Hopefully they give you a better picture of the mortgage market.

The amount you need to borrow depends on the price you pay for the home in question. What is a realistic purchase price, and how will that price develop in the future? As the saying goes, 'Prediction is difficult, especially about the future', which is why advice on this topic varies widely.

Personally, I thought it would be useful to map trends on both the demand side and the supply side. Funda.nl offers several ways to analyze the supply side. But what about the demand side? We have seen that the level of interest rates influences demand. I believe there are many indicators available that give a more fundamental picture of the demand side. For that reason I thought it would be useful to map how incomes in the Netherlands developed over the period 2008 through 2014.

The data

The raw data was made available by CBS (Statistics Netherlands). The dataset contains two measures:

- Mean standardized income (standardized for household composition).

- Median standardized income.

Both are given per year, per municipality.

Unfortunately, the variance or standard deviation is not available. Fortunately, the median and the mean together do say something about income (in)equality. If the median and the mean are far apart, you can argue that income inequality is large. If the mean is much larger than the median, chances are that, for example, a few wealthy households are pulling the mean up.
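To make the mean-versus-median argument concrete, here is a minimal sketch in Python (not the DAX used for the dashboards). The income figures are hypothetical, and the ratio mean / median is just one possible way to express the gap; the CBS dataset only publishes the mean and median themselves, so in practice you would divide those two published values directly.

```python
from statistics import mean, median


def mean_median_ratio(incomes):
    """Return mean / median; values well above 1 suggest a few high incomes pull the mean up."""
    return mean(incomes) / median(incomes)


# Hypothetical standardized household incomes (in thousands of euros).
even_town = [22, 24, 25, 26, 28]      # incomes close together
skewed_town = [22, 24, 25, 26, 153]   # one very wealthy household

print(mean_median_ratio(even_town))    # 1.0
print(mean_median_ratio(skewed_town))  # 2.0
```

Both towns have the same median (25), but the single high income doubles the mean in the second town, which is exactly the pattern the mean/median comparison is meant to pick up.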

Research questions

With this dataset you can answer a number of questions.

- Which municipalities currently have the highest average income?
- Which municipalities are the most heterogeneous in terms of income?
- Which municipalities are the most homogeneous in terms of income?

- By what percentage did average income rise over the period 2008-2014?
- Where was income growth the greatest?

- In which municipalities did income (in)equality rise or fall?
- Which municipalities show a high correlation between income growth and growth in inequality?

Below I show a dashboard for each research question.

Which municipalities currently have the highest average income?

Which municipalities were the most heterogeneous in terms of income in 2014?

Which municipalities were the most homogeneous in terms of income in 2014?

Click here to go to part 2 of the series.

Plug and Play formula for correlation in DAX

As you might have seen, I wrote a blog article on calculating correlation in DAX. That approach defined multiple measures that were then used in the final correlation formula. Its drawbacks were that creating a new correlation formula for other columns was quite time-consuming, the number of calculated measures kept growing, and performance was not optimal.

I tried to tackle these drawbacks, which resulted in the following formula:

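Correlation =
// Periods in the current filter context, and all periods in the selection
VAR cv = VALUES ( Data[Perioden] )
VAR cas = ALLSELECTED ( Data[Perioden] )
// Population standard deviations of [x] and [y] over the periods
VAR x_sd = CALCULATE ( STDEVX.P ( cv, [x] ), cas )
VAR y_sd = CALCULATE ( STDEVX.P ( cv, [y] ), cas )
VAR SD_Product = x_sd * y_sd
// Means of [x] and [y] over the periods
VAR x_mean = CALCULATE ( AVERAGEX ( cv, [x] ), cas )
VAR y_mean = CALCULATE ( AVERAGEX ( cv, [y] ), cas )
// Mean of the products of the deviations from the means
VAR vDiff_Mean_Product =
    CALCULATE (
        AVERAGEX ( cv, ( [y] - y_mean ) * ( [x] - x_mean ) ),
        cas
    )
RETURN
    // DIVIDE returns BLANK instead of an error when SD_Product is zero
    DIVIDE ( vDiff_Mean_Product, SD_Product )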

All you need to do is replace [x] and [y] with the (calculated) measures for which you would like to know the correlation.

The first two variables, cv and cas, represent the dimension that defines the array of values used to calculate the correlation. Your slicers could select a subset for a certain product, geographic location, etc. With that selection made, you would like to know the correlation over a selected period. In that case the period is represented by 'Data[Perioden]'.

Using variables may improve performance, because each intermediate calculation is evaluated only once.
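For readers who want to check the arithmetic outside Power BI, the same calculation can be sketched in Python. This is a minimal illustration rather than a translation of the DAX filter semantics: the two lists are hypothetical values of [x] and [y] per period, and the local names mirror the variables in the formula. Like STDEVX.P, it uses the population standard deviation.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Population Pearson correlation: mean of the deviation products / (sd_x * sd_y)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Population standard deviations (divide by n, as STDEVX.P does).
    x_sd = sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
    y_sd = sqrt(sum((y - y_mean) ** 2 for y in ys) / n)
    # Mean of the products of the deviations from the means.
    diff_mean_product = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    return diff_mean_product / (x_sd * y_sd)


# Hypothetical measure values, one per period.
x_per_period = [1.0, 2.0, 3.0, 4.0]
y_per_period = [2.0, 4.0, 6.0, 8.0]

# y is an exact multiple of x, so the result is 1 up to floating-point rounding.
print(pearson_correlation(x_per_period, y_per_period))
```

Unlike the DAX version with DIVIDE, this sketch raises a ZeroDivisionError when one of the series is constant, since its standard deviation is then zero.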

I hope you find this calculation useful. Suggestions and feedback are much appreciated.

Wednesday, August 9, 2017

Income analysis

Sunday, May 7, 2017

Analyze mortgage rates in Netherlands (March 2017)

In this post I would like to share the dashboard I created to analyze mortgage rates in the Netherlands (March 2017). I built the dashboard very quickly. Drop a comment to get a copy so you can improve it yourself.

I tried Qlik Sense Cloud, but it turns out you need the enterprise license to embed the dashboard. Here is a quick video to give an impression.

Saturday, May 6, 2017

Parcel (Kavel) analysis for Eindhoven region -> Veldhoven -> Oerle Zuid

Hi,

In this post, I would like to share a dashboard which I created to analyze parcel pricing, parcel dimensions, and square meter prices.

The steps I took:
- Parse data from http://www.kavelsveldhoven.nl/oerle-zuid
- Create a Synoptic map https://synoptic.design/
- Create the dashboard.

I hope this dashboard is of use to you:

Thursday, June 9, 2016

Market Basket Analysis (Association Rule Learning) with Power BI (DAX) and R

Introduction

In this post I will show how to run an R script from Power BI that uses association rule learning to perform market basket analysis.

In this example we will not look at products sold, but products sharing shelf space.

The dataset

Our basic dataset looks like this.

Our products:

The distribution / presence of products on the shelf of a customer:

The Power BI building blocks

The data model

For the DAX part we start from this post by Marco Russo and Alberto Ferrari.

So the data model in Power BI looks like this:

The R visualization

We will look at the DAX part later on. First we add an R visualization with a script that returns the association rules (AR) it found.

The table contains the basic output that is to be expected from AR. We will try to build these measures in DAX later on.

The R script

The R script looks like this:

   
# Debug aid: save the data frame passed in by Power BI to a local file.
save(dataset, file="C:/TFS/dataset.rda")

# Load the packages from local library folders.
library(arules, lib.loc="C:/TFS/Rlib/a/" , logical.return = FALSE,
        warn.conflicts = FALSE, quietly = TRUE, verbose = FALSE)
library(plotrix, lib.loc="C:/TFS/Rlib/p/" , logical.return = FALSE,
        warn.conflicts = FALSE, quietly = TRUE, verbose = FALSE)

# Add a constant column so xtabs can build a customer x product table.
dataset = cbind(dataset, 1)
colnames(dataset) = c("ProductID", "CustomerID", "Waarde")
reports = xtabs(Waarde~CustomerID+ProductID, data=dataset)
reports[is.na(reports)] <- 0
# Mine rules with minimum support 0.03 and minimum confidence 0.5.
rules <- apriori(as.matrix(as.data.frame.matrix(reports)),parameter = list(supp = 0.03, conf = 0.5, target = "rules"))
# Take the 15 rules with the highest support.
t = inspect(head(sort(rules, by ="support"),15))

# Draw the rules table, or a message when no rules were found, on a plot.
par(mar = c(0,0,0,0))
plot(c(0, 0), c(0, 0))
if (is.null(t)) {
  t = data.frame("no rules found")
  text(x = 0.5, y = 0.5, paste("No Rules found"), 
       cex = 1.6, col = "black")
} else {
  addtable2plot(-1, -1, t, bty = "n", display.rownames = FALSE, hlines = FALSE,
                vlines = FALSE)
}

Unfortunately, Power BI initializes a new R session each time the R visualization is run or cross-filtered. I therefore tried to use as much base R as possible. I put the libraries that need to be loaded in a separate folder on my local drive and specified that folder in the library command.
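The reshaping step in the script (the xtabs call, which turns ProductID / CustomerID rows into a customer-by-product table for apriori) can be illustrated outside R. The following is a minimal sketch in Python, not part of the original solution, using hypothetical IDs. Note one difference: xtabs sums the constant column, so duplicate rows would give counts above 1, whereas this sketch only records presence as 0 or 1.

```python
def incidence_matrix(pairs):
    """Build a customer x product 0/1 table from (product_id, customer_id) pairs."""
    customers = sorted({customer for _, customer in pairs})
    products = sorted({product for product, _ in pairs})
    present = set(pairs)
    # One row per customer, one column per product.
    matrix = [
        [1 if (product, customer) in present else 0 for product in products]
        for customer in customers
    ]
    return customers, products, matrix


# Hypothetical shelf data: which product is present at which customer.
pairs = [("p1", "c1"), ("p2", "c1"), ("p2", "c2")]
customers, products, matrix = incidence_matrix(pairs)

print(products)  # ['p1', 'p2']
for customer, row in zip(customers, matrix):
    print(customer, row)
```

Each row of the matrix plays the role of one "basket" (here, one customer's shelf), which is the shape association rule mining works on.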

Building it in DAX

Support

The output of the arules R script can be rebuilt in DAX as long as it concerns single-item combinations, so X -> Y, not A, B -> Y. The 'support' measure is essentially the '[Orders with Both Products %]' measure described by Russo and Ferrari. This is how it is implemented on our dataset:

  
Customers with Both Products % = 
IF (
    NOT ( [SameProductSelection] );
    DIVIDE ( [Customers with Both Products]; [Unique Customers All] )
)

The building blocks of this formula:

Same product selection, which is excluded because pairing a product with itself is meaningless:

  
SameProductSelection = 
IF (
    HASONEVALUE ( Products[ID] )
        && HASONEVALUE ( 'Filter Products'[ID] );
    IF (
        VALUES ( Products[ID] )
            = VALUES ( 'Filter Products'[ID] );
        TRUE
    )
)

Customers with both products:

   
Customers with Both Products = 
CALCULATE (
    DISTINCTCOUNT ( Distribution[Customer ID] );
    CALCULATETABLE (
        SUMMARIZE ( Distribution; Distribution[Customer ID] );
        ALL ( Products );
        USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
    )
)

Number of customers in total:

Unique Customers All = 
CALCULATE (
    DISTINCTCOUNT ( Distribution[Customer ID] );
    ALL ( Products )
)

Confidence

   
Confidence = [Customers with Both Products] / [Unique Customers LHS]

Unique Customers LHS:

   
Unique Customers LHS = DISTINCTCOUNT(Distribution[Customer ID])

Lift


Lift = [Confidence] / [Proportion Product RHS]

Proportion product RHS:


Proportion Product RHS = [Unique Customers RHS] / [Unique Customers All]

Unique customers RHS:

 
Unique Customers RHS = 
CALCULATE (
    DISTINCTCOUNT ( Distribution[Customer ID] );
    CALCULATETABLE (
        SUMMARIZE ( Distribution; Distribution[Customer ID] );
        ALL ( Products );
        USERELATIONSHIP ( Distribution[Product ID]; 'Filter Products'[ID] )
    ); ALL(Products)
)
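To sanity-check the DAX measures, the three metrics for a single-item rule X -> Y can be computed by hand. Below is a minimal sketch in Python with hypothetical shelf data; the function name and the data are illustrative assumptions, and the counts correspond to the measures above (customers with both products, unique customers LHS, unique customers RHS, unique customers all).

```python
def rule_metrics(shelves, lhs, rhs):
    """Return (support, confidence, lift) for the single-item rule lhs -> rhs."""
    total = len(shelves)                                            # Unique Customers All
    with_lhs = sum(1 for items in shelves.values() if lhs in items)  # Unique Customers LHS
    with_rhs = sum(1 for items in shelves.values() if rhs in items)  # Unique Customers RHS
    with_both = sum(
        1 for items in shelves.values() if lhs in items and rhs in items
    )                                                               # Customers with Both Products
    support = with_both / total
    confidence = with_both / with_lhs
    lift = confidence / (with_rhs / total)
    return support, confidence, lift


# Hypothetical data: the set of products on each customer's shelf.
shelves = {
    "c1": {"A", "B"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"C"},
}

print(rule_metrics(shelves, "A", "B"))  # (0.5, 1.0, 2.0)
```

Here A and B always appear together on half of the shelves, so support is 0.5, confidence is 1.0, and the lift of 2.0 says B is twice as likely on a shelf that has A as on an arbitrary shelf. The sketch assumes both products occur at least once; otherwise the divisions fail.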

You can download the Power BI file here.

In this video you see the Power BI file in use:
