Statistical functions such as correlation can be calculated using data within the ElastiCube Manager.
Once commonly used function is correlation between two variables. Below is a brief description on calculating Pearsons Correlation Coefficient.
1. Create a table containing the variable to correlate and the observations and measure to form the basis of the calculation. In example below we are trying to calculate the correlation between users movie preferences.
2. Create a new custom table - Add Data> Custom SQL Expression
3. Name the new custom table correlation and continue to the editor. We will be implementing the formula below into SQL.
4. In the editor use the formula below to calculate the correlation . Of course replace the table and dimension names where appropriate.
((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - power(sum1, 2.0) / n) * (sum2sq - power(sum2, 2.0) / n))) AS r,
n1.user AS user1,
n2.user AS user2,
SUM(n1.rating) AS sum1,
SUM(n2.rating) AS sum2,
SUM(n1.rating * n1.rating) AS sum1sq,
SUM(n2.rating * n2.rating) AS sum2sq,
SUM(n1.rating * n2.rating) AS psum,
COUNT(*) AS n
testdata AS n1
testdata AS n2
n1.movie = n2.movie
n1.user > n2.user
n1.user, n2.user) AS step1
r DESC, n DESC
5. In the table editor click the parse sql expression icon to ensure the syntax has been configured correctly. Then press save and exit.
6. Conduct a build to process the calculation.
7. Below is example correlation matrix for the example described above.