Phi Coefficient

From Wikipedia:

The phi coefficient (or mean square contingency coefficient and denoted by \(\phi\) or \(r_\phi\)) is a measure of association for two binary variables.

Given two binary variables \(x\) and \(y\), their joint counts can be tabulated in a 2×2 contingency table (for example, the top-left cell \(n_{11}\) is the number of observations where \(x\) and \(y\) are both true):

| | x=1 | x=0 | total |
|---|---|---|---|
| y=1 | \(n_{11}\) | \(n_{10}\) | \(n_{1\bullet}\) |
| y=0 | \(n_{01}\) | \(n_{00}\) | \(n_{0\bullet}\) |
| total | \(n_{\bullet1}\) | \(n_{\bullet0}\) | \(n\) |

Now you calculate the phi coefficient like so:

$$ \phi = \frac{n_{11}n_{00}-n_{10}n_{01}}{\sqrt{n_{1\bullet}n_{0\bullet}n_{\bullet0}n_{\bullet1}}} $$

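or, equivalently, using only \(n\), \(n_{11}\), and the two marginal totals \(n_{1\bullet}\) and \(n_{\bullet1}\):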

$$ \phi = \frac{nn_{11}-n_{1\bullet}n_{\bullet1}}{\sqrt{n_{1\bullet}n_{\bullet1}(n-n_{1\bullet})(n-n_{\bullet1})}} $$

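The result is a number in \([-1, 1]\) that indicates the degree to which \(x\) and \(y\) are associated/correlated. \(1\) indicates that the variables are identical, \(-1\) indicates that each is always the negation of the other, and \(0\) indicates that they’re effectively independent. The coefficient is undefined when any marginal total is zero, because the denominator vanishes.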
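As a rough illustration, both formulas can be sketched in Python. The function names `phi_coefficient` and `phi_coefficient_marginals` are made up for this note, and returning NaN for a zero denominator is just one possible convention, not part of the definition.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient from the four cells of the 2x2 table (first formula)."""
    # Marginal totals, following the table: rows are y, columns are x.
    n1_dot = n11 + n10  # row total for y = 1
    n0_dot = n01 + n00  # row total for y = 0
    n_dot1 = n11 + n01  # column total for x = 1
    n_dot0 = n10 + n00  # column total for x = 0

    denom = math.sqrt(n1_dot * n0_dot * n_dot0 * n_dot1)
    if denom == 0:
        # Assumption: report NaN when a marginal total is zero.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom


def phi_coefficient_marginals(n, n11, n1_dot, n_dot1):
    """Phi coefficient from n, n11 and two marginal totals (second formula)."""
    denom = math.sqrt(n1_dot * n_dot1 * (n - n1_dot) * (n - n_dot1))
    if denom == 0:
        return float("nan")
    return (n * n11 - n1_dot * n_dot1) / denom


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    n11, n10, n01, n00 = 30, 10, 20, 40
    n = n11 + n10 + n01 + n00
    print(phi_coefficient(n11, n10, n01, n00))
    print(phi_coefficient_marginals(n, n11, n11 + n10, n11 + n01))
```

The two functions should agree on any table, since \(n_{0\bullet} = n - n_{1\bullet}\), \(n_{\bullet0} = n - n_{\bullet1}\), and \(n_{11}n_{00} - n_{10}n_{01} = nn_{11} - n_{1\bullet}n_{\bullet1}\).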

An example:

  • \(x\): Event definition A
  • \(y\): Event definition B
  • \(n_{11}\): Number of events matching both event definitions
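To make the example concrete, here is a minimal sketch that tabulates the four cells from per-event match flags and then applies the first formula. It assumes each event has already been checked against both definitions; the name `phi_from_matches` and the sample flags are hypothetical. By the same logic as \(n_{11}\), \(n_{10}\) counts events matching only B, \(n_{01}\) events matching only A, and \(n_{00}\) events matching neither.

```python
import math


def phi_from_matches(matches_a, matches_b):
    """Tabulate the 2x2 table from paired booleans and return (counts, phi).

    matches_a[i] is True when event i matches definition A (x = 1);
    matches_b[i] is True when event i matches definition B (y = 1).
    """
    if len(matches_a) != len(matches_b):
        raise ValueError("both sequences must describe the same events")

    n11 = n10 = n01 = n00 = 0
    for x, y in zip(matches_a, matches_b):
        if x and y:
            n11 += 1  # matches both definitions
        elif y:
            n10 += 1  # y = 1, x = 0: matches B only
        elif x:
            n01 += 1  # y = 0, x = 1: matches A only
        else:
            n00 += 1  # matches neither

    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n10 + n00) * (n11 + n01))
    # Assumption: NaN when a marginal total is zero (phi is undefined there).
    phi = (n11 * n00 - n10 * n01) / denom if denom else float("nan")
    return (n11, n10, n01, n00), phi


if __name__ == "__main__":
    # Hypothetical match flags for six events.
    a = [True, True, True, False, False, False]
    b = [True, True, False, False, False, True]
    counts, phi = phi_from_matches(a, b)
    print(counts, phi)
```

A value near \(1\) would suggest the two definitions select largely the same events, while a value near \(0\) would suggest they are unrelated.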