# Phi Coefficient

From Wikipedia:

The phi coefficient (or mean square contingency coefficient and denoted by $$\phi$$ or $$r_\phi$$) is a measure of association for two binary variables.

Given two binary variables $$x$$ and $$y$$, their joint counts can be tabulated in a 2×2 contingency table (for example, the top-left cell $$n_{11}$$ is the number of observations where $$x$$ and $$y$$ are both true):

| | x=1 | x=0 | total |
|---|---|---|---|
| y=1 | $$n_{11}$$ | $$n_{10}$$ | $$n_{1\bullet}$$ |
| y=0 | $$n_{01}$$ | $$n_{00}$$ | $$n_{0\bullet}$$ |
| total | $$n_{\bullet1}$$ | $$n_{\bullet0}$$ | $$n$$ |

Now you calculate the phi coefficient like so:

$$\phi = \frac{n_{11}n_{00}-n_{10}n_{01}}{\sqrt{n_{1\bullet}n_{0\bullet}n_{\bullet0}n_{\bullet1}}}$$

or:

$$\phi = \frac{nn_{11}-n_{1\bullet}n_{\bullet1}}{\sqrt{n_{1\bullet}n_{\bullet1}(n-n_{1\bullet})(n-n_{\bullet1})}}$$
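
The two forms agree. As a quick check, substituting $$n = n_{11}+n_{10}+n_{01}+n_{00}$$, $$n_{1\bullet} = n_{11}+n_{10}$$ and $$n_{\bullet1} = n_{11}+n_{01}$$ into the second numerator gives the first:

$$
\begin{aligned}
n n_{11} - n_{1\bullet} n_{\bullet1} &= (n_{11}+n_{10}+n_{01}+n_{00})\,n_{11} - (n_{11}+n_{10})(n_{11}+n_{01}) \\
&= n_{11}n_{00} - n_{10}n_{01}
\end{aligned}
$$

The denominators match because $$n - n_{1\bullet} = n_{0\bullet}$$ and $$n - n_{\bullet1} = n_{\bullet0}$$.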

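The result is a number in $$[-1, 1]$$ that indicates the degree to which $$x$$ and $$y$$ are associated/correlated. $$1$$ indicates that the variables are identical, $$-1$$ indicates that each is the exact negation of the other, and $$0$$ indicates that they show no association in the tabulated data. The coefficient is undefined when any of the row or column totals is zero, because the denominator vanishes.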
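
As a minimal sketch of the first formula, the function below takes the four cell counts directly. The name `phi_coefficient` and the choice to raise an error on a zero marginal total are assumptions made for illustration, not part of any particular library.

```python
from math import sqrt


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient from the four cells of a 2x2 table.

    Cell naming follows the table above: the first index is y,
    the second is x (so n10 means y=1, x=0).
    """
    # Marginal totals.
    row1 = n11 + n10  # n_{1.}: total for y=1
    row0 = n01 + n00  # n_{0.}: total for y=0
    col1 = n11 + n01  # n_{.1}: total for x=1
    col0 = n10 + n00  # n_{.0}: total for x=0

    denominator = sqrt(row1 * row0 * col0 * col1)
    if denominator == 0:
        # Assumed behaviour: phi is undefined if any marginal total is zero.
        raise ValueError("phi is undefined when a row or column total is zero")

    return (n11 * n00 - n10 * n01) / denominator
```

With all counts on the diagonal (say `phi_coefficient(10, 0, 0, 10)`) the variables are identical and the result is the upper bound; with all counts off the diagonal it is the lower bound.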

An example:

• $$x$$: Event definition A
• $$y$$: Event definition B
• $$n_{11}$$: Number of events matching both event definitions
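
For the event-definition example, one way to get the counts is to record, for each event, whether it matches definition A and whether it matches definition B. The sketch below tabulates two such boolean sequences and applies the second form of the formula. The function name `phi_from_flags` and the inputs `matches_a` / `matches_b` are hypothetical, chosen only for illustration.

```python
from math import sqrt


def phi_from_flags(matches_a, matches_b):
    """Phi coefficient for two equal-length sequences of booleans.

    matches_a[i] plays the role of x (event i matches definition A),
    matches_b[i] plays the role of y (event i matches definition B).
    """
    if len(matches_a) != len(matches_b):
        raise ValueError("both sequences must describe the same events")

    n = len(matches_a)
    # n11: events matching both definitions.
    n11 = sum(1 for a, b in zip(matches_a, matches_b) if a and b)
    row1 = sum(1 for b in matches_b if b)  # n_{1.}: events matching B
    col1 = sum(1 for a in matches_a if a)  # n_{.1}: events matching A

    denominator = sqrt(row1 * col1 * (n - row1) * (n - col1))
    if denominator == 0:
        # Assumed behaviour: undefined if a definition matches all or no events.
        raise ValueError("phi is undefined when a row or column total is zero")

    return (n * n11 - row1 * col1) / denominator
```

This form only needs $$n$$, $$n_{11}$$ and the two "matches" totals, which is convenient when those are the numbers that are easy to query.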