Sunday, 13 June 2021

The Chi-Square Statistic: \( \chi^2_c \)

While I was perusing YouTube today, the idea of watching a video about the Chi-Square test popped into my head, so I watched the one with the most views: 1,867,764 of them, for a video uploaded on November 14th 2011, almost ten years ago.

I thought about how the test might be carried out in SageMath. My initial investigation didn't turn up anything conclusive, so I turned to the trusty spreadsheet, specifically Google Sheets. I've tended to neglect spreadsheets since I started using SageMath, so this was an opportunity to revisit old territory.

Figure 1 shows the formula for the chi-square statistic:

Figure 1

The subscript \(c\) represents the number of degrees of freedom, \(O\) the observed frequencies and \(E\) the expected frequencies. For 36 tosses of a fair die, each of the six faces is expected to come up 36 ÷ 6 = 6 times, so the expected frequencies are all 6. Figure 2 shows what I came up with in Google Sheets.
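Since Figure 1 is an image, here is the statistic written out, assuming it takes the standard form. With \(k\) categories (the six faces of the die), and expected frequencies that are fully specified rather than estimated from the data, the degrees of freedom are \(k - 1\):

\[
\chi^2_c = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad c = k - 1 = 6 - 1 = 5,
\qquad E_i = \frac{36}{6} = 6 .
\]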


Figure 2: link

The spreadsheet's chi-square function takes the observed and expected frequencies and returns a probability: the chance of getting deviations from the expected frequencies at least this large if the die really is fair. In the example shown in Figure 2, the probability is 0.0853, or a little over 8%. That is above the 5% threshold usually required for significance, so the result is not strong enough evidence that the die is unfair.
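To see where a figure like 0.0853 comes from, here is a minimal sketch in plain Python (so it should also run in a SageMath cell) of the right-tail probability of the chi-square distribution. It relies on the closed-form expressions that exist when the degrees of freedom are a whole number. The function name `chi_square_sf` is my own, and I'm assuming the spreadsheet's function returns this same right-tail probability for the computed statistic.

```python
import math


def chi_square_sf(x: float, df: int) -> float:
    """Right-tail probability P(X >= x) for a chi-square variable with a
    whole number of degrees of freedom, using closed-form expressions."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus correction terms
    # sqrt(2/pi) * exp(-x/2) * (x^(1/2)/1 + x^(3/2)/(1*3) + x^(5/2)/(1*3*5) + ...)
    term = math.sqrt(x)
    acc = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * acc


statistic = 58 / 6  # the worksheet total, 9.67
print(round(chi_square_sf(statistic, 5), 4))
```

With 5 degrees of freedom this should print a value close to the spreadsheet's 0.0853, and `chi_square_sf(11.070, 5)` comes out at about 0.05, matching the table's cut-off.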

In the worksheet, I've added some information that isn't strictly needed, to show how things were done in "the old days". I've calculated the differences between O and E, squared these and then divided by E, as per the formula. The total is 9.67. Looking up 5 degrees of freedom in the 0.05 column of the black table in Figure 2 gives a cut-off of 11.070; since 9.67 is below it, the result is not significant at the 5% level, which agrees with the probability of 0.0853.
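The "old days" columns can be mirrored in a few lines of Python. The observed counts below are hypothetical (the real ones appear only in Figure 2), chosen so that the squared differences add up to 58 and the total matches the worksheet's 58/6 ≈ 9.67. The cut-off 11.070 is the 0.05, 5-degrees-of-freedom table entry quoted above.

```python
# Hypothetical counts for faces 1 to 6 over 36 tosses; not the data in Figure 2
observed = [11, 1, 8, 4, 6, 6]
expected = [6] * 6  # fair die: 36 / 6 per face

rows = []
for o, e in zip(observed, expected):
    diff = o - e   # O - E
    sq = diff ** 2  # (O - E)^2
    rows.append((o, e, diff, sq, sq / e))  # last column: (O - E)^2 / E

statistic = sum(row[-1] for row in rows)
df = len(observed) - 1
critical_value = 11.070  # 0.05 column, 5 degrees of freedom

print(" O   E  O-E  (O-E)^2  (O-E)^2/E")
for o, e, diff, sq, contribution in rows:
    print(f"{o:>2} {e:>3} {diff:>4} {sq:>8} {contribution:>10.3f}")
print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
if statistic > critical_value:
    print("reject 'fair die' at the 5% level")
else:
    print("cannot reject 'fair die' at the 5% level")
```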

Figure 3 shows the expected versus the observed results:


Figure 3

It's easy enough to get a SageMath cell to carry out the steps needed to arrive at the 9.67 result. Figure 4 shows a screenshot of the algorithm along with the permalink.


Figure 4: permalink

What I was looking for in SageMath was a function that takes the two lists as input and outputs the probability, just as the spreadsheet's function does. Perhaps one exists; I'll keep investigating.
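One self-contained way to get a probability straight from the two lists is a Monte Carlo simulation: toss a simulated die with the expected proportions many times and count how often the statistic is at least as large as the observed one. This is a swapped-in technique, not what the spreadsheet does (it uses the chi-square distribution), so for data with a 9.67 statistic the estimate should land in the same neighbourhood as 0.0853 without matching it exactly. The function names, trial count, seed and the counts in the usage line are my own hypothetical choices. For what it's worth, SageMath bundles SciPy, whose `scipy.stats.chisquare(f_obs, f_exp)` returns the statistic and its p-value from two lists, which may be the kind of function I was after.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def monte_carlo_p_value(observed, expected, trials=10_000, seed=1):
    """Estimate P(statistic >= observed statistic) if the expected model is true,
    by repeatedly simulating the same number of tosses."""
    rng = random.Random(seed)
    n = sum(observed)
    total_expected = sum(expected)
    weights = [e / total_expected for e in expected]  # probability of each category
    scaled = [w * n for w in weights]                 # expected counts for n tosses
    target = chi_square_statistic(observed, scaled)
    categories = range(len(expected))
    hits = 0
    for _ in range(trials):
        counts = [0] * len(expected)
        for c in rng.choices(categories, weights=weights, k=n):
            counts[c] += 1
        # small tolerance so floating-point round-off doesn't drop exact ties
        if chi_square_statistic(counts, scaled) >= target - 1e-9:
            hits += 1
    return hits / trials


# Hypothetical counts for faces 1 to 6 (not the data in Figure 2)
print(monte_carlo_p_value([11, 1, 8, 4, 6, 6], [6] * 6))
```

Raising `trials` narrows the simulation's random error at the cost of run time; the seed makes the result repeatable.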

UPDATE on June 21st 2021

I just watched a video on YouTube demonstrating the Chi-Square test in Excel. It shows a manual method as well as Excel's built-in Chi-Square function, and it does a good job of explaining the statistic using a 3 x 3 table as an example.
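For a two-way table like that 3 x 3 example, the usual approach takes each cell's expected frequency from the margins (row total × column total ÷ grand total), and the degrees of freedom become (rows − 1) × (columns − 1), which is 4 for a 3 x 3 table. A minimal Python sketch; the function name and the counts are made up for illustration and are not the video's data.

```python
def contingency_chi_square(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected frequency = row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Made-up 3 x 3 counts, purely for illustration
table = [
    [20, 15, 25],
    [30, 25, 15],
    [10, 20, 20],
]
statistic, df = contingency_chi_square(table)
print(f"chi-square = {statistic:.3f}, df = {df}")
```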
