Today my friend e-mailed me this data:
Nationals’ record as of 8/24/2011, by day of the week (counting doubleheaders):
Sun: 13-8 0.619
Mon: 6-5 0.545
Tues: 10-8 0.556
Wed: 6-15 0.286
Thu: 6-11 0.353
Fri: 14-5 0.737
Sat: 7-14 0.333
Overall: 62-66 0.484
You might notice something odd, as my friend did: the Nationals play like the Yankees on Sunday and like the Orioles on Wednesday. Were these unusual results statistically significant, he wondered?
To find out, I ran a binomial test on each of those days’ records, assuming that the team had a true probability of winning any given game equal to 0.484, its actual winning percentage.
The test told me that neither the Sunday nor the Wednesday record was statistically significant, though the Wednesday record was less likely.
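A minimal sketch of this kind of test in Python, using only the standard library. The function names are my own, and the two-sided convention (doubling the smaller tail) is an assumption, since the exact variant of the test isn’t specified above.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent games, each won with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(wins: int, games: int, p: float) -> float:
    """Exact two-sided binomial p-value, using the 'double the smaller tail' convention."""
    lower = sum(binom_pmf(k, games, p) for k in range(0, wins + 1))      # P(X <= wins)
    upper = sum(binom_pmf(k, games, p) for k in range(wins, games + 1))  # P(X >= wins)
    return min(1.0, 2 * min(lower, upper))


# Day-of-week records from the e-mail above: (wins, losses).
RECORDS = {
    "Sun": (13, 8),
    "Mon": (6, 5),
    "Tues": (10, 8),
    "Wed": (6, 15),
    "Thu": (6, 11),
    "Fri": (14, 5),
    "Sat": (7, 14),
}
P_TRUE = 62 / 128  # overall record 62-66, about 0.484

for day, (w, l) in RECORDS.items():
    p_value = binom_test_two_sided(w, w + l, P_TRUE)
    print(f"{day}: {w}-{l}  p = {p_value:.3f}")
```

With these numbers, neither the Sunday nor the Wednesday p-value falls below the conventional 0.05 threshold, and Wednesday’s is the smaller of the two, consistent with the result described above.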
To refine the test, I used Bill James’s Pythagorean Theorem to calculate the team’s “expected winning percentage” from its runs scored and runs allowed. The percentage it predicted, however, was not very different from the team’s actual winning percentage, so it did not substantially alter the results of the original test.
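James’s formula estimates winning percentage as RS^x / (RS^x + RA^x), where RS and RA are runs scored and runs allowed. James originally used x = 2; later refinements often use an exponent near 1.83. A minimal sketch follows. The post doesn’t give the Nationals’ run totals, so the numbers in the usage line are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical run totals for illustration only (not the 2011 Nationals' figures).
print(f"{pythagorean_win_pct(500, 520):.3f}")
```

The resulting percentage simply replaces 0.484 as the assumed per-game win probability in the binomial test.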
So what are we to make of this? Should we believe that this is simply random variation with no hidden pattern? I believe we have to go beyond the statistical test to answer such a question.
The main problem with the binomial test is not that we do not know the team’s “true” winning percentage (which the Pythagorean Theorem, in theory, estimates) but that the test assumes a single “true winning percentage” applies to every game. Perhaps the Nationals played weaker opponents on Sundays? Maybe the pitching was worse on Wednesdays? Perhaps visibility is worse during night games, which are more likely to be played on Wednesday than on Sunday. Many factors could explain these results, so I told my friend to look for a hidden pattern, particularly on Wednesdays, that might explain these prima facie strange results.
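One way to relax the single-probability assumption, sketched here under my own assumptions, is to give each game its own win probability (adjusted, say, for the opponent or the starting pitcher) and compute the exact distribution of total wins, known as the Poisson binomial distribution. The function names are illustrative, and the per-game probabilities are hypothetical placeholders, not estimates for the 2011 Nationals.

```python
from typing import List, Sequence


def poisson_binomial_pmf(win_probs: Sequence[float]) -> List[float]:
    """Exact distribution of total wins when game i is won with probability win_probs[i].

    Dynamic programming: after each game, every running total of k wins either
    stays at k (a loss) or moves to k + 1 (a win).
    """
    pmf = [1.0]  # before any games: 0 wins with probability 1
    for p in win_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1 - p)  # lose this game
            nxt[k + 1] += prob * p    # win this game
        pmf = nxt
    return pmf


def prob_at_most(win_probs: Sequence[float], wins: int) -> float:
    """P(total wins <= wins) under the given per-game win probabilities."""
    return sum(poisson_binomial_pmf(win_probs)[: wins + 1])


# Hypothetical per-game win probabilities for 21 Wednesday games, e.g. after
# adjusting for opponent strength. Placeholders only, not real estimates.
wednesday_probs = [0.40] * 11 + [0.52] * 10

print(f"P(6 or fewer wins) = {prob_at_most(wednesday_probs, 6):.3f}")
```

If every entry were 0.484, this would reduce to the ordinary binomial test; once the probabilities vary from game to game, what counts as an “unlikely” Wednesday record shifts with them.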
But the important point is that the binomial test is not definitive. Statistical tests, when their assumptions are valid, are good for working with large amounts of data, where human intuition and reasoning are notoriously weak. These tests can lead us toward new information or let us test hypotheses that we simply cannot verify by ourselves (Voros McCracken’s work on BABIP immediately leaps to mind).
When the assumptions are not valid, statistical tests tell us nothing. This is not necessarily to say that, in the case above, the assumption that a team has an inherent winning percentage is invalid; if a team used the same lineup every day, for instance, I’d be more willing to accept it. But we should always remember that the assumptions of a statistical test are absolutely crucial. Ensuring that they are valid keeps statistical tests close to reality and leads to credible and useful results.
When we view statistical tests as the final word (which can easily happen, often because of the rhetoric of those running the tests), we are much more likely to accept dubious assumptions and to leave unusual results uninvestigated. This can lead to unrealistic conclusions (again, Voros McCracken leaps to mind). It is also the wellspring of villainy and bogus work in baseball analysis.
The only way to obtain clear baseball knowledge with statistics is to stay very close to reality. Ask yourself whether the assumptions are valid and whether the results make sense. Make sure counterintuitive results do not come from invalid assumptions. And never overstate the results of statistical tests.
In my friend’s case, my opinion is that, despite the test results, the Wednesday data should be investigated further. A statistical test should never be used to hide knowledge.