Here’s my happy little accident: I was testing an algorithm against a simple majority-class baseline (that is, if the training data were 60% “yes” and 40% “no”, the baseline would predict “yes” for every test example). This was years ago, when we evaluated using leave-one-out cross-validation. My algorithm did surprisingly well: 70% better than the baseline. Wait a minute … that’s more than surprising; that shouldn’t be possible. I soon discovered that my training data was exactly balanced between yes and no examples, so when you leave out a “yes” example, “no” becomes the majority of what remains (and vice versa). The baseline majority algorithm therefore predicts wrong on every single fold, scoring exactly 0%.
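
Here’s a minimal sketch of that pathology (the function name and toy data are mine, not from the original experiment): a majority-class baseline evaluated with leave-one-out cross-validation scores 0% on a perfectly balanced dataset, while behaving as expected on a skewed one.

```python
def loocv_majority_baseline_accuracy(labels):
    """Leave-one-out CV accuracy of a majority-class baseline."""
    correct = 0
    for i, held_out in enumerate(labels):
        # Remove the held-out example; the rest is the training set.
        train = labels[:i] + labels[i + 1:]
        # The baseline always predicts the majority class of the training set.
        prediction = max(set(train), key=train.count)
        correct += (prediction == held_out)
    return correct / len(labels)

# Perfectly balanced data: leaving out a "yes" makes "no" the majority
# of what remains (and vice versa), so every prediction is wrong.
balanced = ["yes"] * 50 + ["no"] * 50
print(loocv_majority_baseline_accuracy(balanced))  # 0.0

# A 60/40 split behaves as you'd expect: the baseline is right on the
# 60 "yes" folds and wrong on the 40 "no" folds, i.e. 60% accuracy.
skewed = ["yes"] * 60 + ["no"] * 40
print(loocv_majority_baseline_accuracy(skewed))  # 0.6
```

The skewed case is the sanity check: it confirms the baseline implementation is reasonable, which makes the balanced case’s 0% all the more striking.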