Best of Seven Series

Mathematician Rob Bradley posted that the expected length of a best-of-seven series should be about 5 3/4 games, on which basis he expected the New York Yankees to win. Or maybe living close to New York has warped his sense of reality.

As a programmer, I felt a compulsion to verify his claim. I could figure out a formula, or list the possibilities (there are 70 possible ways for one team or the other to win 4 games out of 7), but I decided to write a simulation, using the Perl programming language. This explanation may be overly detailed for programmers, but I want Facebook friends with limited programming experience to understand.


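 use v5.10;            # enables say
 our $VERBOSE = 0;     # debug level, set from the command line in main

 sub count_games_in_series {
     my ( $max ) = @_;
     $max //= 7;
     my $needed = int( $max / 2 ) + 1;    # wins needed to clinch: 4 of 7
     my ( $won, $lost ) = ( 0, 0 );

     for my $games ( 1 .. $max ) {
         # int rand 2 is 0 or 1 with equal probability
         if ( int rand 2 ) {
             $won++;
         }
         else {
             $lost++;
         }
         say "won $won; lost $lost; total $games" if $VERBOSE > 1;
         return $games if $won == $needed || $lost == $needed;
     }
     return $max;
 }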

In the count_games_in_series subroutine, I play a set of up to seven games and return the number of games it took for one team or the other to win four. If the loop runs its full length, the return value is seven, but if the number of games won or the number of games lost reaches four before then, the routine returns early. There's a print statement, controlled by a global variable, that I used to debug and verify the code. The key bit of code generates a random number that is at least zero and less than two. When you discard the fractional part, you are left with an integer, either zero or one, which evaluates to false or true in the if{} test.

I might have made the condition simpler, but I wanted to be able to explore alternatives to 50-50 odds of winning.
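One way the coin flip could be opened up to other odds is sketched below. The helper name flip and its $p_win parameter are my own illustration, not part of the program above; the sketch only assumes that rand() with no argument returns a number that is at least zero and less than one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 (a win) with probability $p_win, else 0.
# rand() with no argument yields a number >= 0 and < 1.
sub flip {
    my ($p_win) = @_;
    $p_win //= 0.5;    # default to an even match
    return rand() < $p_win ? 1 : 0;
}

# Rough check: with $p_win = 0.55, about 55% of flips should be wins.
my $trials = 100_000;
my $wins   = 0;
$wins += flip(0.55) for 1 .. $trials;
printf "won %.1f%% of %d flips\n", 100 * $wins / $trials, $trials;
```

With a helper like this, the test in the loop would become something like if ( flip($p_win) ), and 50-50 odds are just the default case.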


 sub main {
     my $rounds = $ARGV[0] || 100;
     $VERBOSE = $ARGV[1] || 0;

     my ($games, %freq, $expected);
     for my $round ( 1..$rounds ) {
         my $g = count_games_in_series();
         $freq{$g}++;
         $games += $g;
         print "$round\t$g\t$games\n" if $VERBOSE;
     } 
     printf "After $rounds rounds, total %d games, average of %6.4f.\n", 
            $games, $games/$rounds;
    
     print "games\tnumber\tpct\n";
     for my $key ( sort { $a <=> $b } keys %freq ) {
         my $pct = $freq{$key}/$rounds;
         $expected += $pct * $key;
         print "$key, $freq{$key}, $pct\n";
     }
     my $int = int $expected;
     my $fraction = $expected - $int;
     my $ratio = $fraction * 16;
     say "expected value = $expected ... $int $ratio/16";
 }

In the main routine, I process the command line arguments: the first determines how many times I run the count_games_in_series routine, and the second controls whether debugging info is printed. The default behaviour is to run 100 series and print no debugging output.

In the loop, I run the routine and each time update a variable to record the frequency of that number of games. Then I output the raw counts and accumulate an expected value. Adding together the number of games multiplied by the fraction of times each count occurred results in the expected value. I format this as an integer and a fraction over 16, since that's the way Rob had declared it.
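The weighted-average step can be tried on its own. This is a minimal sketch rather than the program itself: the subroutine names expected_length and as_sixteenths are made up for illustration, and the counts are the ones from the 100,000,000-series run reported further down.

```perl
use strict;
use warnings;

# Series lengths and how often each occurred
# (counts from the 100,000,000-series run in this post).
my %freq = ( 4 => 12494547, 5 => 24997766, 6 => 31258228, 7 => 31249459 );

# Weighted average: each length times the share of series with that length.
sub expected_length {
    my ($freq) = @_;
    my $rounds = 0;
    $rounds += $_ for values %$freq;
    my $expected = 0;
    $expected += $_ * $freq->{$_} / $rounds for keys %$freq;
    return $expected;
}

# Split a value into its whole part and the remainder in sixteenths.
sub as_sixteenths {
    my ($value) = @_;
    my $whole = int $value;
    return ( $whole, ( $value - $whole ) * 16 );
}

my $expected = expected_length( \%freq );
my ( $whole, $sixteenths ) = as_sixteenths($expected);
printf "expected value = %.4f ... %d %.2f/16\n", $expected, $whole, $sixteenths;
```

Dividing the total number of games by the number of series gives the same figure; building it from the frequency table just makes it clear where each piece of the average comes from.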

Running the simulation for 100,000,000 series generated the results:


 $ perl best_of_seven.pl 100000000
 After 100000000 rounds, total 581262599 games, average of 5.8126.

 games	number	  pct
 4,     12494547, 0.12494547
 5,     24997766, 0.24997766
 6,     31258228, 0.31258228
 7,     31249459, 0.31249459

 expected value = 5.81262599 ... 5 13.00201584/16

One time in eight, you would get a winner after four games. A quarter of the time, you have a winner after five games. But a little under one in three (about 31%) require six games, and the same share require all seven. That's good for the fans and for the broadcasters and sports writers. They can agonize day after day about whether their favourite team will pull it off or not.
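For evenly matched teams these shares can also be worked out exactly, as a cross-check on the simulation. A series ends in exactly n games when the eventual winner takes game n and exactly three of the previous n-1 games, and either team can be that winner, so the probability is 2 * C(n-1, 3) / 2^n. The sketch below uses the helper names choose and p_length, which are mine and not part of the simulation.

```perl
use strict;
use warnings;

# Binomial coefficient C(n, k), built up one factor at a time.
sub choose {
    my ( $n, $k ) = @_;
    my $c = 1;
    $c = $c * ( $n - $k + $_ ) / $_ for 1 .. $k;
    return $c;
}

# Probability that a best-of-seven between evenly matched teams
# ends in exactly $n games (4 <= $n <= 7).
sub p_length {
    my ($n) = @_;
    return 2 * choose( $n - 1, 3 ) / 2**$n;
}

my $expected = 0;
for my $n ( 4 .. 7 ) {
    my $p = p_length($n);
    $expected += $n * $p;
    printf "%d games: %.4f\n", $n, $p;
}
printf "expected length: %.4f\n", $expected;
```

This gives 1/8, 1/4, 5/16 and 5/16 for four, five, six and seven games, and an expected length of 93/16 = 5.8125, or 5 13/16, which is what the simulated 5.8126 is converging on.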

But are the odds of winning any game really 50-50? Of course there are the relative qualities of the teams, and of the pitchers, but it's hard to model that in a 53-line program. But there is the concept of home field advantage. While it doesn't affect the physical aspects of the game, the psychological effects of being at home, of the cheering fans, and the detailed knowledge of the field's characteristics and oddities must have some effect.

Of course home field advantage is a minor factor, but to test its effect I made it huge. The home team has a 90% chance of winning a game, the visiting team only 10%. In this case, you see a win after four games only 1.4% of the time, while 59% of series require the full seven games, for an average of 6.4 games. With a more moderate 75-25 breakdown, 7% of series are settled after four games while 38% go the full seven, for an expected length right on six games.

So, not surprisingly, a more realistic 55-45 home field advantage has little effect on the distribution, raising the expected value from 5.813 to 5.820.
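The home field variation could be sketched roughly as follows. This is not the code behind the figures above: it assumes a 2-3-2 schedule (one team hosts games 1, 2, 6 and 7, the other hosts games 3, 4 and 5), and the names @A_AT_HOME and series_length_with_home_edge are invented for the example.

```perl
use strict;
use warnings;

# Assumed 2-3-2 schedule: 1 means team A is at home, 0 means team B is.
my @A_AT_HOME = ( 1, 1, 0, 0, 0, 1, 1 );

# Play one series; $p_home is the home team's chance of winning each game.
# Returns the number of games played.
sub series_length_with_home_edge {
    my ($p_home) = @_;
    my ( $a_wins, $b_wins ) = ( 0, 0 );
    for my $game ( 1 .. 7 ) {
        my $home_wins = rand() < $p_home;
        # Team A wins when it is at home and the home team wins,
        # or when it is away and the home team loses.
        if ( $A_AT_HOME[ $game - 1 ] ? $home_wins : !$home_wins ) {
            $a_wins++;
        }
        else {
            $b_wins++;
        }
        return $game if $a_wins == 4 || $b_wins == 4;
    }
    return 7;    # not reached: someone always has four wins by game seven
}

my $rounds = 100_000;
for my $p_home ( 0.50, 0.55, 0.75, 0.90 ) {
    my $total = 0;
    $total += series_length_with_home_edge($p_home) for 1 .. $rounds;
    printf "home wins %.0f%%: average %.3f games\n",
        100 * $p_home, $total / $rounds;
}
```

Under this schedule each team hosts two of the first four games, so a sweep needs two road wins from the same team, which is why a strong home edge pushes series toward the full seven games.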
