Best of Seven Series

Mathematician Rob Bradley posted that the expected length of a best-of-seven series should be about 5 3/4 games, on which basis he expected the New York Yankees to win. Or maybe living close to New York has warped his sense of reality.

As a programmer, I felt a compulsion to verify his claim. I could figure out a formula, or list the possibilities (70 possible ways to win or lose 4 games out of 7), but I decided to write a simulation, using the Perl programming language. This explanation may be overly detailed for programmers, but I want Facebook friends with limited programming experience to understand.
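    sub count_games_in_series {
        my ( $max ) = @_;
        $max //= 7;
        my $needed = int( $max / 2 ) + 1;    # wins required: 4 in a best-of-seven
        my ( $won, $lost ) = ( 0, 0 );
        for my $games ( 1 .. $max ) {
            # int rand 2 is 0 or 1, each with equal probability
            if ( int rand 2 ) {
                $won++;
            }
            else {
                $lost++;
            }
            say "won $won; lost $lost; total $games" if $VERBOSE > 1;
            return $games if $won == $needed || $lost == $needed;
        }
        return $max;
    }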
In the count_games_in_series subroutine, I play a set of up to seven games, and return the number of games it took for one team or the other to win four. If the loop runs its full length, the return value is seven, but if the number of games won or the number of games lost reaches four before the full series, the routine returns early. There's a print statement, controlled by a global variable, that I used to debug and verify the code. The key bit of code generates a random number, a fraction between zero and two, but not including two. When you discard the fractional part, you are left with an integer, either zero or one, which evaluate to false and true respectively in the if{} test.

I might have made the condition simpler, but I wanted to be able to explore alternatives to 50-50 odds of winning.
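As a rough sketch of what exploring other odds could look like, the coin flip can be replaced by a comparison against a win probability. The subroutine name count_games_biased and the parameter $p are hypothetical, introduced here for illustration; they are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical variant of count_games_in_series: $p is the chance that the
# first team wins any single game (an assumed parameter, not in the original).
sub count_games_biased {
    my ( $p, $max ) = @_;
    $p   //= 0.5;
    $max //= 7;
    my $needed = int( $max / 2 ) + 1;    # 4 wins in a best-of-seven
    my ( $won, $lost ) = ( 0, 0 );
    for my $games ( 1 .. $max ) {
        # rand() returns a fraction in [0, 1), so this is true with probability $p
        if ( rand() < $p ) { $won++ } else { $lost++ }
        return $games if $won == $needed || $lost == $needed;
    }
    return $max;
}

# Average series length over many simulated series at 60-40 odds
my $rounds = 100_000;
my $total  = 0;
$total += count_games_biased(0.6) for 1 .. $rounds;
printf "average length at 60-40: %.4f\n", $total / $rounds;
```

With $p set to 0.5 this should behave like the fair coin flip; the printed average varies from run to run because the games are random.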
    sub main {
        my $rounds = $ARGV[0] || 100;
        $VERBOSE = $ARGV[1] || 0;
        my ( $games, %freq, $expected );
        for my $round ( 1 .. $rounds ) {
            my $g = count_games_in_series();
            $freq{$g}++;
            $games += $g;
            print "$round\t$g\t$games\n" if $VERBOSE;
        }
        printf "After $rounds rounds, total %d games, average of %6.4f.\n", $games, $games / $rounds;
        print "games\tnumber\tpct\n";
        for my $key ( sort keys %freq ) {
            my $pct = $freq{$key} / $rounds;
            $expected += $pct * $key;
            print "$key, $freq{$key}, $pct\n";
        }
        my $int      = int $expected;
        my $fraction = $expected - int $int;
        my $ratio    = $fraction * 16;
        say "expected value = $expected ... $int $ratio/16";
    }
In the main routine, I process the command line arguments: the first determines how many times I run the count_games_in_series routine, and the second controls whether debugging info is printed. The default behaviour is to run 100 series and print no debugging output.

In the loop, I run the routine and each time update a variable to record the frequency of that number of games. Then I output the raw counts and accumulate an expected value. Adding together each number of games multiplied by the fraction of series that took that many games gives the expected value. I format this as an integer and a fraction over 16, since that's the way Rob had declared it.

Running the simulation for 100,000,000 series generated the results:
    $ perl best_of_seven.pl 100000000
    After 100000000 rounds, total 581262599 games, average of 5.8126.
    games	number	pct
    4, 12494547, 0.12494547
    5, 24997766, 0.24997766
    6, 31258228, 0.31258228
    7, 31249459, 0.31249459
    expected value = 5.81262599 ... 5 13.00201584/16
One time in eight, you would get a winner after four games. A quarter of the time, you have a winner after five games. But just under a third of series (about 31%) require six games, and the same share require all seven. That's good for the fans and for the broadcasters and sports writers. They can agonize day after day about whether their favourite team will pull it off or not.
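The simulated frequencies can be cross-checked against an exact calculation. A minimal sketch, assuming each game is an independent 50-50 coin flip: a series ends in exactly n games when the eventual winner takes game n plus three of the previous n - 1 games. The helper names choose and prob_length are hypothetical, not part of the original program.

```perl
use strict;
use warnings;

# Binomial coefficient "n choose k", computed iteratively
sub choose {
    my ( $n, $k ) = @_;
    my $c = 1;
    $c = $c * ( $n - $_ + 1 ) / $_ for 1 .. $k;
    return $c;
}

# Exact chance that a best-of-seven ends in exactly $n games when one team
# wins each game independently with probability $p (an assumed model).
sub prob_length {
    my ( $n, $p ) = @_;
    my $q = 1 - $p;
    # The winner takes game $n plus 3 of the previous $n - 1 games;
    # the two terms cover either team being the winner.
    return choose( $n - 1, 3 ) * ( $p**4 * $q**( $n - 4 ) + $q**4 * $p**( $n - 4 ) );
}

my $expected = 0;
for my $n ( 4 .. 7 ) {
    my $prob = prob_length( $n, 0.5 );
    $expected += $n * $prob;
    printf "%d games: %.4f\n", $n, $prob;
}
printf "expected length: %.4f\n", $expected;
```

Under that assumption the probabilities come out to 0.1250, 0.2500, 0.3125 and 0.3125, and the expected length to 5.8125, which is 5 13/16 and agrees with the simulated 5 13.002/16.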

But are the odds of winning any game really 50-50? Of course there are the relative qualities of the teams, and of the pitchers, but it's hard to model that in a 53-line program. But there is the concept of home field advantage. While it doesn't affect the physical aspects of the game, the psychological effects of being at home, of the cheering fans, and the detailed knowledge of the field's characteristics and oddities must have some effect.

Of course home field advantage is a minor factor, but to test its effect I made it huge. The home team has a 90% chance of winning a game, the visiting team only 10%. In this case, you see a win after four games only 1.4% of the time, while 59% of series require the full seven games, for an average of 6.4 games. With a more moderate 75-25 breakdown, 7% of series are settled after four games while 38% go the full seven, for an expected length right on six games.

So, not surprisingly, a more realistic 55-45 home field advantage has little effect on the distribution, raising the expected value from 5.813 to 5.820.
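One way home field advantage might be modelled is sketched below. The post does not say which schedule its simulation used, so this assumes a 2-3-2 format in which one team hosts games 1, 2, 6 and 7. The name count_games_home_field and the parameter $home_edge are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: a 2-3-2 schedule, where team A hosts games 1, 2, 6 and 7.
# The original post does not say which schedule its simulation used.
my %A_IS_HOME = map { $_ => 1 } ( 1, 2, 6, 7 );

# Hypothetical helper: $home_edge is the home team's chance of winning a game.
sub count_games_home_field {
    my ($home_edge) = @_;
    my ( $a_wins, $b_wins ) = ( 0, 0 );
    for my $game ( 1 .. 7 ) {
        # Team A's chance is $home_edge at home, 1 - $home_edge on the road
        my $p_a = $A_IS_HOME{$game} ? $home_edge : 1 - $home_edge;
        if ( rand() < $p_a ) { $a_wins++ } else { $b_wins++ }
        return $game if $a_wins == 4 || $b_wins == 4;
    }
    return 7;
}

# Tally how often each series length occurs with a 55-45 home edge
my $rounds = 100_000;
my %freq;
$freq{ count_games_home_field(0.55) }++ for 1 .. $rounds;
printf "%d games: %.3f\n", $_, ( $freq{$_} // 0 ) / $rounds for 4 .. 7;
```

Passing 0.9 or 0.75 instead of 0.55 tries the more extreme splits. A different schedule would change which games favour which team, so the exact percentages depend on that assumption.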
