Although the difference I found between the split-range and the test-within-the-loop implementations was small, it seemed wrong to me.
So I boiled the test down to a bare minimum. I created a benchmark test consisting of nothing but the loop. I had to do something within the loop, so I stored the number in a variable.
    test8  => sub { for ( 0..8 ) {
                        next if $_ == 8;
                        my $j = $_;
                    } },
    range0 => sub { for ( 0..-1, 1..8 ) {
                        my $j = $_;
                    } },
As the names and the code suggest, I actually had 9 test and 9 range instances, one for each value in 0..8. While there was a little variation within each set, the difference between the sets was significant: using a split range was about 60% faster than testing within the loop. The test versions achieved 376923 .. 390809 iterations per second, compared to 596918 .. 606710 for the split-range versions, when each was run for 10 seconds. Testing just test4 vs range4 for a minute gave similar results:
                Rate  test4 range4
    test4   386996/s     --   -35%
    range4  593520/s    53%     --
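For reference, a minimal, self-contained harness along these lines would look something like the following. It is a reconstruction rather than the original script, and it assumes the fragments above were entries in a hash handed to Benchmark's cmpthese (a negative count runs each sub for at least that many CPU seconds and prints a rate-and-percentage table like the one above):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Build the nine "test within the loop" and nine "split range" subs,
    # one pair per skipped value in 0..8.
    my %subs;
    for my $skip ( 0 .. 8 ) {
        $subs{"test$skip"} = sub {
            for ( 0 .. 8 ) {
                next if $_ == $skip;
                my $j = $_;
            }
        };
        $subs{"range$skip"} = sub {
            for ( 0 .. $skip - 1, $skip + 1 .. 8 ) {
                my $j = $_;
            }
        };
    }

    # Run each sub for at least 10 CPU seconds and compare the rates.
    cmpthese( -10, \%subs );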
How about if the loops are larger? If we go from 0 to 80 or 0 to 800 instead of just 0 to 8, the test version should fare relatively worse, since it repeats the test on every iteration, while the split range has no extra work to do.
    0 .. 8,000:
                  Rate  test4 range4
    test4      540/s       --   -39%
    range4     886/s      64%     --

    0 .. 80,000:
                  Rate  test4 range4
    test4     55.4/s       --   -38%
    range4    88.7/s      60%     --
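The scaled-up subs are just the same pair with a larger upper limit. A sketch, reusing the harness above ($top is a stand-in name, not what the original script called it):

    # Same comparison with a larger range; upper limits of 80 .. 80_000 were tried.
    my $top = 80_000;
    cmpthese( -10, {
        test4  => sub { for ( 0 .. $top )         { next if $_ == 4; my $j = $_ } },
        range4 => sub { for ( 0 .. 3, 5 .. $top ) { my $j = $_ } },
    } );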
And in fact the relative improvement grows steadily as the loop size increases by powers of ten, from an upper limit of 8 up to 8,000, after which it falls off again ... I guess that's a log(N) effect. But by 80,000 each sub is only managing 50 to 100 reps per second, so accuracy may be fading.
So the question stands: in isolated testing, the split range is definitely better than the test within the loop, so why did the test version perform better in the otherwise identical program? After all, all that was happening within the loop was hash and array dereferencing and an integer comparison:
    return    # collision
        if $val == $self->{grid}[$row][$c];
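To make the contrast concrete, here is a rough reconstruction of the two row-check shapes being compared; only $val and $self->{grid}[$row][$c] come from the snippet above, and the sub and variable names are my assumptions:

    # Test within the loop: visit every column, skipping the cell being set.
    sub row_ok_test {
        my ( $self, $row, $col, $val ) = @_;
        for my $c ( 0 .. 8 ) {
            next if $c == $col;
            return    # collision
                if $val == $self->{grid}[$row][$c];
        }
        return 1;
    }

    # Split range: the skipped column is simply never visited.
    sub row_ok_range {
        my ( $self, $row, $col, $val ) = @_;
        for my $c ( 0 .. $col - 1, $col + 1 .. 8 ) {
            return    # collision
                if $val == $self->{grid}[$row][$c];
        }
        return 1;
    }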
Using -d:FProf, I see that BruteForceTest.pl took 0.030 seconds to run a total of 5450 row, column and block tests, while BruteForceExplicitList.pl took 0.026 seconds for 5450 calls to the unified test. Using the timings from the first test, above, the explicit-list loop should have taken about 9 ms while the repeated test should have taken about 14 ms. Presumably the remaining 16 ms is the hash and array dereferencing and the comparison.
But the test wrapper, cell_value_ok, used to be a trivial call to the row, column and block tests; now it constructs a few arrays. That must be why it has gone from 0.006 ms to 0.015 ms.
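Purely as an illustration of the kind of array construction I mean, not the actual code: every name here other than cell_value_ok and $self->{grid} is a stand-in.

    sub cell_value_ok {
        my ( $self, $row, $col, $val ) = @_;

        # Building these peer lists on every call is the new per-call cost.
        my @row_cells = map { [ $row, $_ ] } 0 .. $col - 1, $col + 1 .. 8;
        my @col_cells = map { [ $_, $col ] } 0 .. $row - 1, $row + 1 .. 8;

        for my $cell ( @row_cells, @col_cells ) {    # block cells omitted
            my ( $r, $c ) = @$cell;
            return if $val == $self->{grid}[$r][$c];    # collision
        }
        return 1;
    }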
Of course, the real lesson is that profiling programs that take a tenth of a second overall is a waste of time.