Each element of the result matrix is the dot product of one row from the first matrix with one column from the second. Moving from one element to the next within the same row is much quicker than moving to another row. The first matrix changes rows only once per result row, N changes in all, but the second matrix changes rows on every multiplication, N^3 changes in total.

If we were to transpose the second matrix, turning its rows into columns and its columns into rows, the multiplication could be carried out by scanning along a row instead of jumping from row to row. That reduces the number of row changes to N^2. How much will that save us?

I transposed the second array while reading it in:

```
for my $size ( keys %vars ) {
    my $filestub = q{M_} . $size . q{x} . $size . q{.};
    my ( $f1, $f2 ) = ( $filestub . '1', $filestub . '2' );
    $vars{$size}[0] = readMatrix( $f1 );
    $vars{$size}[1] = transpose( readMatrix( $f2 ));
}
```
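The transpose() routine itself isn't shown above. A minimal sketch, assuming the matrices are rectangular arrays of array refs as the surrounding code implies:

```perl
# Turn rows into columns: $t[$j][$i] gets $m->[$i][$j].
# Assumes a rectangular array-of-arrayrefs matrix.
sub transpose {
    my ($m) = @_;
    my @t;
    for my $i ( 0 .. $#{$m} ) {
        for my $j ( 0 .. $#{ $m->[0] } ) {
            $t[$j][$i] = $m->[$i][$j];
        }
    }
    return \@t;
}
```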


and used pairwise() from List::MoreUtils to multiply corresponding elements of the rows, and sum0() from List::Util to add up the products.

```
for my $i ( 0 .. $rows - 1 ) {
    for my $j ( 0 .. $cols - 1 ) {
        $prod[$i][$j]
            = sum0 pairwise { $a * $b } @{ $M->[$i] }, @{ $N->[$j] };
    }
}
```
```
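The same row-times-row computation can be checked by hand on a tiny case. This sketch uses only the core List::Util module, with map standing in for pairwise so it runs without List::MoreUtils; the 2x2 values are my own, not from the benchmark:

```perl
use strict;
use warnings;
use List::Util qw( sum0 );

my $M = [ [ 1, 2 ], [ 3, 4 ] ];
# Already transposed: row j of N is column j of the original matrix,
# so the original second matrix here is [ [ 5, 7 ], [ 6, 8 ] ].
my $N = [ [ 5, 6 ], [ 7, 8 ] ];

my @prod;
for my $i ( 0 .. 1 ) {
    for my $j ( 0 .. 1 ) {
        # Dot product of row i of M with row j of transposed N.
        $prod[$i][$j]
            = sum0 map { $M->[$i][$_] * $N->[$j][$_] } 0 .. $#{ $M->[$i] };
    }
}
```

Because the second matrix is pre-transposed, both operands of the inner dot product are scanned along rows, which is the whole point of the exercise.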

The results were much the same for both integers and floats.

```
-➤ perl ./int_benchmark.pl
Processing for 5 seconds each size will take 25 seconds.
Wed Jun 17 15:25:14 2015
Rate M_100x100 M_32x32 M_10x10 M_5x5 M_2x2
M_100x100 5.54/s -- -97% -100% -100% -100%
M_32x32 174/s 3046% -- -97% -100% -100%
M_10x10 5114/s 92135% 2832% -- -85% -98%
M_5x5 34909/s 629461% 19913% 583% -- -87%
M_2x2 274124/s 4943582% 157052% 5260% 685% --
Wed Jun 17 15:25:46 2015
-➤ perl -E '
say "100 => ", 100**3 * 5.54 / 10**6;
say " 32 => ", 32**3 * 174 / 10**6;
say " 10 => ", 10**3 * 5114 / 10**6;
say " 5 => ", 5**3 * 34909 / 10**6;
say " 2 => ", 2**3 * 274124 / 10**6;
'
100 => 5.54
32 => 5.701632
10 => 5.114
5 => 4.363625
2 => 2.192992
```

Previous float results were 2.1, 4.1, 4.9, 5.2 and 5.4 MFLOPS for matrices sized 2x2 through 100x100. So whether there are 10,000 row changes or 1,000,000, it doesn't make much difference.

I modified the JavaScript program to operate on a transposed second matrix, but since JavaScript doesn't provide a pairwise() routine, I wrote a dot-product routine which takes two rows. Both integer and float matrices produced much the same results as before, within about ±10%. So transposing is irrelevant when there's so much other overhead in a scripting language.
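The JavaScript version isn't listed in the post; a minimal sketch of the idea, with names of my own choosing:

```javascript
// Dot product of two equal-length rows. With the second matrix
// pre-transposed, row j of N is column j of the original matrix.
function dotProduct(rowA, rowB) {
    let sum = 0;
    for (let k = 0; k < rowA.length; k++) {
        sum += rowA[k] * rowB[k];
    }
    return sum;
}

// prod[i][j] = dotProduct(M[i], N[j]), N already transposed.
function multiplyTransposed(M, N) {
    return M.map(row => N.map(col => dotProduct(row, col)));
}
```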

## 1 comment:

If you need performance with linear algebra / large data in Perl, why not use PDL?
