Bash can only handle integer arithmetic. To process floats, you need to shell out to 'bc' or some other external utility. The core multiplication changes to: result=`echo "$result + $m * $n" | bc`; Boy, does that slow things down! For each multiplication, you're launching a shell, invoking a program, returning results. The result is 800 multiplications a second. Note this is running on integer matrices, there remain unresolved problems processing floats. One obvious problem is shelling out for every multiplication. More efficient is to accumulate the expression to calculate, and just invoke the external calculator once. Now we only shell out N^2 times, instead of N^3, so increasing matrix size leads to better performance, up to a limit. the 32x32 matrix gives 18x performance compared to the simpler version. This implies the 32x reduction in shelling out is offset by the cost of building the expression. By the time we reach 100x100, overhead costs are...