Thursday, April 18, 2013

Learning Perl5i #3

I've long admired the way functional languages specify multiple implementations of a routine for different sets of argument values, to detect terminal conditions. Converting a Perl6 solution  of a combinatorics task to Perl5i, I realized the plain Perl solution isn't necessarily all that bad, either.

The problem is to list the different ways you can select M elements from a set of N. Select 2 elements from a set of 4 would lead to the values ( 1, 2 ); ( 1, 3 ); ( 1, 4 ); ( 2, 3 ); ( 2, 4 ); ( 3, 4 ). Oh yeah, they want the characters in each combination in sorted order, and  the list of combinations sorted, too.

So it all consists of a single routine which gets invoked as combine(3, ['a'..'e']).

Ignore the first couple of lines, for a moment. If the first character of the set exists in a combination, it must be in the first position, since the characters are in sorted order. That means the remainder of the combination must be one character shorter, made of the remaining characters. So that part calls for a recursive solution:

* pull off the first character from the alphabet;
* recursively determine combinations of length -1 from the remainder of the set;
* reassemble be prefixing the first character onto each solution.

The other possibility is that the first character is not in the solution. That calls for recursively invoking the routine with the original length, but a smaller alphabet. If we do that often enough, we reach a situation where the alphabet is empty. This situation is handled by the first line we skipped over. In fact, we could optimize and abort if the size of the alphabet is smaller than the length of string we are requesting.

The more normal end case is when we are asking for a single character, the last in the sequence. In this situation, each character in the set is a candidate, so we return each character as an element in an anonymous array. Returning to the calling level, we then prefix the previous head of the alphabet onto the sequence using unshift. When we reach the top level, we have a list of arrays, each array being one combination.

While I was satisfied with the solution, I was unhappy with hard-coding the invocation. If I'm trying to demonstrate good coding, I should allow the program to be invoked with command-line arguments, to vary the way it is used.

So I added my favourite command-line options module, Getopt::Long, and modified the main section to fetch and use options:

my $options = parse_options();
say @$_ for combine( $options->{length}, 
                     @{$options->{alphabet}} );

The routine to read options looks like this:

func parse_options() {
    my %options = ( length => 3,
   set => 5,
    GetOptions( \%options, 'length=s', 'set=s' )
or die( 'some usage message';

    $options{alphabet} = 
        [ ('a'..'z')[0..$options{set}-1] ];
    return \%options;

Getopt::Long has a  rich set of options to support single character or long argument names, to allow flags, options with string or numeric values, hashes and even multiple values, as well as supporting multiple names and abbreviations. You can specify a parameter, and then specify a reference to a scalar in which to store the value, but I prefer to pass a reference to a hash as the first argument, which gets all the values.

By setting some values in the hash, you can specify default values.

In this case I allow two arguments, the length of the strings, M in combinatorics talk, and the size of the alphabet, N in combinatorics talk. I then create an alphabet of set N by taking an appropriately-sized subset of the sequence a to z.

But if GetOptions fails because the argument list is totally unusable, you need to report an error. And it should probably provide a usage message, describing how to invoke the program. My favourite solution is Pod::Usage, which makes it easy to display the POD contained in the file ... but that means documenting the program with POD.

So the question is, is this getting silly? Or is it an appropriate example to present to people wanting to learn modern Perl?

use perl5i::2;
use Getopt::Long;
use Pod::Usage;

# ----------------------------------------
# Generate combinations of length $len consisting
of characters from the sorted set @chars, using 
each character once in a combination,with sorted
# strings in sorted order.
# We will use a recursive solution:
# * If we run out of eligable characters, we've
#   gone too far, and won't find a solution along
#   this path.
# * If we are looking for a single character, each 
#   character in @set is eligible, so return each
#   as the single element of an array.
# * We have not yet reached the last character, so
#   there are two possibilities:
#   1) push the first element onto the front of an 
#      N-1 length combination from the remainder of
#      the set.
#   2) skip the current element, and generate an  
#      N-length combination from the remainder
func combine($len, @chars) {
  return unless @chars;
  return map { [ $_ ] } @chars if $len == 1;

  my ($head) = shift @chars;
  my @result = combine( $len-1, @chars );
  for my $subarray ( @result ) {
    $subarray->unshift( $head );
  return ( @result, combine( $len, @chars ) );

func parse_options() {
    my %options = ( length => 3,
   set => 5,
    GetOptions( \%options, 'length=s', 'set=s',
                'man', 'help' )
or pod2usage(2);
    pod2usage(1)  if $options{help};
    pod2usage(-verbose => 2)  if $options{man};

    $options{alphabet} =
        [ ('a'..'z')[0..$options{set}-1] ];
    return \%options;

my $options = parse_options();
say @$_
    for combine( $options->{length},
                 @{$options->{alphabet}} );


=head1 NAME

combinations.pli - determine possible combinations of M elements from a set of N

=head1 VERSION

This documentation applies to combinations.pli version 0.0.1


    combinations.pli [--length M] [--set N] \
                     [-man] [-help]

=head1 OPTIONS

=over 4

=item length M

Create strings of this length

=item set N

Use an alphabet of this many characters

=item help

Display a brief summary of information about the program.

=item man

Display the full documentation.



Generate combinations of length C<length> consisting of characters from the sorted set C<alphabet>, using each character once in a combination, with sorted strings in sorted order.


No comments: