Thursday, September 30, 2010

Exploring Perl6

I've been exploring Perl6 since Rakudo-star was released in late July. There's a lot of exciting things to learn about Perl6, and you have to start early to figure things out.

I've been working on implementing a simple sudoku solver, as something to do.

My conclusion is that I'm delighted with what's been done, and what's coming ... and frustrated waiting for what's coming to turn into usable features!


Sudoku




In case you've been on Mars the last few years, sudoku is a puzzle with nine rows and nine columns. That block is also divided into 3x3 regions ... like a tic-tac-toe board subdivided into tic-tac-toe boards. When solved, each row contains the digits 1 to 9, without repetition; so does each column, and each block. Each puzzle begins with a certain number of cells filled in, your job is to figure out the rest.


You know that cell A1 can't be a 1, because there's a one already in that row, it can't be a 2 or 4 or 6 or 7 or 8. From the other cells in the column, you know it can't be a 3, either. From the other cells in the same block, you know 5 is out of the question. By a process of elimination, it must be a 9. That means none of the empty cells in that row, that column, that block can be a 9, either. That might be enough to resolve some other cells.

Sometimes that sort of elimination is not enough. You wind up with a pair of cells that might be either of a couple of values, so you need to use other information to determine what value they actually are. For example, you know there has to be a 1 somewhere in block abc/789. I can't be in column A, because of A4; it can't be in column B because of B1. B7 & C9 are occupied, so it must be in C8

My program isn't that smart, it only solves trivially easy sudoku using elimination. The goal is not really to solve sudoku, though it does that faster than I do, the goal is to explore Perl6.


sudoku.pl



My main program looks like this:
#!./perl6
use Grid;

sub test ( Str $file ) {
my $grid = Grid.new;
$grid.load( $file );
$grid.display;
$grid.show-queue;
while $grid.has-stuff-to-process and not $grid.solved {
$grid.process-next;
}
$grid.display;
$grid.show-queue
}
test( $_ ) for @*ARGS;

Starting at the top, I don't have Perl6 actually installed, it's just in a neighbouring directory, so I provide a local link to the executable, to simplify paths ... actually, because I wasn't using a hashbang line during testing, just the link.

In theory you're supposed to have a use v6 to activate Perl6 mode, but so far that doesn't seem necessary. The idea is the same binary will handle both Perl 5 and 6, and certain keywords will differentiate, but that's still in the future.

I include the Grid module, which will be describe later. For now, it's magic that does stuff.

A subroutine is still a subroutine, though there are also methods and multi-methods and other stuff. Perl is now one of the Real Languages™ ... you declare subroutine arguments between the subroutine name and the opening curly. You don't HAVE to, you have a number of options, but I usually prefer to.

Skipping over the details of the routine, for the moment, I invoke the test routine once for each command line argument. The arguments will specify files containing Sudoku game definitions, so you can run one, or several as a batch.

The Sudoku game needs a playing surface, the Grid, and you just know it's going to consist of Cells. You have options about declaring variables, you can give a little information, or a lot. In the finest Perl tradition, you can have a scalar with holds a Dog at one moment, an integer a little later, then a string ... but the language needs to keep track of what is possible. the more wide open your options, the greater the run-time load. Specific requirements helps prevent errors and makes things faster, because the system knows your variable will never handle a different type.

In this case, I leave it flexible, but immediately assign it a grid object. Like most languages, methods are invoked by joining the object name and method name by a dot. Sometimes there's an advantage to being conventional. Loading the initial game board consist of resolving the cells which are specified at the start. Resolving a cell sets its value, but also adds that cells row, column and value to a queue. As the grid is loaded, the queue is primed. Once loading is complete, I display the initial board condition, and the contents of the queue.

Note that variable names can now contain a hyphen or apostrophe so long as the next character is alphabetic, So we no longer need to be jealous of LISP programmers, and our tests can contain both can() and can't() tests.

Then it's time to search for a solution. So long as there are elements queued for processing, and the game has not been solved, process the next queued element. Applying values may resolve other cells in the row, column or block, which will lead to additional queue entries.

If the puzzle does get solved or the queue runs out, I display the final board, and show what is still left in the queue.

Update: Thank you Jonathan Worthington .. just discovered if you have a subroutine MAIN, Perl6 will automatically convert between command line arguments and subroutine args, and will even display a brief USAGE message if there's an error. Goodbye, Getopts::Long, and thanks for all the fishing.

Grid.pm


use Cell;

class Grid {
has @!cells[9][9] of Cell;
has @!queue;
has $!remaining is ro;

The playing board is a Grid of Cells. Details coming later, but basically, each cell keeps track of a set of possible values. Initially, it can be anything from 1 to 9, gradually those possibilities get whittles away until it is resolved to an exact value.

A class now gets a keyword of its own. package and class must have a bracketed scope, no longer "from here till the next package.

Attributes are declares with has and use a twigil. The first character defines the type of variable, the second is a dot for public attributes, exclamation mark for private ones. Perl6 is supposed to support multi-dimensional arrays, using [$i;$j;$k] indexing, but it isn't there, yet, so I just use an array of arrays.

When a cell is resolved, that is, when you figure out the exact value for a cell, you want to remove that value from the set of possibilities
for the other cells in that row, in that column, and in that sub-block. Resolving a single cell may resolve several neighbours, so I have a queue of resolutions waiting to be processed.

Initially, there are 81 cells waiting to be resolved. Loading the initial state resolves some of them and primes the queue. When there are no more cells waiting to the resolved, the puzzle is done.

  submethod BUILD {
for 0..8 -> $r {
for 0..8 -> $c {
@!cells[$r][$c] = Cell.new(9);
}
}
$!remaining = 81;
}

Besides subroutines, there are now methods, multi-methods (methods overloaded by signature), submethods, and some others. I don't understand what a submethod is, yet, but the constructor is one, and has the name BUILD. Each element of the Grid gets a new Cell, which may have any value in the range 1..9.

The grid has rows 1->9 and columns 1->9. Eventually Perl6 will allow you to refer to the array elements using the names you want, by putting the index in curly braces: @array{1}, but for the moment, you're forced to use traditional FORTRAN-like zero-based indices: @array[0]. I should use symbolic names or even helper routines to map from grid row and column names to array indices, but I haven't bothered.
 method load( Str $file ) {
my $init = open $file, :r
err die( "Could not open file '$file' $*ERR" );
for $init.lines -> $line {
my ( $r, $c, $v ) = $line.split(' ');
@!cells[$r-1][$c-1].assign( $v.Int );
@!queue.push( [$r, $c, $v] );
}
}

The first step in solving a puzzle is to load the initial conditions onto the board. Note that open now returns the file handle, like subroutines should. As well, the conditions associated with opening the file, in that case that we are reading the file, become adverbs, modifying the routine. Special variables are all upper-case, and frequently have a star or question mark as the second character of the twigil. There is far less punctuation or one-character variables. Unfortunately, many don't work yet.

The initial conditions file consist of rows, each row containing the row number, column number and value for a cell, separated by spaces.

I assign the value to the cell, and add it to the queue for processing related cells.
 method display {
for 0..8 -> $r {
for 0..8 -> $c {
print @!cells[$r][$c].is || '.';
}
print "\n";
}
print "\n";
}

To display the grid, print the value of each resolved cell, or a dot if it isn't resolve yet. Note the Perl6 for loop syntax, no parentheses, specifying the loop variable name in the pointy block. While Perl6 has become verbose where it has benefits, the philosophy of economy remains ... the act of declaring the loop variable is enough, no my or other superfluous keywords.
 method show-queue {
my $rstr = 'r ';
my $cstr = 'c ';
my $vstr = 'v ';
for @!queue.values -> @elem {
for @elem -> $r, $c, $v {
$rstr ~= $r;
$cstr ~= $c;
$vstr ~= $v;
}
}
say $rstr; say $cstr; say $vstr; say '';
}
For debugging, I found it useful to display the queue. Rather than display it vertically, as a stack, I rotate it so it only consumes 3 rows of display. I iterate over the queue elements, appending the row, column, and value components tot he appropriate string buffer, and then print out the buffers, with a blank line separator.

method has-stuff-to-process {
return 0 < @!queue.elems();
}
method process-next {
self.can't-be( @!queue.shift );
}
method solved {
return $!remaining == 0;
}
It's generally a bad idea for the user to access internal components. That would prevent you from altering the implementation. Instead you want to provide methods which make sense in the problem space. In this case, the queue is a problem space component, but we still want to control how the user uses it. So we prevent the user from directly manipulating the queue, just permitting him to see whether it is empty or not. Similarly, you don't want the top level to take elements from the queue and remove them from the grid, though that is what my solution did, for a long time. Better to externalize the processing of an element, but conceal the details. Similarly, the user can detect whether the puzzle has been solved, but not how that information is determined.
 method clear-and-queue ( @r, @c, @skip, $v ) {
for @r -> $r {
for @c -> $c {
next if $r == @skip[0] && $c == @skip[1];
@!cells[$r-1][$c -1].clear($v.Int);
if @!cells[$r-1][$c-1].changed {
@!queue.push( [ $r, $c, @!cells[$r-1][$c-1].is ] );
$!remaining--;
}
}
}
}
method cant-be ( @set ) {
for @set -> $r, $c, $v {
self.clear-and-queue( [1..9], [$c], [$r, $c], $v);
self.clear-and-queue( [$r], [1..9], [$r, $c], $v);
my $rt = 1 + 3 * floor(( $r - 1 ) / 3);
my $ct = 1 + 3 * floor(( $c - 1 ) / 3);
self.clear-and-queue( [$rt..$rt+2], [$ct..$ct+2], [$r, $c], $v);
}
}
}
When we fetch an element from the queue, we know that value cannot appear anywhere in the specified row, or the specified column, or the sub-block containing that row and column. clear-and-queue() is the routine which processes a range or cells. it takes an array of row numbers, and array of column numbers and a value to drop from those cells. But we want to skip over the actual cell which generated the queue element, so the actual row and column numbers are present as a pair, @skip. We iterate over the row values and the column values, and except at the generating cell location, drop the specified value from the list of possibilities. If the cell becomes resolved to a single value, indicated by setting the changed value, we add that cell to the queue, and decrement the count of cells to process.

Processing a column consists of passing the values 1..9 for the row array, and the actual column number for the column array. Processing a row is the revere; the row array has a single element, the column ranges over 1..9. Processing a block is a bit tricky, involving integer arithmetic to determine the row and column of the top left corner, and processing that row / column and the next two.


Cell.pm



class Cell {

has %!values is Keyset;
has Int $!resolves-to = 0;
has Bool $!changed = False;
Since each sudoku row, column and sub-block contain each of the digits 1..9 without repetition, any cell in a new, uninitialized sudoku grid can contain any of those values. A KeyHash uses the keys of a hash to represent a set of values; aKeySet restricts them to having boolean values. If a value becomes false, the associated key disappears from the KeySet. This is perfect for representing the possible values of a sudoku cell. Unfortunately, it doesn't work, yet. The good news is we get a chance to use list reductions.

Since KeySets don't work properly, the private %!values attribute will have the possible cell values as keys, with a hash value of 1 if the digit is still possible, and a zero if it has been eliminated

If the Cell has been resolved to a single digit, that value will be stored in the $!resolves to attribute, to eliminate querying the hash. At the time the cell is resolved, $!changed will be set; but automatically cleared when the outer program accesses the value. As a result, it will only be true once.
submethod BUILD {
%!values{$_} = True for 1..9;
}
Initialze the possible values.

method has-value( Int $n ) {
return %!values{$n};
}
Determine whether a value is among the possibilities for the cell.

method assign( Int $v ) {
%!values{$_} = False for 1..9;
%!values{$v} = True;
self.test-whether-resolved;
}

Loading an initial state for the sudoku board involves specifying a single value for a cell, so we need an assign() method. Eliminate all possible values, and then assign the single one we want. Of course the test-whether-resolved will find it resolved.
method set( Int $n ) {
%!values{$n}++;
}
The set() method is useful for testing, to add values to the set of possibilities. In sudoku solving, the list of possibilities only ever decreases.
method test-whether-resolved {
return unless ( 1 == [+] %!values.values );
$!resolves-to = first { %!values{$_} }, 1..9;
$!changed++;
}

The values for the hash will contain a one for tags that are set, zero for those that have been dropped. So %!values.values will be a list of zeros and ones. Performing a list reduction iterates over the list applying the specified operator, like a map-reduce. In this case, it returns the number of elements which are still set. If only one element is set, the cell has been solved, otherwise we exit the method.

first() is a list method which returns the first true value, like a grep which terminates after a success.

The $!changed attribute is set, to indicate the resolution is new. Since the variable is not available outside the Cell, only a changed() method, the value can only become true, once, in test-whether-resolved(), and is cleared the rest of the time.
method clear( Int $n ) {
return unless %!values{$n};
%!values{$n}--;
self.test-whether-resolved;
}
When we process the cells of the grid, any value should only be dropped once. To be sure, we only clear the value if it is set, then we check whether that change resolved the cell.
  method key-count {
return [+] %!values.values;
}
method changed {
return unless $!changed;
$!changed = False;
return True;
}
method is {
return $!resolves-to;
}
}
key-count is another testing routine, it reports the number of remaining values. The changed() method is the external interface to determine whether the cell has become solved. If the cell has been solved, the $!changed attribute is cleared, and TRUE is returned to the user. The flag is set, once, in test-whether resolved and is cleared by reading.

Disclaimer: While writing up this report, I made some changes. Some sloppiness that I tolerated while fooling around with the code became too embarrassing to stand by in a write-up. I changed the code to test it, and then edited the code in the write-up by hand. So The code does run,and gives the correct results, but there is some possibility of errors in the code I have in the file.

No comments: