Skip to main content

Posts

Are your files SM? M? XL? the procedural XL solution

I presented this challenge at work as a beat-the-winter-blahs activity. But then I felt I couldn't present a merely operative solution. As the local Perl geek, I felt required to demonstrate that Perl could do clean, well written solutions, not just line noise. I wanted a balance between advanced Perl features and easy comprehension. On the Unix command line, I prefer piping commands together, rather than using a bunch of intermediate files. Similarly in Perl I advocate using map{} and grep{}, and feeding the output of one function into the input of another. I consider those standard idioms any programmer needs to learn. On the other hand, there were a couple of situations where exceptions and oddities in the code made me uncomfortable. Pulling things into subroutines such as calc_field_width() simplified things and made the code clearer, I think. I guess the lesson of that is that short subroutines are better. I knew that! I'm open to suggestions for alternative ways of

Are your files SM? M? L? XL? Mid-Size

What I actually had in mind when I came up with the challenge was something like the following ... the sort of thing you find in SysAdmin magazine or Randall Schwartz's columns. #!/usr/local/bin/perl5.10 use 5.010; use strict; use warnings; use Readonly; Readonly my $DOT => q{.}; Readonly my $DOTDOT => q{..}; Readonly my $ESCAPE_BACKSLASH => q{\\}; die "USAGE: $0 rootdir.\n" unless @ARGV; my @dirs = ( @ARGV ); my %stats; while (@dirs) { my $one_dir = shift @dirs; $one_dir =~ s{(\s)}{$ESCAPE_BACKSLASH$1}g; # escape spaces for glob() ENTRY: while ( my $entry = glob "$one_dir/*" ) { next ENTRY if $entry eq $DOT or $entry eq $DOTDOT; if ( -d $entry ) { push @dirs, $entry; } else { my $size = -s _; my $len = $size == 0 ? 0 : length $size; $stats{$len}++; } } } for my $size ( sort { $a <=> $b} keys %stats ) { my $maxsize = 10**$size; say sprintf( '<%8d %d&#

Are your files SM? M? L? XL? Kwick-N-EZ

When I first thought up the programming exercise I described last week in Are your files SM? M? L? XL? , my intention was to have a trivial exercise for applicants to carry out. HR was passing through lots of applicants who had detailed database knowledge, but were not at all programmers. They couldn't name simple Unix commands, couldn't talk about how to carry out a task in Perl or shell or as a pipeline of Unix commands. I thought this exercise would be simple for any experienced programmer to carry out, never mind style or performance points. Shortly after I came up with the idea, I realized it could mostly be done as a Unix pipeline. find ~/ -type f -printf "%s\n" |\ perl5.10 -n -E 'say length' |\ sort |\ uniq -c |\ perl5.10 -n -E ' |\ $fill, $count, $size) = split /\s+/; |\ $exp = 10**($size-1) |\ say "$exp $count" ' Although I hadn't used the option before, m

Are your files SM? M? L? XL?

Twenty years ago I was on a co-op work term where a Sun Sparc 10 was shared among 6 NCD X-terminals. The system had 32 MB shared among the users, with 1 GB hard drive total storage. Today I have 12 GB memory, many terabytes hard drive ... programming experiments, video, music, and all my photography. The largest files are larger today than in the past ... I have performance profiling data from the Sudoku experiments I wrote about that are bigger than the total file system I worked with twenty years ago. But what's more important? small files or large? Very large files will be rare, otherwise you would run out of space. Very small files may be very important, but even a large number of them will not take up much space. Most space will be devoted to something in between ... but where is the bulk of storage devoted? The (mental) challenge is to determine how file size is distributed, given some starting directory. My opinion is that exponential categories are appropriate, that is,

More Perl 5 - Comparing "split range" vs "test in loop"

Although the difference I found between the split range and the test within loop implementations were small, it seems to me it's wrong. So I boiled the test down to a bar minimum. I created a benchmark test consisting of nothing but the loop. I had to do something within the loop, so I stored the number in a variable. test8 => sub { for ( 0..8 ) { next if $_ == 8; my $j = $_; } }, range0 => sub { for ( 0..-1,1..8) { my $j = $_; } }, As the names and the code suggest, I actually had 9 test and 9 range instances, for each value in 0..8 . While there was a little variation within the sets, the difference between sets was significant! Using a split range was 60% faster than testing within the loop. The test achieved 376923 .. 390809 iterations per second, compared to 596918 .. 606710 for the split range, when tested for 10 seconds per version. Testing just test4 vs range4 for a minute gave

Sudoku in Perl5 #2 - Optimizing the Perl5 implementation

The test in BruteForceTest irritated me .. iterating across the whole row, testing each cell to see if it was the cell we didn't want to test against. And same for each row of the column, and each cell of the block. All those repeated tests must add up to a lot of wasted time, right? So I came up with BruteForceRange, which doesn't iterate over the whole row, only the portions to the left and right of the current cell. In the column, it iterates over the cells above and below ... the block test is more complicated, we'll get to it. The program is mostly the same as BruteForceTest. The first difference is the name changes: package BruteForceRange; Instead of specifying 0..8 for the row or column for loop ... for my $c ( 0 .. 8 ) { next CELL # don't test against self. if $c == $col; the new version specifies a portion to the left, and a portion to the right. This is possible because ranges in Perl must increment, a left boundary greater than th

Sudoku in Perl5 #1 - a brute-force solution

Having written the Perl6 Sudoku solver, I decided to go back to the Perl5 version I wrote when I was experimenting with Moose, but I couldn't find where I put it. While waiting for my senile memory to function, I implemented a brute-force solution. The basic concept of the brute-force approach is that you begin with a grid, some cells of which have been defined by the puzzle designer. I begin with the first empty cell ( call it A) and set it to '1'. I check to see if that conflicts with any already specified cells in the same row, in the same column, in the same block. If there's a conflict, I try the next element in the list of possible values : 2, 3..9. If a value does not conflict with existing cells, then on to the next cell. If all values create a conflict, then an earlier cell must have the wrong value, so back up. By recursively calling a routine over and over, backing up is easy, just return . If we run out of cells to the right, advance to the next row; when we