Tuesday, February 22, 2011

Are your files SM? M? L? XL? Mid-Size

What I actually had in mind when I came up with the challenge was something like the following ... the sort of thing you find in SysAdmin magazine or Randal Schwartz's columns.



#!/usr/local/bin/perl5.10

use 5.010;
use strict;
use warnings;

use Readonly;

Readonly my $DOT              => q{.};
Readonly my $DOTDOT           => q{..};
Readonly my $ESCAPE_BACKSLASH => q{\\};

die "USAGE: $0 rootdir.\n" unless @ARGV;

my @dirs = ( @ARGV );    # queue of directories still to be processed

my %stats;               # digit count of file size => number of files
while (@dirs) {
    my $one_dir = shift @dirs;
    $one_dir =~ s{(\s)}{$ESCAPE_BACKSLASH$1}g;    # escape spaces for glob()

    ENTRY:
    while ( my $entry = glob "$one_dir/*" ) {
        next ENTRY if $entry eq $DOT or $entry eq $DOTDOT;
        if ( -d $entry ) {
            push @dirs, $entry;
        }
        else {
            my $size = -s _;    # _ reuses the stat() done by -d above
            my $len = $size == 0 ? 0 : length $size;
            $stats{$len}++;
        }
    }
}

for my $size ( sort { $a <=> $b } keys %stats ) {
    my $maxsize = 10**$size;
    say sprintf( '<%8d %d', $maxsize, $stats{$size} );
}


Starting with the directories given as command-line arguments, process each directory's contents: ignore '.' and '..'; add subdirectories to the queue of directories waiting to be processed; and for each file, take the number of decimal digits in its size (roughly its log10, with empty files counted as 0) and increment the count for that length.

For all the digit counts encountered, in increasing order, convert each to an (unreachable) max size, 10 raised to that count, and print that limit and the number of files in that range.
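
The bucketing rule is easy to check in isolation. The following is a minimal sketch, assuming two hypothetical helper names, size_bucket and bucket_limit, that do not appear in the script itself; it only restates the ternary and the 10**$size expression as subroutines.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): map a byte count to its
# bucket, i.e. the number of decimal digits, with empty files in bucket 0.
sub size_bucket {
    my ($size) = @_;
    return $size == 0 ? 0 : length $size;
}

# Hypothetical helper: exclusive upper bound for a bucket. Every size that
# lands in bucket $bucket is strictly below 10**$bucket.
sub bucket_limit {
    my ($bucket) = @_;
    return 10**$bucket;
}

# Show the bucket and its limit for a few sample sizes.
for my $size ( 0, 7, 10, 999, 1000, 52_431 ) {
    my $bucket = size_bucket($size);
    printf "%6d bytes -> bucket %d (<%d)\n",
        $size, $bucket, bucket_limit($bucket);
}
```

A 999-byte file lands in bucket 3 (limit 1000), while a 1000-byte file moves up to bucket 4 (limit 10000), which is why the printed limit is never reached by any file counted under it.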


I can do without the File::Find module; the task at hand is pretty simple. On the other hand, my tolerance for ugly punctuation has dropped in the past few years, so I need the Readonly. Without that, it becomes ...


my %stats;
while (@dirs) {
    my $one_dir = shift @dirs;
    $one_dir =~ s{(\s)}{\\$1}g;    # escape spaces for glob()

    ENTRY:
    while ( my $entry = glob "$one_dir/*" ) {
        next ENTRY if $entry eq q{.} or $entry eq q{..};


The dots would be more tolerable with an SQL 'in' operator, or a Perl6 Junction:



use Perl6::Junction qw/any/;
...
    ENTRY:
    while ( my $entry = glob "$one_dir/*" ) {
        next ENTRY if $entry eq any( q{.}, q{..} );


Using a subroutine to localize the ugliness would make the double escape bearable.



sub escape_space { my ($path) = @_; $path =~ s{(\s)}{\\$1}g; return $path; }

my %stats;
while (@dirs) {
    my $one_dir = escape_space shift @dirs;
    ENTRY:
    while ( my $entry = glob "$one_dir/*" ) {
        next ENTRY if $entry eq any( q{.}, q{..} );



So the final result is down to 35 lines, including blanks and closing curlies.

#!/usr/local/bin/perl5.10

use 5.010;
use strict;
use warnings;
use Perl6::Junction qw/any/;

sub escape_space { my ($path) = @_; $path =~ s{(\s)}{\\$1}g; return $path; }

die "USAGE: $0 rootdir.\n" unless @ARGV;

my @dirs = ( @ARGV );    # queue of directories still to be processed

my %stats;               # digit count of file size => number of files
while (@dirs) {
    my $one_dir = escape_space shift @dirs;

    ENTRY:
    while ( my $entry = glob "$one_dir/*" ) {
        next ENTRY if $entry eq any( q{.}, q{..} );
        if ( -d $entry ) {
            push @dirs, $entry;
        }
        else {
            my $size = -s _;    # _ reuses the stat() done by -d above
            my $len = $size == 0 ? 0 : length $size;
            $stats{$len}++;
        }
    }
}

for my $size ( sort { $a <=> $b } keys %stats ) {
    my $maxsize = 10**$size;
    say sprintf( '<%8d %d', $maxsize, $stats{$size} );
}
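
For comparison only, here is a rough sketch of the same tally built on the core File::Find module that the post chose to do without. The subroutine name tally_sizes is hypothetical, and this is one possible arrangement rather than a drop-in replacement for the script above.

```perl
use 5.010;
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk the given root directories and count files by
# the number of decimal digits in their size (empty files count as 0).
sub tally_sizes {
    my @roots = @_;
    my %stats;
    find(
        sub {
            return if -d $_;             # find() does the recursion itself
            my $size = -s _;             # reuse the stat() done by -d
            return if !defined $size;    # e.g. a dangling symlink
            my $len = $size == 0 ? 0 : length $size;
            $stats{$len}++;
        },
        @roots
    );
    return %stats;
}

# Print the report only when root directories were given on the command line.
if (@ARGV) {
    my %stats = tally_sizes(@ARGV);
    for my $len ( sort { $a <=> $b } keys %stats ) {
        say sprintf( '<%8d %d', 10**$len, $stats{$len} );
    }
}
```

This version needs no space escaping, because no glob pattern is involved. The counts can differ from the glob-based script: find() also visits dot-files, which glob "$one_dir/*" skips.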
