
Are your files SM? M? L? XL? Kwick-N-EZ

When I first thought up the programming exercise I described last week in Are your files SM? M? L? XL?, my intention was to have a trivial exercise for applicants to carry out. HR was passing through lots of applicants who had detailed database knowledge, but were not at all programmers. They couldn't name simple Unix commands, couldn't talk about how to carry out a task in Perl or shell or as a pipeline of Unix commands. I thought this exercise would be simple for any experienced programmer to carry out, never mind style or performance points.

Shortly after I came up with the idea, I realized it could mostly be done as a Unix pipeline.

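find ~/ -type f -printf "%s\n" |   # GNU find: each file's size in bytes, one per line
perl5.10 -n -E 'say length' |      # digits in the size, +1 for the trailing newline
sort -n |                          # numeric sort; a text sort puts 10 before 2
uniq -c |                          # "count length" for each distinct length
perl5.10 -n -E '
    ($fill, $count, $size) = split /\s+/;  # $fill absorbs the empty leading field
    $exp = 10**($size-1);                  # drop the newline: 10**digits > file size
    say "$exp $count" '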

Although I hadn't used the option before, man find indicated that find could indeed print the size of each file and nothing else, using -printf with the %s directive. When I tried to write this article on my home machine, I discovered that -printf is a feature of GNU find, not available in the find that ships with the Mac. So on other machines you may need to do more work, maybe parse the output of ls -l, or have find print out the number of blocks each file takes up ... less accurate, less complete, but sufficient for a quick proof of concept.
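For machines without GNU find, here is a minimal sketch of a replacement for the first stage that still reports exact byte counts. The helper name file_sizes is hypothetical. It assumes that probing find with -printf is enough to detect GNU find, and otherwise falls back to stat, whose size format is spelled -f %z on BSD and macOS and -c %s with GNU coreutils.

```shell
# Hypothetical helper: print the size in bytes of every regular file under
# a directory (default: $HOME), one size per line.
file_sizes() {
    dir=${1:-$HOME}
    if find "$dir" -maxdepth 0 -printf '' 2>/dev/null; then
        # GNU find: -printf with %s prints the size and nothing else
        find "$dir" -type f -printf '%s\n'
    elif stat -f %z "$dir" >/dev/null 2>&1; then
        # BSD/macOS stat: -f takes a format string, %z is the size in bytes
        find "$dir" -type f -exec stat -f %z {} +
    else
        # GNU coreutils stat: -c takes a format string, %s is the size in bytes
        find "$dir" -type f -exec stat -c %s {} +
    fi
}
```

Because the output is still one size per line, file_sizes ~ can stand in for the find command at the head of the pipeline without changing any later stage.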

So find is printing a series of file sizes, one per line. My original thought was to take the logarithm of each size and truncate it to an integer. But Perl's log() only calculates the natural logarithm, log_e, so I would need to divide the result by log_e 10 to get log10. After clobbering myself over the head for ten minutes trying to achieve that, I realized I didn't need a logarithm at all: for any size of at least one byte, the number of digits in the size IS one more than the integer portion of log10, and 10 raised to that digit count is an upper limit the size never reaches. perl -n reads the input line by line and applies the -e expression to each line. Specifying perl5.10 (or later) and using -E instead of -e allows me to use say instead of print, saving two characters in the command name, avoiding a \n and sparing an explicit $_. I SHOULD chomp the newline off the input before getting its length, but I can simply subtract 1. I could subtract that character now, but I found it easier to do it later.

The output of the Perl component is a series of lines, each holding the number of digits in one file's size, plus one for the newline. sort orders them, obviously, and uniq -c replaces each run of identical values with a single instance, preceded by the number of times that value appears. uniq only merges adjacent duplicates, which is why the sort has to come first.
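A minimal sketch of the middle of the pipeline as a reusable function (the name count_lengths is an assumption, not from the original). It uses sort -n rather than a plain sort: a text comparison would put a length of 10 before 2 as soon as some file reaches 100,000,000 bytes (nine digits plus the newline), and the buckets would come out of order.

```shell
# Hypothetical helper: collapse a stream of length values into
# "count value" lines, smallest value first.
count_lengths() {
    # sort -n compares numerically; a plain text sort would put 10 before 2.
    # uniq -c only merges ADJACENT duplicates, so sorting must come first.
    sort -n | uniq -c
}

# Without the sort, repeated values are counted as separate runs, e.g.
#   printf '4\n2\n4\n' | uniq -c    prints three lines, each with a count of 1
```

For example, printf '%s\n' 4 2 10 4 3 | count_lengths reports counts of 1, 1, 2 and 1 for the lengths 2, 3, 4 and 10, padded in whatever width your uniq uses.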

Little Lord Fauntleroy would chomp off the newline at the end of each line, and strip the leading spaces uniq -c uses to pad its counts. But I'm planning to split each line on whitespace, to separate the count and value fields. When splitting on one-or-more whitespace characters, the leading spaces, however many there may be, generate a single leading field with no data, which I just ignore; and since the trailing newline is whitespace too, and split discards trailing empty fields, there is nothing left to chomp. In real code I would parenthesize the right-hand expression and use square brackets to slice off the values I want. In a one-liner, it's simpler to add a dummy variable. Use the digit count as an exponent to obtain an unreachable upper limit ... don't forget to drop the value by one, to make up for counting the newline a few stages back. A test with an empty file, or at least one with fewer than ten characters in it, will remind you to make that adjustment: without it, such a file lands in the under-100 bucket instead of the under-10 bucket. All that's left is to output the results.

