Wednesday, February 16, 2011

Are your files SM? M? L? XL?

Twenty years ago I was on a co-op work term where a Sun Sparc 10 was shared among 6 NCD X-terminals. The system had 32 MB of memory shared among the users and 1 GB of total hard drive storage. Today I have 12 GB of memory and many terabytes of hard drive space holding programming experiments, video, music, and all my photography. The largest files today are larger than in the past: the performance profiling data from the Sudoku experiments I wrote about is bigger than the entire file system I worked with twenty years ago.

But what's more important: small files or large?

Very large files must be rare; otherwise you would run out of space. Very small files may be very important, but even a large number of them will not take up much space. Most space will be devoted to something in between ... but where is the bulk of storage devoted?

The (mental) challenge is to determine how file size is distributed, given some starting directory. My opinion is that exponential categories are appropriate, that is, one category per power of ten: 10..99 bytes, 100..999 bytes, 1000..9999 bytes, etc. How to categorize and format the results is a personal choice, determined in part by what is convenient, so long as useful information is displayed.
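
To make the power-of-ten categories concrete, here is a minimal sketch in Python. It is only one possible approach, not necessarily the one used for this post. The function names (size_bucket, bucket_range, size_histogram) are invented for illustration, and putting files of 0..9 bytes into a category of their own is an assumption, since the list above starts at 10 bytes.

```python
import os
from collections import Counter


def size_bucket(size):
    """Return the power-of-ten category of a non-negative size in bytes.

    0 covers 0..9 bytes (an assumed extra category), 1 covers 10..99,
    2 covers 100..999, and so on: the digit count of the size, minus one.
    """
    return len(str(size)) - 1


def bucket_range(bucket):
    """Return the inclusive (low, high) byte range of a category."""
    low = 0 if bucket == 0 else 10 ** bucket
    high = 10 ** (bucket + 1) - 1
    return low, high


def size_histogram(root):
    """Walk the tree under root and tally files by size category.

    Returns two Counters keyed by category: the number of files and the
    total bytes. Symbolic links and unreadable files are skipped.
    """
    counts = Counter()
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is not readable
            bucket = size_bucket(size)
            counts[bucket] += 1
            totals[bucket] += size
    return counts, totals
```

Calling size_histogram on a starting directory and looping over the sorted categories, with bucket_range supplying the label for each row, gives both views of the question: the counts show where most files are, and the totals show where the bulk of storage is devoted.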

