Part 2: Philosophy

Shell script programming has a bit of a bad press amongst some Unix systems administrators and, more recently, amongst professional software developers, even as Continuous Delivery and other cornerstones of the modern software industry increasingly depend upon features of shell scripting which "purer" languages cannot provide.

The shortcomings of shell programming normally fall into one of two categories:

1. Speed

An interpreted shell script will run slowly as compared to a C program, a Java program, or even an interpreted Perl program. Go, Rust, Ruby and Node.js are all competing for attention too.

2. Quality

Since it is easy to write a simple batch-job type shell script, there are a lot of poor quality shell scripts around. This also makes it easy to suggest that all shell scripts are bad.

So it is very useful to be able to create good shell scripts: scripts which can be used for deployments, for data transformation, for efficient file handling or for effective manipulation of low-level machine, network, or operating system details - which a Java, Python or Go program would be unable to access as easily or as efficiently. Tying together disparate parts of a build pipeline in a couple of lines of shell script is quick, effective, and gets the job done, where other tools are not up to the task.

There are a number of factors which can go into good, clean, quick, shell scripts.

1. Formatting

The most important criterion for any code must be a clear, readable layout. This is true for all languages, but whereas the major languages are supported by extensive IDEs, many shell scripts are written in far less forgiving editors, so the onus is on the scripter to ensure that code is well presented. Badly formatted code is so much harder to maintain than well-formatted code.

Something about shell scripts seems to make them particularly likely to be badly indented, and since the main control structures are if/then/else and loops, indentation is critical for understanding what a script does. Again, a "proper" software developer is likely to use a powerful but complicated IDE, whilst many shell scripts are written in vim.
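As a rough illustration of the point about indentation, here is a small sketch of a nested if/then/else inside a loop, laid out so that the structure is visible at a glance. The function name classify_path and the paths used are made up for this example; they are not part of any standard tool.

```shell
#!/bin/sh
# classify_path is a hypothetical helper, used only to illustrate layout.
# Each level of nesting is indented one step further than its parent,
# so the matching if/elif/else/fi are easy to line up by eye.
classify_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "file"
    else
        echo "missing"
    fi
}

# The loop body is indented in the same way as the branches above.
for path in / /no/such/path; do
    echo "$path: $(classify_path "$path")"
done
```

The same code squashed against the left margin would behave identically, but a reader would have to count every then, else and fi to work out which branch belongs to which test.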

2. Efficiency

It is easy to fall into a trap of using unnecessary commands. The shell is not a fast or efficient processor of its own language, and there is no compiler to reinterpret your intentions into the most efficient implementation, so a few minor changes to a script can often make a huge difference (good or bad) to the efficiency of the code.
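One common kind of "minor change" is replacing an external command with something the shell can do itself. The sketch below assumes the task is stripping a ".txt" suffix from a filename; the function names strip_ext_slow and strip_ext_fast are invented for illustration. Both give the same answer, but the first starts a sed process (and a pipe) every time it is called, whilst the second uses standard parameter expansion and starts nothing.

```shell
#!/bin/sh
# strip_ext_slow (hypothetical name): pipes the name through sed,
# which costs an extra process and a pipe on every call.
strip_ext_slow() {
    printf '%s\n' "$1" | sed 's/\.txt$//'
}

# strip_ext_fast (hypothetical name): the ${var%pattern} expansion removes
# a matching suffix inside the shell itself, with no external command.
strip_ext_fast() {
    printf '%s\n' "${1%.txt}"
}

# Example filenames are made up for this sketch.
for name in report.txt notes.txt; do
    strip_ext_slow "$name"
    strip_ext_fast "$name"
done
```

For a handful of files the difference is unlikely to be noticeable; it is in loops over many items that this sort of change tends to matter.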

A clear layout makes the difference between a shell script appearing as "black magic" and one which is easily maintained and understood.
You may be forgiven for thinking that with a simple script, this is not too significant a problem, but two things here are worth bearing in mind.

1. Feature Creep

First, a simple script will - more often than anticipated - grow into a large, complex one.

2. Maintainability

Secondly, if nobody else can understand how it works, you may be stuck with maintaining it yourself for the rest of your life!

So the humble shell script has many obstacles in its way, but it continues to be useful. And so we must learn to use it well, and within its own idioms. It won't handle multi-dimensional arrays gracefully, but it will handle I/O redirection so easily you won't even notice it. It will manage multitasking and job control in ways that more complex languages could never dream of. But its ability to deal with multi-byte integers is very poor. So we must learn the shell and embrace what it excels at, whilst always choosing the best tool for the job at hand, and there are many things which are best done with other tools.
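To give a flavour of how little ceremony the shell needs for redirection and background jobs, here is a minimal sketch. The function name run_parallel, the log file names and the "jobs" themselves (simple echo commands) are all assumptions made for the example; a real script would run something more useful.

```shell
#!/bin/sh
# run_parallel is a hypothetical helper: it starts two jobs in the
# background, sends each job's output to its own file in the directory
# given as $1, and then waits for both jobs to finish.
run_parallel() {
    outdir=$1

    # First job: stdout goes to one.log, and stderr is merged into it.
    ( echo "job one"; echo "a warning" >&2 ) > "$outdir/one.log" 2>&1 &
    pid1=$!

    # Second job: runs at the same time as the first.
    echo "job two" > "$outdir/two.log" &
    pid2=$!

    # Block until both background jobs have completed.
    wait "$pid1" "$pid2"
}

workdir=$(mktemp -d)
run_parallel "$workdir"
cat "$workdir/one.log" "$workdir/two.log"
rm -rf "$workdir"
```

The ampersand, the redirections and wait are all built into the shell; doing the same in most general-purpose languages means reaching for a process or threading library.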

One weakness in many shell scripts is lines such as:

cat /tmp/myfile | grep "mystring"

which would run slightly faster as:

grep "mystring" /tmp/myfile

Not much of a saving, you may consider. But for the first version the OS has to load up the /bin/grep executable, which is a reasonably small 75600 bytes on my system, open a pipe in memory for the transfer, load and run the /bin/cat executable, which is an even smaller 9528 bytes on my system, attach it to the input of the pipe, and let it run.

Of course, this kind of thing is what the OS is there for, and it's normally pretty efficient at doing it. But if this command were in a loop being run many times over, the saving of not locating and loading the cat executable, setting up and releasing the pipe, can make some difference, especially in, say, a CGI environment where there are enough other factors to slow things down without the script itself being too much of a hurdle. Some Unices are more efficient than others at what they call "building up and tearing down processes" - i.e., loading them up, executing them, and clearing them away again. But however good your flavour of Unix is at doing this, it'd rather not have to do it at all.
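The loop case described above might look something like the following sketch. The function names count_with_cat and count_without_cat, and the sample data, are invented for illustration. Both functions print the number of matching lines in each file they are given; the first starts cat, grep and a pipe for every file, the second starts only grep.

```shell
#!/bin/sh
# count_with_cat (hypothetical name): two processes and a pipe per file.
count_with_cat() {
    for f in "$@"; do
        cat "$f" | grep -c "mystring"
    done
}

# count_without_cat (hypothetical name): grep opens the file itself,
# so there is one process per file and no pipe.
count_without_cat() {
    for f in "$@"; do
        grep -c "mystring" "$f"
    done
}

# Build a small sample file so the sketch can be run as it stands.
sample=$(mktemp)
printf 'mystring here\nnothing\nmystring again\n' > "$sample"
count_with_cat "$sample" "$sample"
count_without_cat "$sample" "$sample"
rm -f "$sample"
```

The output of the two functions is identical; only the amount of work the OS does to produce it differs, and that difference grows with the number of times the loop runs.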

As a result of this, you may hear mention of the Useless Use of Cat Award (UUoC), also known in some circles as The Award For The Most Gratuitous Use Of The Word Cat In A Serious Shell Script, being bandied about on the comp.unix.shell newsgroup from time to time. This is purely a way of peers keeping each other in check, and making sure that things are done right.

Which leads me nicely on to something else: Don't ever feel too close to your own shell scripts; by their nature, the source cannot be closed. If you supply a customer with a shell script, s/he can inspect it quite easily. So you might as well accept that it will be inspected by anyone you pass it to; use this to your advantage with the GPL - encourage people to give you feedback and bugfixes for free!