
16 April 2016

rm -rf /

Okay, this story has been doing the rounds over the past few days, and I think it's time to talk about it:

sudo rm -rf /* - source unknown

Man accidentally 'deletes his entire company' with one line of bad code.

The gist of the story is that he had a script which did something along the lines of:

rm -rf ${foo}/${bar}

Unfortunately, both ${foo} and ${bar} evaluated to nothing, so what he ended up running was just plain "rm -rf /". This script was automated, and ran on all of his servers. Oh, and his backups were NFS-mounted, read-write (of course), on those same machines. So all of the backups got deleted, too.
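As an aside, POSIX shells offer a one-character guard against exactly this failure mode: the ${var:?} expansion makes the shell abort with an error, instead of quietly expanding to nothing, when the variable is unset or empty. A small sketch (the variable names just mirror the story; this is not the script in question):

```shell
#!/bin/sh
# With both variables set, ${foo:?}/${bar:?} expands normally.
foo="/srv/app"
bar="cache"
echo "would remove: ${foo:?}/${bar:?}"

# With bar unset, the expansion itself fails, so rm never even runs.
# The subshell contains the failure; a non-interactive shell would
# otherwise exit on the spot.
unset bar
( rm -rf "${foo:?}/${bar:?}" ) 2>/dev/null || echo "refused: bar is unset"
```

Had the script in the story used "rm -rf ${foo:?}/${bar:?}", the disaster would have been a harmless error message instead.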

Now, this story has turned out to be a hoax, and has since been deleted, but it still raises some interesting points. One is that I did something remarkably similar myself, a few years ago.
I do take some solace in this quote from the famous management consultant, Peter F. Drucker:

People who don't take risks generally make about two big mistakes a year. People who do take risks generally make about two big mistakes a year. -- Peter F Drucker

This was in an environment where there was no budget for any kind of Linux / Unix infrastructure (no Puppet master, no systems monitoring, etc.). What we did have was backups (phew), a rapidly-growing server estate of around 100 servers at that point, and just me to manage them all. I had a script which would gather various vital statistics on all the servers: CPU, memory, free disk space, and so on. It saved all the info into a temporary directory (created by mktemp), tarred and gzipped up the results, and copied them back to the server I ran the script from. There wasn't an admin server, as I mentioned, so even that script was run from one of the less busy of the active servers.

This script was nothing special, but it ran fine on the various Solaris (versions 8, 9, 10), Red Hat Enterprise Linux (versions 4, 5), and HP-UX servers that we had. I was actually quite pleased with how OS-agnostic it was, and how adept at dealing with the differences. The example below does not do it justice, honestly!

Then we got a bunch of AIX servers. It turns out that AIX does not have the "mktemp" command. mktemp is a great command: it creates a uniquely-named temporary file (or directory, with the "-d" switch), and echoes out the name of the newly-created file (or directory), something along the lines of "/tmp/tmp.SWYw9ZejQk". The idea is that you can run one command to create a unique temporary directory:

TEMPDIR=$(mktemp -d)

Once that one command has run, the directory has been created, and ${TEMPDIR} contains its name. So the script could save the results there, and of course, safely tidy up afterwards with rm -rf "${TEMPDIR}", a bit like this:


function gather_data()
{
  # Gather data
  df -lP && df -lP > df-lP.out
  df -lh && df -lh > df-lh.out
  [ -f /proc/swaps ] && cat /proc/swaps > proc_swaps.out
  [ -f /proc/cpuinfo ] && cat /proc/cpuinfo > cpuinfo.out
  which free && free -m > free-m.out
  which prtdiag && prtdiag > prtdiag.out
  [ -f /etc/redhat-release ] && cp /etc/redhat-release .
  [ -f /etc/issue ] && cp /etc/issue .
  which swap && swap -l > swap-l.out
  # etc etc
}

TEMPDIR=$(mktemp -d)
cd "${TEMPDIR}"
gather_data
tar cf /tmp/logs.tar "${TEMPDIR}"
gzip -9 /tmp/logs.tar
cd /
rm -rf "${TEMPDIR}"/

Now, I don't remember the exact details, but the actual result was only that the application directory got wiped; I think maybe I'd created the temp directory in there. But still, there were people actively working on configuring those applications, and they were not best pleased that I had deleted their work. I was made to feel like the worst sysadmin in the world, of course. And I felt it. I'm a contractor, and as such I move around quite a bit for work; although I left that project about a year later, they have asked me back a few times since, so I think that all is now probably just about forgiven.
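In hindsight, the missing safety net was a one-line check that mktemp had actually succeeded before anything else ran. A sketch of the guard (not code the script had at the time):

```shell
#!/bin/sh
# If mktemp is absent (the command fails with "not found") or fails
# for any other reason, bail out before any later command can see an
# empty ${TEMPDIR}.
TEMPDIR=$(mktemp -d) || { echo "mktemp failed; aborting" >&2; exit 1; }
[ -n "${TEMPDIR}" ] && [ -d "${TEMPDIR}" ] || exit 1
echo "working in ${TEMPDIR}"
rmdir "${TEMPDIR}"
```

With that in place, the AIX servers would simply have reported "mktemp failed; aborting" rather than wiping anything.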

I have taken a few things away from this nasty experience, including:

  • If I, having written a book on shell scripting, can make such a silly mistake, then I'm sure that other people can do it, too. I'm not claiming that it's not a silly mistake, nor denying that it should never have happened. However, anybody who claims never to have made a mistake is either a liar, or has never, ever, tried to do anything new.
  • Backups are essential; they saved us from the worst of this disaster, although:
  • Even when you have nightly backups, you can still stand to lose the day's work that was done since the most recent backup was taken.
  • Expecting flawless results without paying for suitable infrastructure is an unrealistic management approach. If we had had proper tools for monitoring the state of the servers, this quickly thrown-together script would not have been needed. Similarly, if there had been time to test it on the AIX servers, this bug would have been found before it did any damage at all.

From a technical perspective, firstly, it is of course vital to know the technology of the systems that you are working on (like whether AIX includes the mktemp command), but secondly, it is essential to test everything. What particularly irritates me about this mistake is that the main part of the script was incredibly pedantic about checking that each command would work on a given OS, given that its job was to gather data about many different types of *nix system. On a Linux box, it would gather /proc/cpuinfo to count the CPUs; on a Solaris box it would run prtdiag, and so on. It was careful to run on any Bourne-like shell (the "function fname() {...}" syntax can be problematic). It didn't assume that "tar xzf" would work. But the trivial admin stuff which surrounded the main task, the silly little bit of making a temporary directory, didn't get any real consideration at all. There was no checking that the mktemp command had succeeded; I have never come across a situation where it would fail. Unless, of course, the binary does not exist on the server. Yeah, that would fail. I know that now.
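For what it's worth, a portable fallback for systems without mktemp is only a few lines. This is a sketch of what the script could have done; the "gather.$$" naming is just illustrative:

```shell
#!/bin/sh
# Prefer mktemp, but fall back to a PID-based directory when the
# binary is missing. mkdir fails if the name already exists, which
# gives a crude uniqueness check on the fallback path.
if TEMPDIR=$(mktemp -d 2>/dev/null) && [ -n "${TEMPDIR}" ]; then
    :   # mktemp exists and worked
else
    TEMPDIR="${TMPDIR:-/tmp}/gather.$$"
    mkdir "${TEMPDIR}" || { echo "cannot create ${TEMPDIR}" >&2; exit 1; }
fi
echo "${TEMPDIR}"
rmdir "${TEMPDIR}"
```

Note that $$ is predictable, so this fallback is weaker than mktemp against an attacker who can pre-create names in /tmp; it is a portability shim, not a security fix.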

PS: I have written about this before; it is entirely avoidable. GNU's rm (i.e., the one included in virtually every Linux distro) now has the "--preserve-root" option set by default. Older distros will not have this protection, so do be careful. Any such mechanism is never fool-proof. And if you don't think that you are a fool, well... nor did I.
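And whatever your rm supports, the cheapest insurance is to refuse to delete anything when the variable might be empty. A sketch, simulating the failed-mktemp case:

```shell
#!/bin/sh
# Simulate the AIX failure: TEMPDIR ended up empty.
TEMPDIR=""
case "${TEMPDIR}" in
    ""|/) echo "refusing to rm '${TEMPDIR}'" ;;
    *)    [ -d "${TEMPDIR}" ] && rm -rf "${TEMPDIR}" ;;
esac
```

Three lines of paranoia, and "rm -rf /" can never be assembled, with or without --preserve-root.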


