22 July 2023 [ See Also: dos2unix ]
I encountered some strange behaviour recently with one of my scripts. It wasn't a particularly complicated script, so I wasn't sure at first where it could be going wrong. The cause turned out to be that a team member had added a new line to the text file that it reads from, and there was no "newline" character at the end of that file. So it was a bug in my script, but it only showed up when the script encountered unexpected (un-tested-for) input. If I were the only person maintaining this utility, the bug would never have manifested itself, because the tools I personally use automatically ensure a trailing newline, but the bug was always there. How to guard against such input is considered later, but the discussion is more interesting than the conclusion.
This is a category of bug which I am somewhat surprised that I haven't really considered before. As a Unixy person, it's one of those things where you come across it, fix the clearly bad input, and move on, but it is something that we should really be more consciously aware of. The final "\n" (newline) character in a text file can have a big impact on how tools process it.
If you are the kind of person to be reading this website, you are probably already aware that in Linux, a text file contains various ASCII characters, and that some ASCII characters are "control characters" - they're not printable characters (like "A", "B", "C", "*", "@" or "?"); instead they tell anything processing the file to do something special. So at the end of each line of text is a "newline" character (ASCII character 10, or 0x0A in hex, also known as "LF" or "Line Feed", or "\n" to some tools).
[For bonus points, we can note that DOS (and therefore Windows) has a similar convention, but actually uses two characters: a "carriage return" (CR) followed by a "line feed" (LF) - often referred to as CRLF for short. The "carriage return" says to bring the cursor back to the far left of the page. These codes, and their names, all come from the days of teletypes (think of manual typewriters), which you won't be likely to see around in production too often these days(!) but the conventions, and the names, remain. For the record, CR is ASCII 13 (0x0D in hex), or "\r".]
Here we can "cat" the file, and we see its contents: "hello" on one line, and "world" on the second line.
Then I used the (somewhat obscure, but useful) "od -c" command to dump the individual characters; this shows a "\n" character at the end of each line. This is the representation of the "newline" ASCII code.
Unix and Linux have a great many tools which work on processing text files; one de facto standard is that the "newline" character goes at the end of every line of text. So if there is a line which does not end with a "\n" then it is not a line of text!
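The "cat" and "od -c" commands described above can be tried with a small sketch like the following. The scratch directory and the file names "hello.txt" and "hello-dos.txt" are assumptions made for illustration; "printf" is used so that the bytes written to each file are explicit.

```shell
#!/bin/bash
# Scratch directory and file names are hypothetical, chosen for this sketch.
workdir=$(mktemp -d)

# Two lines, each terminated by a single LF (the Unix convention).
printf 'hello\nworld\n' > "$workdir/hello.txt"

# The same two lines with CR LF terminators (the DOS/Windows convention).
printf 'hello\r\nworld\r\n' > "$workdir/hello-dos.txt"

cat "$workdir/hello.txt"        # prints the text itself
od -c "$workdir/hello.txt"      # dumps each character; \n follows each line
od -c "$workdir/hello-dos.txt"  # here \r \n follows each line
```

The first file is 12 bytes long: ten letters plus two newline characters, which "od -c" makes visible where "cat" does not.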
The script I had written worked something like this (massively oversimplified) example:
#!/bin/bash
while read foodstuff colour
do
  echo "I ate a $colour $foodstuff"
done < food.txt
This example simply reads a type of food, and a colour, from a text file named "food.txt" and claims to have eaten one of each.
However, if there was no "newline" character after "purple", it would have missed that final line completely.
The fact that there was no "\n" at the end caused the script to totally fail to process the "damson purple" line at all!
If you look closely, you will notice that "cat food.txt" also looked different: my "steve@linux" prompt crashed into the "damson purple" line, instead of being on its own line. This is another symptom of the missing newline character - nothing told "cat" that a line break was needed after the word "purple", so it didn't add one.
A well-known convention in data processing (often called the robustness principle) is to be conservative in what you emit, and liberal in what you receive. That is to say, you should always send out standards-compliant data, but also be flexible enough to cope with non-compliant input. A simple fix I found to allow for this particular situation was this script snippet, which compares "grep -c "^" ${target}" (a count of how many lines have been started) with "cat ${target} | wc -l" (the count from the "wc -l" tool, written specifically to count the number of lines in the "${target}" text file, which counts newline characters and so ignores an unterminated final line). If there is a discrepancy, it simply writes a newline to the end of the file. The "echo >> "${target}"" command echoes nothing to the end of the file, but because the standard is to include the end-of-line marker, the "echo" command still appends an end-of-line marker to the file:
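# Compare lines started (grep) with newline characters (wc -l).
# ${target} is quoted so that file names containing spaces still work.
if [ $(grep -c "^" "${target}") != $(cat "${target}" | wc -l) ]; then
  echo "Fixing invalid input: Missing final newline in ${target} - appending a newline!"
  echo >> "${target}"
fi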
This processing was added before the "${target}" file was processed. In the simple example above, the "${target}" file was simply "food.txt" - the actual script I found the bug in was managing multiple input files, but you can assume that everything done to "${target}" was done to "food.txt" and also to any other input files which may have been similar to it.
The convention is the convention - it has been like this for many decades (remember that comment about teletype machines earlier?!), so it is accepted as the correct behaviour. To change it now would break things that are working in production.
However, you may notice that some tools detect this problematic situation, and will point it out to you. As with everything UNIX/Linux, whether you keep that trailing newline (or not) is entirely up to you, but some of the nicer 21st Century tools will at least point out when it is missing:
The Vim text editor shows "[noeol]" at the bottom of the window.
The diff tool, for showing differences between files, reports "\ No newline at end of file" after the affected line.
The Git source control tool also warns you. Note also that to create the problem I use "echo -n damson purple >> food.txt" - this appends "damson purple" to the "food.txt" file, but the "-n" switch tells "echo" NOT to add a "newline" character after the text. By default, it will add that "newline" character, so you have to tell it specifically if you do not want it to protect you from this kind of problem!
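The difference that "echo -n" makes, and the way diff reports it, can be checked with a sketch like this one. The scratch directory, the file names "good.txt" and "bad.txt", and the first two rows of data are assumptions for illustration.

```shell
#!/bin/bash
export LC_ALL=C   # keep diff's message in English for this sketch

# Hypothetical scratch files with two made-up rows each.
dir=$(mktemp -d)
printf 'apple red\nbanana yellow\n' > "$dir/good.txt"
cp "$dir/good.txt" "$dir/bad.txt"

echo    damson purple >> "$dir/good.txt"   # echo appends "\n" by default
echo -n damson purple >> "$dir/bad.txt"    # -n suppresses the final "\n"

# Count the bytes each form of echo produces.
with_nl=$(echo damson purple | wc -c)
without_nl=$(echo -n damson purple | wc -c)
echo "echo: $with_nl bytes, echo -n: $without_nl bytes"

# diff exits non-zero when the files differ, hence the "|| true".
diff_output=$(diff "$dir/good.txt" "$dir/bad.txt" || true)
echo "$diff_output"
```

The two files differ by a single byte, and diff flags the unterminated line with "\ No newline at end of file".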
You can even spot it easily with the "cat" command: the shell prompt ends up on the same line as the last line of the file.
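Watching where the prompt lands works for a person at a terminal, but a script needs something it can test. One possible check, which is not taken from the script discussed above, looks at the last byte of the file. The function name "ends_with_newline" and the temporary files are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the last byte of the file is "\n".
# Command substitution strips a trailing newline, so a correctly
# terminated file yields an empty string. An empty file also passes.
ends_with_newline() {
  [ -z "$(tail -c 1 "$1")" ]
}

# Made-up test files: one terminated correctly, one not.
good=$(mktemp); printf 'damson purple\n' > "$good"
bad=$(mktemp);  printf 'damson purple'   > "$bad"

for f in "$good" "$bad"; do
  if ends_with_newline "$f"; then
    echo "$f: ends with a newline"
  else
    echo "$f: missing final newline"
  fi
done
```

This is a sketch for ordinary text files; it reads only one byte, so it stays cheap even for large inputs.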
This post was somewhat inspired by Abhinav Upadhyay's Tweet: https://twitter.com/abhi9u/status/1682700157102743552