5 Nov 2018

dos2unix / unix2dos

Converting between CRLF (DOS) and LF (*nix) text file formats. With a footnote about '/bin/bash^M: bad interpreter' for good measure.

Text files are supposed to be pretty simple things. They are very often just streams of plain 8-bit ASCII bytes. (Unicode does things differently, but that's beyond the scope of this short article.) For example, consider a text file which looks like this:

linux$ cat a.txt
Hello World.
This is a file.
linux$

What you can't immediately see is that there is a special byte, a 'control character', which says "now there is a linebreak." This is represented, in Unix and Linux, with the ASCII "10" (00001010) character, often displayed as "\n" or "LF" for "Line Feed," or "^J" (because J is the 10th letter of the alphabet).

You can see this with the (slightly obscure) od command, which shows all the contents of a file, whether regular printable characters or non-printable control characters.

linux$ od -c a.txt
0000000   H   e   l   l   o       W   o   r   l   d   .  \n   T   h   i
0000020   s       i   s       a       f   i   l   e   .  \n
0000035
linux$

Think back to mechanical typewriters; when you hit the big bar on the side, the carriage moves down to start the next line of text. That is what LF does on Unix and Linux: one single character which moves one line down and back to the start of the line.

However, some systems interpret the "LF" character as literally being a vertical move down; it does not reset the cursor to the left-hand side of the page (or screen). They need a "CR" or "Carriage Return" character, as well as the "LF". CR is ASCII 13 (00001101), also known as "\r," or "^M" (because M is the 13th letter of the alphabet).
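
If you want to see both conventions side by side, here is a minimal sketch which uses printf to write the same two lines each way. The file names unix.txt and dos.txt, and the scratch directory, are just for illustration; it assumes that mktemp is available, as it is on most Linux systems.

```shell
# Work in a scratch directory so that nothing important gets overwritten.
demo=$(mktemp -d)

# printf expands \n and \r itself, so this works in any POSIX shell.
printf 'Hello World.\nThis is a file.\n'     > "$demo/unix.txt"
printf 'Hello World.\r\nThis is a file.\r\n' > "$demo/dos.txt"

# The DOS version is one byte (the CR) longer for every line in the file.
wc -c < "$demo/unix.txt"    # 29 bytes
wc -c < "$demo/dos.txt"     # 31 bytes

# od can also show the decimal values of the CR and LF bytes: 13 and 10.
printf '\r\n' | od -An -tu1
```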

With a file formatted in this style, the od utility shows \r\n for each linebreak:

linux$ od -c a.txt
0000000   H   e   l   l   o       W   o   r   l   d   .  \r  \n   T   h
0000020   i   s       i   s       a       f   i   l   e   .  \r  \n
0000037
linux$ 

The Microsoft family of operating systems is the best-known example of this convention. And whilst Windows is very popular on the desktop, Linux is very popular in the cloud.

dos2unix / unix2dos

So a common problem is finding a file in one format, when you need it to be in the other format. Many utilities can cope with both formats (Notepad++, Vim and cat spring to mind) but many others cannot. Also, when a file has changed its line endings, Git will see it as a change to every single line in the file, even if none of the content has changed.

If you have the dos2unix utility installed, you can easily convert between the two formats. Simply use "dos2unix a.txt" or "unix2dos a.txt" to convert a file. Notice that the 'file' utility reports "with CRLF line terminators" to indicate the "CR" and "LF" ending of a DOS-format text file. Otherwise, it just calls it "ASCII text".

# Start with a plain ASCII file:
linux$ file a.txt 
a.txt: ASCII text
# Convert it to DOS format:
linux$ unix2dos a.txt 
unix2dos: converting file a.txt to DOS format...
# Check its new format:
linux$ file a.txt 
a.txt: ASCII text, with CRLF line terminators
# Convert it back to Unix/Linux format:
linux$ dos2unix a.txt 
dos2unix: converting file a.txt to Unix format...
# Check it again:
linux$ file a.txt 
a.txt: ASCII text
linux$ 
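
If you want to make the same check from a script, or the 'file' utility is not to hand, grep can do the job. This is only a rough sketch: has_crlf is a made-up helper name, not a standard command, and it simply looks for a CR at the end of any line.

```shell
# has_crlf FILE (hypothetical helper): succeed if any line of FILE ends in CR.
has_crlf() {
    cr=$(printf '\r')          # a literal CR byte; $( ) only strips trailing newlines
    grep -q "${cr}\$" "$1"     # grep drops the LF, so CR is the last byte of a DOS line
}

# Report the DOS-format files in the current directory.
for f in *.txt; do
    [ -f "$f" ] || continue    # skip the unexpanded pattern if nothing matches
    if has_crlf "$f"; then
        echo "$f: CRLF line terminators"
    fi
done
```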

Other Ways To Convert Files

You may not always have dos2unix and unix2dos available. It may be possible to "sudo apt-get install dos2unix" or "sudo yum install dos2unix", but if you don't have repositories configured, or you have no internet access, that may not work. Don't fear! There are alternatives. Some of them may be better than others - particularly with multi-byte Unicode encodings such as UTF-16, where the byte values 10 and 13 can appear as part of a character other than "LF" or "CR" - but these alternative solutions should get you out of trouble:

1) perl

The Perl utility has a few ways to convert files. Firstly, using regular expressions.

You can convert a Unix-style file to DOS format as follows (if your file is named a.txt):

# Convert *nix to DOS:
linux$ perl -pi -e 's/$/\r/' a.txt

You can convert it back again like this:

# Convert DOS to *nix:
linux$ perl -pi -e 's/\r\n$/\n/g' a.txt

You can also use the ExtUtils::Command module (loaded here with Perl's -M switch), if you have it installed. It provides a dos2unix function but no unix2dos counterpart, so use the regular expression above to go the other way:

# Convert DOS to *nix:
linux$ perl -MExtUtils::Command -e dos2unix a.txt

2) sed

The Stream Editor, sed, can also be used, in a very similar way. Note that the -i syntax shown here, and the "\r" escape, work with GNU sed, as is available on most Linux systems. Other versions of sed may not support them (BSD and macOS sed, for example, expect a backup suffix after -i), in which case you will need to write the result to a temporary file.

# Convert *nix to DOS:
linux$ sed -i 's/$/\r/' a.txt
# Convert DOS to *nix:
linux$ sed -i 's/\r$//' a.txt
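
For a sed without GNU's -i, the temporary-file approach might look something like this. It is a sketch under a few assumptions: dos2unix_sed is a made-up function name, and mktemp is assumed to be available. The CR is produced by printf, so the sed expression itself contains no "\r" escape.

```shell
# dos2unix_sed FILE (hypothetical helper): strip the trailing CR from every line.
dos2unix_sed() {
    cr=$(printf '\r')                  # a literal CR byte for the sed pattern
    tmp=$(mktemp) || return 1
    # Only touch the original if sed succeeded; writing with cat (rather than
    # mv) keeps the original file's permissions and ownership.
    sed "s/${cr}\$//" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}
```

You would then call it as "dos2unix_sed a.txt".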

3) tr

At a push, you could also use the tr utility. It can't edit files in-place, and it can only convert in one direction, from DOS to Unix. This will read the DOS-formatted a.txt and create a Unix-formatted b.txt:

# Convert DOS to *nix:
linux$ tr -d '\r' < a.txt > b.txt
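
Be aware that tr deletes every CR in the file, not just the ones at the end of a line. That is usually what you want, but the following sketch (with made-up file contents, in a scratch directory from mktemp) shows the difference:

```shell
demo=$(mktemp -d)

# One line with a CR in the middle, as well as the usual CRLF at the end.
printf 'left\rright\r\n' > "$demo/a.txt"

# tr removes both of them.
tr -d '\r' < "$demo/a.txt" > "$demo/b.txt"

# Count the CR bytes by deleting everything which is NOT a CR.
tr -cd '\r' < "$demo/a.txt" | wc -c    # 2
tr -cd '\r' < "$demo/b.txt" | wc -c    # 0
```

The sed version above only removes a CR at the very end of a line, so it would leave the one in the middle alone.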

Summary

It is a bit of a pain that there are two formats for a plain text file. In reality, however, we now need to deal with more complex files, like Unicode multi-byte characters, so the "old days" of plain 8-bit (or even 7-bit) ASCII text are not something we can design for. However, there will be plain ASCII files (of both formats) for many decades more, so we need to understand what they are, and how to convert from one to the other.

If you have dos2unix, that is by far the easiest way to convert them. But the Perl, sed and tr examples above show a few other ways that are available.

It is always good to keep text files consistent, whether in DOS or Unix format. Having a mix of both can cause all manner of problems.

Footnote: /bin/bash^M: bad interpreter: No such file or directory

If you have a shell script like this, it can be difficult to work out why it isn't working:

linux$ cat a.sh
#!/bin/bash
echo "Hello, World!"
linux$ ./a.sh
bash: ./a.sh: /bin/bash^M: bad interpreter: No such file or directory
linux$ 

If your script is in DOS format, od will show it like this:

linux$ od -c a.sh
0000000   #   !   /   b   i   n   /   b   a   s   h  \r  \n   e   c   h
0000020   o       "   H   e   l   l   o   ,       W   o   r   l   d   !
0000040   "  \r  \n
0000043
linux$ 

Notice that the first line is "#!/bin/bash\r\n". The final \n is treated as the linebreak, so the kernel tries to execute your chosen interpreter, which you have said is named "/bin/bash\r". As mentioned above, some tools display "\r" as "^M"; there are multiple ways of displaying these non-printable characters.

So the kernel is complaining that it cannot find an interpreter called "/bin/bash^M". There is no file with that name. There is probably a /bin/bash, but that's not what it's been asked to find. It's been asked to find "/bin/bash^M". And yes, computers really are that dumb. It's a good thing that they're so precise, but a pain that they're so pedantic. You can't have one without the other.

This has confused many people many times over many years, but the fix is simple: Convert the file from DOS format to Unix format:

linux$ ./a.sh
bash: ./a.sh: /bin/bash^M: bad interpreter: No such file or directory
linux$ dos2unix a.sh
dos2unix: converting file a.sh to Unix format...
linux$ ./a.sh
Hello, World!
linux$ 
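
If you have a whole directory of scripts which may have come from a Windows machine, you can check them before they bite you. Here is a rough sketch; shebang_has_cr is a made-up helper name, and it only looks at the first line of each file.

```shell
# shebang_has_cr FILE (hypothetical helper): succeed if the first line of FILE
# starts with #! and ends in a CR.
shebang_has_cr() {
    cr=$(printf '\r')
    head -n 1 "$1" | grep -q "^#!.*${cr}\$"
}

# Report any affected scripts in the current directory.
for f in *.sh; do
    [ -f "$f" ] || continue    # skip the unexpanded pattern if nothing matches
    if shebang_has_cr "$f"; then
        echo "$f: CR at the end of the #! line"
    fi
done
```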

You're welcome! Feel free to buy me a cup of coffee, if this saved your skin!