Buy this tutorial as a PDF for only $5
6 May 2020
The relatively obscure 'join' command
There are some older *nix commands which have tricky syntax, and can be intimidating to learn
Unix, and therefore the Shell (whether under Unix, Linux, or any other operating system), is very much text-file driven. The Unix philsophy of "everything is a file", combined with the flexibility and easily-edited nature of text files means that a lot of information in a Unix / Linux system is stored in plain-text files. This makes for easy management of data, without bespoke tools, and it also means that the Unix and Linux userspace (the command-line tools available to the shell) includes a lot of tools for manipulating text files.
One such command is called join
, and as its syntax is a bit opaque, not many people use it very
much. However, it is quite powerful and flexible, as we shall see.
The join
command will combine two text files, based on a common key. So - a bit like with a SQL
JOIN
command, if you have two files with a common key somewhere in them, you can use the
join
command to merge these two files together. And you can actually manage the results quite flexibly,
as we shall see.
Sample input files
Let's say we have two text files, one called countries.txt which provides a list of two-letter country codes (DE for Germany, FR for France, etc), and another called distros.txt which lists GNU/Linux distributions, their package management type (RPM/DEB/Zypper) and their countries of origin. For example:
countries.txt:
DE:Germany FR:France PT:Portugal UK:United Kingdom US:United States of America
distros.txt:
Knoppix:DEB:DE OpenSuSE:Zypper:DE APODIO:DEB:FR Debian:DEB:NONE Ubuntu:DEB:UK Kali:DEB:US Red Hat:RPM:US
First Run
We can use the join
command to combine these files, because they both have the ISO-3166 country code. Here, the "-t:
" means that the delimiter in these files is the colon (:
) character (these older *nix commands are missing a bit of consistency - it would be nice if they all used cut
's "-d
" for "delimiter", but join
and sort
use "-t
" instead)
The syntax continues to be rather odd, in that we now need to tell join
that the first file (countries.txt) is keyed on the first field, because "DE", "FR", etc are the first field in that file. To do this, we say "-1 1"
.
Then we say "-2 3
" to tell join
that the second file (distros.txt) is keyed on the third field, because those two-letter country codes are the third field in distros.txt.
Finally, we tell join
the names of file1 and file2: countries.txt and distros.txt, respectively.
$ join -t: -1 1 -2 3 countries.txt distros.txt DE:Germany:Knoppix:DEB DE:Germany:OpenSuSE:Zypper FR:France:APODIO:DEB UK:United Kingdom:Ubuntu:DEB US:United States of America:Kali:DEB US:United States of America:Red Hat:RPM
We now have a combination of the two files, merged on their common key of the two-letter country code.
Caveat
An important caveat to be aware of is that both files need to be sorted alphabetically before starting, and they need to be sorted based on their key. So a simple "sort distros.txt
" is not sufficient. The sort
command uses the same "-t:
" syntax as join
to say that the delimiter is the colon (:
) character, but it uses the "-k 3
" argument to sort the file on the third field with that delimiter.
So to make sure that the countries.txt file is correctly sorted, we need to run: "sort -o countries.txt countries.txt
" - again, sort
is ancient, it doesn't have the simplicity of "sed -i
", so we have to tell it "sort countries.txt and write the output to countries.txt". This invocation will replace the unsorted countries.txt with a new, sorted, version of the file.
To sort the distros.txt file requires a bit more - the key is in field three. So we run "sort -t: -k3 -o distros.txt distros.txt
" to sort the file on field 3, with a delimiter of ":
", writing the output to distros.txt, and reading the input from the same distros.txt file. Again, using the distros.txt name twice is necessary, and replaces the unsorted file with a sorted version.
Reformatting
We can do a bit more than this with the join
command - we can format the output as we wish. We have
told join
what the fields are, so we can tell it which fields, from which files we want, and in what order.
To do this, we use the "-o
" option, which again has a non-standard syntax. We use the format "-o FILE.FIELD
" syntax, where "-o 1.x
" means field "x" in the first filename listed, and "-o 2.y
" means field "y" in the second filename listed on the command line.
So we can get field #2 (Country Name) from countries.txt, merged with field #1 (distro name) from distro.txt:
$ join -t: -1 1 -2 3 -o 1.2,2.1 countries.txt distros.txt Germany:Knoppix Germany:OpenSuSE France:APODIO United Kingdom:Ubuntu United States of America:Kali United States of America:Red Hat
Or we might want to append the package manager (which is file 2, field 2, so append ",2.2
" to the "-o
" option:
$ join -t: -1 1 -2 3 -o 1.2,2.1,2.2 countries.txt distros.txt Germany:Knoppix:DEB Germany:OpenSuSE:Zypper France:APODIO:DEB United Kingdom:Ubuntu:DEB United States of America:Kali:DEB United States of America:Red Hat:RPM
We can reverse this by changing the "-o
" output options, to mix-and-match between the input files. Here, "-o 2.2,1.2,2.1"
" selects field 2 from file 2, then field 2 from file 1, followed by field 2 from file 1:
$ join -t: -1 1 -2 3 -o 2.2,1.2,2.1 countries.txt distros.txt DEB:Germany:Knoppix Zypper:Germany:OpenSuSE DEB:France:APODIO DEB:United Kingdom:Ubuntu DEB:United States of America:Kali RPM:United States of America:Red Hat
Missing Lines
The eagle-eyed amongst you will have noticed that there is one country listed which does not have matching distributions, and one distribution which does not have any particular country associated with it.
We can find these with the "-v
" option. We can ask for those in file 1, or those in file 2, which have no matches in the other file:
$ $ join -v1 -t: -1 1 -2 3 countries.txt distros.txt PT:Portugal $ join -v2 -t: -1 1 -2 3 countries.txt distros.txt NONE:Debian:DEB
Portugal does of course have Linux Caixa Mágica and PaiPix to its name, it's just that these were not listed in these example files.
Conclusion
Hopefully this will have removed some of the confusion around the join
command, and you will find
renewed confidence in using this old, somewhat archaic, but very useful command to manage your field-delimited text
files.
Appreciate this site? Please consider making a donation:
Books and eBooks
My Shell Scripting books, available in Paperback and eBook formats.
![]() Shell Scripting Tutorial is this tutorial, in 88-page Paperback and eBook formats. Convenient to read on the go, and to keep by your desk as an ever-present companion. | ![]() Shell Scripting: Expert Recipes for Linux, Bash and more is my 564-page book on Shell Scripting. The first half explains the features of the shell; the second half has real-world shell scripts, organised by topic, with detailed discussion of each script. |
Contact
You can mail me with this form. If you expect a reply, please ensure that the address you specify is valid. Don't forget to include the simple addition question at the end of the form, to prove that you are a real person!