The Forgotten Power of Unix Text Utilities


I'm the first to extol the virtues of scripting languages, Python and Perl in particular. But they aren't always the best tool for the job. It's often forgotten how powerful the original Unix (and now GNU) text processing utilities are. Someone once asked me how to combine specific columns from multiple CSV files into a new CSV file. They had the start of a Perl solution that was not working correctly and wanted advice on it. My advice was to go with a one-line shell solution instead:

paste -d, <(cut -d, -f3 file1.csv) <(cut -d, -f3 file2.csv) > output.csv

This will combine the third column from each specified file into a new file. It relies on process substitution, a feature of bash, ksh, and zsh (but not of plain POSIX sh): each <(...) runs the enclosed command and hands paste a file name from which to read that command's output. Here it is in action:

dmaxwell@kaylee:~$ cat foo1.txt
a1,a2,a3
b1,b2,b3
dmaxwell@kaylee:~$ cat foo2.txt
A1,A2,A3
B1,B2,B3
dmaxwell@kaylee:~$ paste -d, <(cut -d, -f3 foo1.txt) <(cut -d, -f3 foo2.txt)
a3,A3
b3,B3
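
Process substitution is not available in every shell; plain POSIX sh implementations such as dash lack it. As a rough sketch of a fallback, the same result can be had with temporary files. The sample data matches the session above; the scratch directory and the col1/col2 file names are made up for illustration.

```shell
# Recreate the two sample files from the session above.
printf 'a1,a2,a3\nb1,b2,b3\n' > foo1.txt
printf 'A1,A2,A3\nB1,B2,B3\n' > foo2.txt

# Hypothetical scratch directory. mktemp -d is widely available,
# though it is not specified by POSIX.
tmpdir=$(mktemp -d)

# Extract the third column of each file into its own temporary file.
cut -d, -f3 foo1.txt > "$tmpdir/col1"
cut -d, -f3 foo2.txt > "$tmpdir/col2"

# Join the two columns line by line, separated by a comma.
paste -d, "$tmpdir/col1" "$tmpdir/col2" > output.csv

# Clean up the scratch files and show the result.
rm -r "$tmpdir"
cat output.csv
```

The process-substitution version does the same job without the bookkeeping, which is why it is the nicer choice wherever the shell supports it.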

You can paste columns from as many files as you need. One catch, of course, is that this only works with simple CSV data, meaning there are no commas embedded in the data fields themselves. For data like that, though, it is much easier to understand than a lengthy scripting-language solution.
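
To make the catch concrete, here is a small illustration using an invented file, people.csv, in which one quoted name field contains a comma. cut knows nothing about CSV quoting, so it splits on every comma it sees.

```shell
# Made-up sample: the first field of the first row contains a comma.
printf '"Smith, John",42\n"Jones",37\n' > people.csv

# Intended: the second column (the ages).
# Actual: on the first row, cut splits inside the quoted name.
cut -d, -f2 people.csv > ages.txt
cat ages.txt
```

The second row comes back as 37, as intended, but the first row yields a fragment of the name instead of 42. Data like this calls for a real CSV parser, which is where a scripting language earns its keep.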

One other tip: if you need to get rid of the first row, which might contain column headers, just pipe the output through tail:

paste -d, <(cut -d, -f8 file1.csv) <(cut -d, -f8 file2.csv) | tail -n +2 > output.csv
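
As a quick check of the header-stripping variant, here is a sketch on two invented files, sales1.csv and sales2.csv, each with a header row. It pulls the second column rather than the eighth to keep the sample small. Like the one-liner above, it needs a shell with process substitution, such as bash.

```shell
# Invented sample files, each starting with a header row.
printf 'region,total\nnorth,10\nsouth,20\n' > sales1.csv
printf 'region,total\nnorth,30\nsouth,40\n' > sales2.csv

# paste joins the two "total" columns; tail -n +2 starts its output
# at line 2, which drops the joined header row "total,total".
paste -d, <(cut -d, -f2 sales1.csv) <(cut -d, -f2 sales2.csv) | tail -n +2 > output.csv
cat output.csv
```

Because tail runs after paste, only the single combined header line is removed. This assumes every input file has exactly one header row.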