Accelerated command processing with GNU Parallel
Building Blocks
In the previous commands Mogrify replaces the original image with a resized one. Better yet, you can save the edited version in a separate preview/ directory:
$ mogrify -write preview/photo.tif -resize 50% photo.tif
However, here you have the problem that the file name appears twice in the command. You need to tell GNU Parallel not only to append the file name to the mogrify command, but also to put it in a specified location. To do this, you can use the {} placeholder, as used on line 1 in Listing 2.
Listing 2
Using One Parameter in Several Places
01 $ parallel mogrify -write preview/{} -resize 50% {} ::: *.tif
02 $ parallel convert {} {.}.png ::: *.tif
GNU Parallel replaces the curly braces ({}) with the current parameter (in this case, the file name). The placeholder also has other practical variants. The {.} placeholder drops the file extension, which you can use, for example, with convert to change the resized images into PNG format (line 2). Similarly, the {//} placeholder is replaced by the directory name of the input line.
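To see what the three placeholders stand for without touching any files, the following sketch reproduces their expansions with ordinary shell parameter expansion. The show_placeholders function and the sample path are made up for illustration; GNU Parallel performs these substitutions itself.

```shell
# show_placeholders is a hypothetical helper, not part of GNU Parallel.
# It mimics three replacement strings for a single input line.
show_placeholders() {
  input=$1
  printf '{}   -> %s\n' "$input"                # the input line unchanged
  printf '{.}  -> %s\n' "${input%.*}"           # drop the file extension
  printf '{//} -> %s\n' "$(dirname "$input")"   # keep only the directory
}

show_placeholders photos/2014/photo.tif
```

With GNU Parallel installed, parallel --dry-run echo {} {.} {//} ::: photos/2014/photo.tif prints the command it would run, with the placeholders already filled in, instead of executing it.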
If no command is passed to GNU Parallel, it assumes that the input line contains the command. You can exploit this feature to feed GNU Parallel a number of different commands:
$ (echo ls; echo pwd) | parallel
The two echo commands write the lines ls and pwd to the pipe. GNU Parallel reads them and runs both commands in parallel (Figure 5).
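The same idea scales to command lists produced by a script. The sketch below generates one command per line and hands the lines to GNU Parallel. The function names gen_commands and run_commands are hypothetical, and the fallback to sh is only there so that the example also runs (sequentially) on a machine without GNU Parallel.

```shell
# gen_commands is a hypothetical helper: it prints one complete command per line.
gen_commands() {
  for pct in 25 50 75; do
    printf 'echo "would resize to %s%%"\n' "$pct"
  done
}

# Run the lines with GNU Parallel if it is installed (-k keeps the output
# in input order); otherwise fall back to running them one by one with sh.
run_commands() {
  if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
    parallel -k
  else
    sh
  fi
}

gen_commands | run_commands
```

Printing the generated lines with gen_commands alone first is a cheap way to check them before anything is executed.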
Big Ones
Converting photo collections is not the only time sink; processing large files also takes time. Here again, GNU Parallel helps. The following command, for example, feeds the large ubuntu.img file to the tool:
$ cat ubuntu.img | parallel --pipe --recend '' -k gzip >ubuntu.img.gz
With the --pipe option, GNU Parallel splits its standard input into blocks of about 1MB, the default block size. It then compresses each block, again in parallel, with Gzip. The tool collects all the zipped data blocks, puts them in the correct order (with -k), and saves them in the ubuntu.img.gz file. It is then easy enough to unzip the file again with gzip -d.
The --recend option stands for "record end," and you can use it to specify the string that ends a record. Without this option, GNU Parallel looks for newlines to split the data into records. The empty string given here (two single quotes, '') overrides this default, so the data is simply cut into 1MB blocks, regardless of newlines, before being passed to Gzip.
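The approach works because a file made of several gzip streams placed back to back is itself valid gzip input. The following minimal sketch shows this with plain gzip and no parallelism; each gzip call stands in for one block handled by a separate job, and the function name join_and_unpack is made up for illustration.

```shell
# join_and_unpack is a hypothetical helper for this demonstration.
join_and_unpack() {
  tmp=$(mktemp -d)
  # Compress two pieces independently, as separate jobs would.
  printf 'first block\n'  | gzip > "$tmp/part1.gz"
  printf 'second block\n' | gzip > "$tmp/part2.gz"
  # Concatenate the compressed pieces in their original order ...
  cat "$tmp/part1.gz" "$tmp/part2.gz" > "$tmp/joined.gz"
  # ... and decompress the result as if it were a single archive.
  gzip -dc "$tmp/joined.gz"
  rm -r "$tmp"
}

join_and_unpack
```

The output is the two original lines in order, which is also why the -k option matters: if the compressed blocks were written in a different order, the decompressed data would be scrambled.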
Other large files can be processed in parallel in the same way. The documentation also shows how to sort a big file in parallel [4].
Compressing with Gzip in this way incidentally has a minor flaw. Because GNU Parallel hands Gzip one block at a time, the compressor never sees the whole file and, therefore, might not be able to compress it as efficiently. How large the difference is depends on the data to be compressed.
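To get a feel for the size difference on your own data, you can compare the two approaches without GNU Parallel. The sketch below is an approximation under stated assumptions: it uses split -b to cut the file into fixed-size pieces, which imitates the blockwise behavior but is not what GNU Parallel runs internally, and compare_sizes and the generated sample data are hypothetical.

```shell
# compare_sizes is a hypothetical helper. Arguments: file, block size in bytes.
compare_sizes() {
  file=$1
  block=$2
  tmp=$(mktemp -d)
  # Size of the file when compressed in one piece.
  whole=$(( $(gzip -c < "$file" | wc -c) ))
  # Cut the file into fixed-size pieces and compress each one separately.
  split -b "$block" "$file" "$tmp/part."
  blockwise=0
  for part in "$tmp"/part.*; do
    blockwise=$(( blockwise + $(gzip -c < "$part" | wc -c) ))
  done
  rm -r "$tmp"
  echo "whole=$whole blockwise=$blockwise"
}

# Sample data only; for a real test, pass a file such as ubuntu.img
# and a block size of 1048576 bytes.
sample=$(mktemp)
seq 1 100000 > "$sample"
compare_sizes "$sample" 65536
rm "$sample"
```

Both numbers are compressed sizes in bytes, so the gap between them shows how much the blockwise approach costs for that particular file.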