Expansions
The moreutils package expands the standard tools for the shell with useful and sometimes exotic tools.
The shell lets you string together a myriad of discrete tools to solve complex tasks. Often the utilities from the GNU coreutils and util-linux packages are enough. There are, however, a number of situations that still don't have a standard solution in the form of a tool.
Together with some friends, Joey Hess has developed a set of programs grouped under the name moreutils [1] to fill some of these gaps. For the most part, these are specialized tools. Consequently, it is entirely possible that you have never heard of any of them. But if you spend a lot of time on the command line, then it would be a good idea to keep these small programs handy.
The current version of moreutils includes 15 programs (see Table 1). Each is a nifty little tool that comes with a few options and is designed to accomplish one particular task. The tools were developed along the lines of the classic Unix tools, so they share a similar syntax and common underlying concepts. There is a man page for each program; unfortunately, these are not very informative. See the "Installing moreutils" box for installation information.
Table 1
Categories
Category | Utility |
---|---|
Start programs | chronic, ifne, lckdo, parallel, zrun |
Edit (text) files | combine, isutf8, ts, vidir |
Tools for pipes | mispipe, pee, vipe, sponge |
Error handling | errno, ifdata, isutf8, mispipe |
Installing moreutils
You can find moreutils in the repositories of Ubuntu, Debian, and their derivatives, although not necessarily in the most recent version, 0.60, which is available for download on the project website [1]. Many other distributions, but not all, also offer moreutils in their package sources. If you are using a distribution that does not have the package, you will need to compile the sources.
Grab the source code with:
git clone git://git.joeyh.name/moreutils
In some situations, such as scripts or cron jobs, the output of a program is irritating. chronic helps resolve this problem. It expects the name of a program as its argument and swallows that program's STDOUT (output channel) and STDERR (error channel) unless an error actually occurs. You could do the same by using the construct
> /dev/null 2>&1
except that it would no longer be possible to preserve the output generated in the case of an error. The redirection to the null device simply would get rid of everything. Consequently, chronic is the better alternative.
Below is an example that uses chronic . Normally, this command line would create the desired EPUB file without generating any additional output. However, if an error does occur, you will get the entire output on the two channels, STDOUT and STDERR .
$ chronic a2x -v -fepub *.adoc
Two options control the behavior of chronic. By default, a return code not equal to zero triggers the output. The -e option (stderr triggering) makes chronic ignore the executed program's return code; instead, it decides whether errors have occurred based on whether anything was written to STDERR.
With -v, chronic produces somewhat more detailed output, which in particular includes the return code of the executed program. It also distinguishes between the output the called program wrote to STDOUT and to STDERR.
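A typical use is a crontab entry; because chronic stays silent on success, cron sends mail only when the job fails. A minimal sketch, assuming a hypothetical backup script:

0 3 * * * chronic /usr/local/bin/backup.sh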
The syntax for the ifne command is similar to that of chronic, and it also launches the program whose name is its first argument. However, it first evaluates the standard input in order to decide whether to actually start the program. Without any options, ifne starts the program only when standard input is not empty.
Below is a typical example from the man page. The administrator receives an email here if the find command finds at least one core file. The message directly contains the paths for the located files.
$ find . -name core | ifne mail -s "Core files found" root
ifne has one option, -n , that reverses its behavior. With this option, it will start the designated program only when the standard input is empty.
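For example, here is a sketch that reports when a search comes up empty; the logfile path is just an illustration:

$ grep CRITICAL /var/log/syslog | ifne -n echo "No critical entries found"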
When automated processes are running, there are many situations in which just one instance of a program is permitted to run. If a process takes longer to execute due to an unexpectedly large amount of data that needs to be processed, then conflicts can arise, and you can even lose data. If cron launches programs at intervals that are too short, it can lead to a never-ending loop. The first instance has not yet finished its task while the next one overwrites the incomplete database.
In these situations, programmers normally resort to lock files. Before the actual work begins, the task creates a file, which is often empty and typically set up under /var/lock/. When the task finishes, it deletes the file again. If the file already exists when the task tries to start, the software does not run at all. Sometimes the lock file also contains the ID of the process that created it.
This kind of lock mechanism is fairly uncomplicated to implement, even in scripts. However, the lckdo tool makes life even easier (Listing 1). You prepend it to the actual command, much as you would with nice or nohup. Its options control the tool's behavior precisely.
Listing 1
Using lckdo
$ lckdo [options] <lock file> <program>
If a lock file already exists, then the tool will report this (Listing 2, line 3). Sometimes you won't be able to generate lock files in /var/lock/ because you may lack permission. The program also notifies you of this (Listing 2, line 5). In this case, you should use another directory, like your home directory.
Listing 2
lckdo output
01 $ lckdo yes.lock yes&
02 $ lckdo yes.lock yes
03 lckdo: lockfile `yes.lock' is already locked
04 $ lckdo /var/lock/yes.lock yes&
05 lckdo: unable to open `/var/lock/yes.lock': Permission denied
lckdo deletes the lock file that has been generated after the program terminates. However, the lock files do not contain a process ID. Such an ID would make it easier to manually terminate a program if it has hung.
Another tool, flock , comes in the util-linux package:
sudo apt install util-linux
This program offers similar functionality to lckdo but comes with different options. lckdo's man page therefore states that lckdo will probably be removed from future versions of the moreutils package.
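A minimal sketch of the flock equivalent, assuming a hypothetical job script; the -n option makes flock give up immediately instead of waiting if the lock is already held:

$ flock -n /tmp/job.lock /usr/local/bin/job.sh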
There are a variety of tools for starting multiple programs in parallel, even with different arguments. All of these tools are named either parallel or something similar. The moreutils package contains the parallel-moreutils tool. Its simple syntax differs little from that of other implementations (Listing 3, line 1).
Listing 3
parallel-moreutils
01 $ parallel-moreutils Options -- Program with Arguments
02 $ parallel-moreutils Options Program -- Arguments
03 $ parallel-moreutils ufraw --out-path=ready/ ... -- *.raw
In addition, the tool supports a second syntax variant (Listing 3, line 3), which can also be useful. There, the double hyphen separates the command from the RAW files matched by the pattern. By default, the software starts a process for each CPU core, which often significantly accelerates processing. However, there is no linear correlation between the number of cores and the elapsed processing time, since input and output operations constitute a bottleneck.
The option -j <number> lets you control the number of processes that start in parallel. It makes sense to keep <number> smaller than the number of processor cores. However, it is actually possible to make the number larger. Using -l <decimal number> , you can define the maximum load (this is a measure of system load) up to which the parallel-moreutils tool will start additional processes. Once the load exceeds this limit, no further processes are started until it falls below the limit.
It is possible to manually manipulate the commands executed with parallel-moreutils, which makes the tool universally applicable. There are two options for doing this. The -i option lets you rearrange the command: By default, each argument, such as a file name matched by *.raw, is appended to the end of the command line. With -i, parallel-moreutils instead replaces each occurrence of {} in the command line with the argument.
With -n <number> , you define the number of arguments that parallel-moreutils will take in one step and then pass onto the program that is its argument. You should not use this option in combination with -i .
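For instance, here is a sketch that copies files into a hypothetical backup directory with two parallel processes; thanks to -i, each file name replaces the {} placeholder instead of being appended:

$ parallel-moreutils -j 2 -i cp {} /var/backups/ -- *.conf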
Tools like parallel-moreutils tend to generate unusual effects, particularly with commands that are somewhat more complex. The small zrun command, on the other hand, is easy to understand and use by virtue of its simple syntax.
$ zrun <program with arguments> <compressed data>
This program decompresses compressed files given as arguments on the fly, thus saving space on the drive. The current version works with formats including GZIP, BZIP2, Z, XZ, LZMA, and LZO.
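For example, here is a quick sketch that counts the lines of a rotated, compressed logfile without unpacking it by hand; the path is just an illustration:

$ zrun wc -l /var/log/syslog.1.gz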
A typical use case involves editing a compressed text file. An attempt to open this kind of file with an editor, such as nano, does not work (Figure 1). With the zrun tool, things look quite different (Figure 2). You can see in the title bar of the editor that zrun has generated a temporary decompressed file, which is then handed to the editor.
zrun deletes the decompressed temporary file after you've finished with it; therefore, all of the modifications are lost unless you save your edited file to a new uncompressed version under a different name. It is up to you to compress the new version of your file again.
zrun has an additional feature. If you create a link to zrun named z<program>, then calling the link acts exactly like calling zrun <program>. For example, create a link:
ln -s /bin/zrun /tmp/znano
You have now generated a temporary command called znano . Running znano <file> acts exactly the same as zrun nano <file> .
A second group of moreutils commands is used for processing or testing files. There are many differences among these programs. Most target text files, which is not surprising, since this kind of file has been used for a multitude of tasks for many years. For example, configuration files and many log files are in plain text format.
Traditional Unix tools are very powerful for working with text files, and they are all available under Linux. Even so, some tasks require special maneuvers. One such task is logically combining text files, which the combine command makes possible.
This command combines the lines of two files via a logical operator and then shows the results. The operators include AND (logical and), OR (logical or), NOT (not), and XOR (exclusive or). With the AND operator, for example, only those lines that exist in both the first and the second file are returned. combine does not interpret the content of the lines; it simply compares each line in its entirety.
Below is the syntax for the command. You can replace either file with standard input using -. The program recognizes the operators regardless of how they are capitalized.
$ combine <file> <operator> <file>
Listings 4 and 5 show the two files used to demonstrate the different operations you can carry out with combine. Listing 6, for example, shows what happens when you apply XOR to A and B: You get all of the lines that do not appear in both files.
Listing 4
File A
a1
a2
a3
a4
Listing 5
File B
a1
b2
b3
a4
b5
Listing 6
A XOR B
$ combine A xor B
a2
a3
b2
b3
b5
The files A (Listing 4) and B (Listing 5) have two lines in common; all of the other lines differ, and XOR returns exactly these differing lines. It is worth mentioning that file B is one line longer than file A; b5 nonetheless appears in the output, because combine treats a line without a counterpart as a differing line.
In Listing 7, you can see what happens when you use AND to combine A and B. Only the lines that are the same in both are shown.
Listing 7
A AND B
$ combine A and B
a1
a4
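The remaining operators follow the same pattern. For example, combining A not B returns the lines that occur only in the first file:

$ combine A not B
a2
a3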
Text files on current Linux systems are normally UTF-8 encoded. If this is not the case, unexpected behavior can occur when you process them, so it is a good idea to check first whether the encoding is correct. You can test files with isutf8 from the moreutils package to determine whether they are indeed encoded as UTF-8. If the encoding is OK, there is no output; if not, the software points out the nonconforming text (Listing 8).
Listing 8
Testing with isutf8
$ isutf8 1bicec.tmp
1bicec.tmp: line 1, char 15, byte 14: Expecting bytes in the following ranges: 00..7F C2..F4.
The actual evaluation is performed, as usual, via the command's return value: The tool returns zero for valid UTF-8 files. There are a few options for modifying the behavior of isutf8. For example, -q, or --quiet, suppresses the message that a file does not conform to UTF-8. Using -l, or --list, you can specify that the software outputs only the names of nonconforming files. Using -i, or --invert, only those files that conform to UTF-8 are shown.
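Because the evaluation runs through the return value, isutf8 drops neatly into shell conditionals. A minimal sketch, assuming a hypothetical file config.txt:

$ isutf8 -q config.txt && echo "valid UTF-8" || echo "not UTF-8"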
Timestamps are short strings containing the current time; they are used for things like dating entries in log files. The ts tool from the moreutils package adds a timestamp to the beginning of each line it reads from standard input. These timestamps let you monitor things like how long a command takes to complete before passing control to the next command.
Take the script, tstest.sh , in Listing 9, for example. Listing 10 shows what the output looks like using ts combined with different options.
Listing 9
tstest.sh
echo "Here we go." sleep 3 echo "three seconds later." sleep 2 echo "five Seconds have passed."
Listing 10
Running with ts
$ bash tstest.sh | ts
Nov 28 14:23:15 Here we go.
Nov 28 14:23:18 Three seconds later.
Nov 28 14:23:20 Five seconds have passed.
$ bash tstest.sh | ts "%c"
Mo 28 Nov 2016 14:30:35 CET Here we go.
Mo 28 Nov 2016 14:30:38 CET Three seconds later.
Mo 28 Nov 2016 14:30:40 CET Five seconds have passed.
$ bash tstest.sh | ts -s
00:00:00 Here we go.
00:00:03 Three seconds later.
00:00:05 Five seconds have passed.
$ bash tstest.sh | ts -i
00:00:00 Here we go.
00:00:03 Three seconds later.
00:00:02 Five seconds have passed.
The tool constructs these timestamps with the strftime function. This function is like the one used in the date command. It is possible to adapt the output using placeholders in the form of short character strings, such as %s and %D . Each of these strings represents a particular aspect. You can learn more about the possibilities with man 3 strftime . Table 2 contains a summary of the most important character strings.
Table 2
Placeholders
Abbreviation | Function |
---|---|
%s | Seconds since the Unix epoch (1970-01-01 00:00:00 UTC) |
%b | Abbreviated name of the month |
%d | Day of the current month, decimal (01..31) |
%H | Hour of the current day, decimal (00..23) |
%M | Minutes in the current hour, decimal (00..59) |
%S | Seconds in the current minute, decimal (00..60, allowing for leap seconds) |
%x | Local date, without time of day |
%X | Local time of day |
%c | Local date with time of day, e.g., Fr 04 Nov 2016 13:33:44 CET (locale dependent) |
%R | Time of day in 24-hour format |
%T | Corresponds to %H:%M:%S |
By default, ts uses the format "%b %d %H:%M:%S" for its output. You can specify another format, in quotation marks, as the last argument (see the second example in Listing 10).
There are some additional options for controlling the tool. With -s, ts prints the time elapsed since the program started rather than absolute timestamps. The -r option converts existing timestamps into relative amounts; in testing, however, this option only worked for stamps generated in the default format. With -i, the software computes the difference between the current and the previous timestamp.
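For example, here is a quick sketch that stamps the output of a running command with a custom format; ping merely stands in for any long-running program:

$ ping -c 3 localhost | ts "%H:%M:%S"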
There are some situations that require you to manually modify the output produced by commands like ls or find. Most of the time, this is done in preparation for other actions. For example, you may want to prepare an archive or process files directly, such as deleting a batch of files.
The moreutils package has a special tool, vidir, that does exactly this. It loads the contents of a directory (by default, the current one), or a specified part of it (vidir *.jpg), into a preset editor. The environment variable $EDITOR, or alternatively $VISUAL, defines this editor. In order to edit search results, you pipe the output from find into vidir; this means you can do things like edit subdirectories with find | vidir -.
The format used by the program is somewhat surprising at first. Each line begins with a number, which vidir uses after the editor shuts down to figure out which files are affected. For example, if you delete line 18 of the temporarily created file in the editor, then vidir deletes the file listed on line 18 as well. The analogous action of creating files this way does not work. However, you can rename files by changing the names in the file, and you can even swap the names of two files by exchanging their line numbers.
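Because vidir simply hands the file list to the configured editor, you can even script the edit. A sketch, assuming files with an uppercase .JPG suffix should be renamed to lowercase; here sed serves as a noninteractive "editor" that rewrites the temporary file list:

$ EDITOR='sed -i s/\.JPG$/.jpg/' vidir .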
The capability of vidir depends heavily on the power of the editor used. For example, Emacs lets you perform complex, rule-based substitutions and deletions. Having said this, Emacs also comes with Dired, its own very powerful interface to the filesystem.
Pipes rank among the most powerful constructs in shells; indeed, there is no more effective mechanism for transferring data between different processes. Consequently, pipes are used a lot. However, by default, many shells only evaluate the return value of the last command in a pipe. If one of the earlier commands in the pipe fails, the shell either does not notice or does not respond adequately. This can happen, for example, in a short-circuit test with || or &&.
Many pipes consist of just two commands: The first generates the data, and the second processes it. Here, it is important to be able to react if an error occurs during the first step. This is where mispipe comes into play: It evaluates the return value of the first command.
The command below always generates the output Ooops when there is no file matching the missing* pattern, because the ls command returns the value 2. The return value of the grep command does not matter here. When ls does find matching files, grep passes through only the lines containing a 1.
$ mispipe "ls -l missing*" "grep 1" || echo Ooops
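You can inspect the propagated exit status directly; mispipe returns the return value of the first command rather than the last:

$ mispipe "ls -l missing*" "grep 1"; echo $?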
The rude-sounding pee command was also developed for use in pipes. It is the counterpart to the file-oriented tee command (Listing 11): tee copies the output of a script or program to files as well as to STDOUT. pee, on the other hand, distributes its input to multiple commands via pipes (so it can be processed by other programs) and does not copy it to STDOUT itself.
Listing 11
Pipes with pee
$ ./script.sh | tee output1.txt output2.txt
$ ./script.sh | pee "grep pattern" "gzip -c > output.gz"
The vipe program makes it possible to use an editor in the middle of a pipe (Listing 12). At first glance, this might seem to be of limited value, since pipes connect the STDOUT channel of the first command to the standard input of the second. However, it sometimes makes sense to edit the data manually before processing continues, and vipe saves you the intermediate step of writing the data to a file. vipe first reads in the entire dataset, hands it to the editor, and passes the result on once the editor is finished. As with vidir, vipe evaluates the variables EDITOR and VISUAL to determine the editor.
Listing 12
Editing with vipe
$ man less | vipe | cat
LUTHER(1)                 General Commands Manual                 LUTHER(1)

NAME
       luther - opposite of pope

COPYRIGHT
       Copyright (C) 1517-2016 by the Lutheriden

AUTHOR
       Faked 2016 by Joerg Luther <jluther@linux-user.de>

Version 499: 28 Nov 2016                                          LUTHER(1)
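A more practical sketch: Review and prune a list of candidate files in the editor before deleting them (the *.bak pattern is just an illustration):

$ ls *.bak | vipe | xargs rm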
The sponge command is also suitable for use in pipes. The shell does offer the possibility of redirecting data at any time to a file for transfer. The data is continuously deposited in pieces (usually line by line) in a file throughout the entire run of the command line, or pipeline.
sponge does this in a different way. It first collects all the data internally and then transfers it to a file all in one go at the very end of a run. Among other things, this makes it possible to use the same file for input and output.
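This solves a classic shell pitfall: Redirecting a command's output into its own input file truncates the file before the command ever reads it. A minimal sketch, assuming a hypothetical log.txt:

$ grep -v error log.txt > log.txt        # destroys log.txt
$ grep -v error log.txt | sponge log.txt # works as intended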
Take, for example, the file.txt shown in Listing 13. When no output file has been designated, sponge writes to the standard output channel. If the output file already exists, sponge preserves its permissions. The -a option, shown in the second example of Listing 14, appends the output to the existing contents of the file.
Listing 13
file.txt
7
5
3
Listing 14
Sorting with sponge
$ sort -u file.txt | sponge file.txt
$ cat file.txt
3
5
7
$ sort -r file.txt | sponge -a file.txt
$ cat file.txt
3
5
7
7
5
3
Listing 15
Working with errno
$ errno eio
EIO 5 Input/output error
$ errno 5
EIO 5 Input/output error
$ errno -s Output error
EIO 5 Input/output error
EREMOTEIO 121 Input/output error of the remote
$ errno -l | grep 121
EREMOTEIO 121 Input/output error of the remote
There are a number of Linux tools, like strace, that output very cryptic messages containing little more than a symbolic error name. Normally, you would look up such a name in the errno(3) man page using
man 3 errno
in the hope that the description offers some help.
It is much easier to match up names and numbers using errno (Listing 15). The -s option (--search) lets you search the list of error messages for a particular term, regardless of capitalization. With -S, the program extends the search to the error descriptions in all other installed languages; occasionally, you get additional hits this way. Searching by error number does not work like this; instead, use the -l parameter to have errno output all of the error messages and then search through the list with grep.
The ifdata command checks to see whether a particular network interface exists and is active. It is possible to query the interface for statistics. The simple syntax used by the
ifdata <Option(s)> <Interface>
pattern, makes it clear that the command is primarily designed for use in scripts. Table 3 lists the most important options of ifdata .
Table 3
ifdata: Important Options
Option | Meaning |
---|---|
-e | Tests whether the interface exists |
-pe | Prints yes if the interface exists, otherwise no |
-pa | IPv4 address of the interface |
-ph | Hardware (MAC) address of the interface |
-si | Input statistics |
-sip | Number of input packets |
-sib | Number of input bytes |
-sie | Number of input errors |
-so | Output statistics |
-sop | Number of output packets |
-sob | Number of output bytes |
-soe | Number of output errors |
-bips | Input bytes per second |
-bops | Output bytes per second |
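In a script, these options combine naturally with shell conditionals. A sketch, assuming a hypothetical interface name eth0:

$ ifdata -e eth0 && echo "eth0 exists, IPv4: $(ifdata -pa eth0)"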
In addition to some truly helpful tools like pee and ts, the moreutils package offers highly specialized tools for a wide variety of tasks. Many users can probably do without most of these utilities, but if you write a lot of scripts, you will quickly come to appreciate the shortcuts they afford.
Infos
[1] moreutils: https://joeyh.name/code/moreutils/