Small shell tools for text editing
Splitting Files
The split command breaks large files into manageable pieces, for example to fit them on smaller media. You can also store the pieces in different places to make it harder for someone to assemble them for nefarious purposes. You can always reassemble the pieces using cat.
Table 15 lists the most important options. You can give size values in bytes (no suffix), kilobytes (K), megabytes (M), or gigabytes (G). In most cases, you will want to specify a filename prefix. The command has the form split [-OPTION] [FILE] [PREFIX]. In a pipe, use a dash in place of the filename.
I'll show three examples. After each example, you can check the result by using cat on the files whose names begin with Part. in your working directory.
First, divide a file into a predetermined number of subfiles using
split -n2 a.txt "Part."
Note the dot at the end of the prefix value. You can use a pipe and specify a file size for splitting the file as follows:
cat a.txt | split -b20 - "Part."
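As a minimal sketch of the full round trip, the following lines split a small sample file into 20-byte pieces and put it back together with cat. The sample names and the scratch directory are assumptions made only for this illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: four names, one per line.
printf '%s\n' 'Bauer Anton' 'Meier Manfred' 'Mueller Sabine' 'Schmidt Rosi' > a.txt

# Split the piped input into pieces of at most 20 bytes each.
cat a.txt | split -b20 - "Part."

# The shell expands Part.* in alphabetical order (Part.aa, Part.ab, ...),
# which is the order needed to restore the original.
cat Part.* > restored.txt

# cmp stays silent and returns 0 if both files are identical.
cmp a.txt restored.txt && echo "files match"
```

Because the pieces are cut by byte count, a piece can end in the middle of a line; only the concatenation of all pieces is guaranteed to be readable again.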
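The next example shows how to split a file by lines, here two lines per piece, which can help in structuring files for extracting addresses (roughly the same as using head and tail). The -l option expects the number of lines per piece, and -d makes split use numeric suffixes (Part.00, Part.01, and so on) instead of alphabetic ones: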
split -d -l2 a.txt "Part."
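To see what the numeric suffixes look like, here is a small sketch that assumes GNU split and a hypothetical four-line a.txt created on the spot:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: four names, one per line.
printf '%s\n' 'Bauer Anton' 'Meier Manfred' 'Mueller Sabine' 'Schmidt Rosi' > a.txt

# -l2 puts two lines into each piece; -d numbers the pieces
# Part.00, Part.01, ... instead of Part.aa, Part.ab, ...
split -d -l2 a.txt "Part."

# Print each piece under its filename.
for piece in Part.*; do
    echo "== $piece"
    cat "$piece"
done
```

With four input lines and two lines per piece, this should leave exactly two files, Part.00 and Part.01.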
In some ways, the csplit command is similar to split, except that it splits a file at lines that match a pattern rather than at fixed sizes. You should already be familiar with some of the options shown in Table 16.
The pattern is expressed as /…/. A single pattern splits the file at the first matching line. To split at further matches, follow the pattern with a repeat count in braces, such as {2}, or with {*} for "repeat the pattern as many times as possible." The usual rules for escaping shell special characters apply to the search patterns, so quoting the pattern and the repeat count is the safe choice. If using a pipe, be sure to enter a dash instead of the filename.
To see how this works, you can copy a.txt to h.txt and insert a line of three dashes before each entry, as shown in Listing 3. This line is the separator csplit uses for creating subfiles.
Listing 3
h.txt
---
Bauer Anton
---
Meier Manfred
---
Mueller Sabine
---
Schmidt Rosi
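Next, create the subfiles from h.txt. The -z option suppresses the empty file that would otherwise be created before the first separator:
csplit -z -f "Part." h.txt '/---/' '{*}'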
Check the result by using cat on the files that begin with Part.* in your work directory.
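The following sketch recreates the content of Listing 3 and runs the split in a scratch directory. It assumes GNU csplit, because {*} is a GNU extension; the -s option is my addition and only silences the byte counts that csplit normally prints.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate h.txt from Listing 3: a --- line before each name.
printf '%s\n' '---' 'Bauer Anton' '---' 'Meier Manfred' \
              '---' 'Mueller Sabine' '---' 'Schmidt Rosi' > h.txt

# Quote the pattern and the repeat count so the shell passes them unchanged.
# -z drops the empty piece before the first ---, -f sets the prefix.
csplit -s -z -f "Part." h.txt '/---/' '{*}'

# Print each piece under its filename.
for piece in Part.*; do
    echo "== $piece"
    cat "$piece"
done
```

Each piece starts with its separator line, because csplit copies up to, but not including, the next matching line.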
Wrapping Text
If you want to wrap long lines of text in a controlled way, you can use the very handy fold command and its options:
fold -sbw[COLUMNS] [FILE]
You can also use the fold program in a pipe, in which case you would omit the filename. The -b option counts bytes instead of columns, so tabs, backspaces, and carriage returns each count as one position, which prevents surprises involving control characters. To avoid breaking up words, you can use -s to break at spaces where possible. Use -w[NUM] to set a line width other than the default of 80 columns.
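A short sketch makes the effect of -s visible. The sample sentence and the width of 12 columns are arbitrary choices for this illustration.

```shell
# Hypothetical sample sentence; any long line would do.
text='The quick brown fox jumps over the lazy dog'

# Without -s, fold cuts after exactly 12 columns, even in the middle of a word.
printf '%s\n' "$text" | fold -w12

echo '--'

# With -s, fold breaks after the last space that fits within 12 columns,
# so the words in this sentence stay intact.
printf '%s\n' "$text" | fold -s -w12
```

In the second variant, each wrapped line keeps its trailing space, so joining the lines again yields the original sentence.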
It's not advisable, however, to use fold as a cut substitute if you want to split lines by fields. With addresses, for example, house numbers and street names commonly end up on different lines.