Small shell tools for text editing
Converting Character Sets
Many different character sets are in use, which can make exchanging data difficult. With the recode program, you can convert a text file to the character set you need. Probably the most important option is -l, which lists all known source and destination character sets. With -f, you force the recoding to proceed even when it cannot be reversed. The -v option prints a verbose summary of the conversion process.
Further options are described in the man page.
Because recode overwrites the file it is given, the safe approach is to work on a copy rather than on the original. Run recode -l to look up the character set names, then apply the conversion to the copy.
The syntax is as follows:
recode [OLD_CHARACTER_SET]..[NEW_CHARACTER_SET] [FILENAME]
Where needed, recode also adjusts control characters such as CR or LF. For example, you would run the following:
cp a.txt exported.txt
recode -v UTF-8..ISO-8859-15 exported.txt
to convert a file from its original UTF-8 character set to ISO-8859-15.
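The copy-then-convert routine can be wrapped in a small shell function. This is only a sketch: the function name recode_copy and its argument order are made up for illustration, and it assumes recode is installed and knows both character set names (check with recode -l).

```shell
#!/bin/sh
# Hypothetical helper (not part of recode): convert a copy and leave the
# original file untouched.
# Usage: recode_copy SOURCE_FILE TARGET_FILE OLD_CHARSET NEW_CHARSET
recode_copy() {
    # Copy first, because recode rewrites the file it is given in place.
    cp -- "$1" "$2" || return 1
    # Convert the copy; -v reports the steps of the conversion.
    recode -v "$3..$4" "$2"
}
```

Calling recode_copy a.txt exported.txt UTF-8 ISO-8859-15 then corresponds to the two commands shown above, with a.txt left as it was.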
Replacing Tabs
The expand and unexpand programs convert tabs within text (in files as well as pipes): expand replaces tabs with spaces, and unexpand replaces spaces with tabs. Both accept the -t NUM option, which sets how many space characters correspond to a tab (eight by default).
By default, unexpand only converts blanks at the beginning of a line; with the -a option, it converts all suitable runs of spaces into tab characters. For simple one-to-one replacements, tr has a similar effect, although it does not take tab stops into account.
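A short demonstration of both programs in a pipe follows. The sample lines are made up for illustration, and od -c is used only to make tabs visible as \t in the output.

```shell
#!/bin/sh
# Sample input is invented for this demonstration.

# expand: the tab after "a" is padded with spaces up to the next
# tab stop, which -t 4 places at column 4.
expanded=$(printf 'a\tb\n' | expand -t 4)

# unexpand without options: only leading blanks are converted,
# so the seven spaces in the middle of the line stay as they are.
leading_only=$(printf 'a       b\n' | unexpand)

# unexpand -a: the run of spaces ending at the default tab stop
# (column 8) is replaced by a single tab.
all_blanks=$(printf 'a       b\n' | unexpand -a)

# Show the three results byte by byte.
printf '%s\n' "$expanded" "$leading_only" "$all_blanks" | od -c
```

The second and third results differ only because of -a, which is the case the option is meant for.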