Quick Edit
With sed, you can edit text data without an interactive user interface, using pipes or input redirection. Sed lets you execute extensive editing commands on a single line.
The sed stream editor [1] automates many repetitive operations and is especially effective inside a shell script. You can use regular expressions (regex) to describe the pattern of the strings you want to match. In this article, I'll start with the program's screen output. If you want to follow along and practice, simply type the sample text files into an editor of your choice.
Sed can be called from virtually anywhere and accepts its commands in several ways. You can pass commands directly or read them in from a separate file. The data can be piped, redirected, or read from a text file. The output can be sent to the screen (usually stdout), through a pipe to the next command, or redirected to a destination file. Unless you use the -i option (Table 2), sed itself never overwrites a file. The "Sed Call Options" box has the details. To resolve shell variables, you need to enclose the sed command in double quotes (") instead of single quotes (').
Sed Call Options
Sed simply reads the text file and returns the results through stdout:
sed [COMMAND] [TEXTFILE]
Same with input redirection:
sed [COMMAND] < [TEXTFILE]
Inclusion of sed in one or more pipes:
[PROGRAM1] | sed [COMMAND] | ......
Commands stored in a separate file and read in:
... sed -f [SCRIPT] .....
Output of sed redirected to a text file, but omitting error messages:
... sed [COMMAND] > [TARGETFILE]
The same, but including error messages into the target file:
... sed [COMMAND] > [TARGETFILE] 2>&1
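The call forms in the box can be compared side by side. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the sample data and the s/a/A/ command are made up for illustration.

```shell
# Create a small throwaway input file (hypothetical sample data).
tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

# 1. File name passed as an argument
from_file=$(sed 's/a/A/' "$tmp")
# 2. Input redirection
from_redirect=$(sed 's/a/A/' < "$tmp")
# 3. sed as part of a pipe
from_pipe=$(cat "$tmp" | sed 's/a/A/')

# All three variables hold the same text; print one of them.
echo "$from_file"
rm -f "$tmp"
```

All three calls should produce identical output, so the choice between them is mostly a matter of where the data comes from.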
The basic syntax structure is shown in Figure 1. An address tells sed which lines an editing command applies to; without an address, the command applies to every line. You can combine addresses as long as doing so doesn't affect clarity. If you want to change "everything except," you can negate the address with the ! character.
You can add multiple commands on one command line to sed as follows:
sed -e 'command1' -e 'command2' ... -e 'commandN' ....
Or, you can add these commands in a script file.
The script file should have a single line for each statement. For example:
s/Gans//
s/jo/Jo/g
The first instruction removes the word "Gans" (more precisely, it substitutes "nothing" for the word), but only for the first instance of the search string on each line. The second line substitutes "Jo" for "jo" for all instances because of the g option.
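To try out the two statements, you could store them in a temporary script file and compare the result with the equivalent -e call. This is only a sketch: the input string is invented, and mktemp is assumed to be available.

```shell
# Write the two statements to a throwaway script file, one per line.
script=$(mktemp)
cat > "$script" <<'EOF'
s/Gans//
s/jo/Jo/g
EOF

# Hypothetical test input containing each search string twice.
input="jo Gans, jo Gans"

from_script=$(echo "$input" | sed -f "$script")
from_options=$(echo "$input" | sed -e 's/Gans//' -e 's/jo/Jo/g')

echo "$from_script"
rm -f "$script"
```

Only the first "Gans" disappears, whereas both instances of "jo" are capitalized.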
To create an executable sed script, include the shebang (#! ) interpreter statement on the first line:
#!/bin/sed -f
If you make the script executable (e.g., chmod 700 [SCRIPTNAME] ), you can call it like any other program. You wouldn't normally use this option. Rather, you would put sed and any script file calls in a shell script. In some cases, the order of the commands matters. Test your scripts before making them "real" to avoid errors and data loss.
You can use the textdata.txt file in Listing 1 to exercise your sed skills. This file looks as though it has been thrown together and contains empty lines, typos, and other errors. The second sample file I'll use in this article is called testlist.txt (Listing 2) and has dates formatted in different ways as content.
Listing 1
textdata.txt
chris hemsworth - Thor 0885465468798746
Scarlett Johansson - Black Widow 08755466584
Robert Downey - Iron Man 0987654321
Mark Ruffalo - Hulk 0405458765143321
Chris Evans - Captain America 0548/9988776655
Jeremy renner - Hawkeye 555/8812470
Tom Hiddleston - Loki 87841487014848
Samuel Jackson - Nick Fury 043/956026386
Cobie Smulders - Maria Hill 23514560145
Hugh jackman - Wolverine 801539193Paul Rudd - Ant Man 497349000
Listing 2
testlist.txt
22 April 1984
7.04.1985
30 March 1986
19 April 1987
03.04.1988
26 March 1989
15 April 1990
31-March-1991
19 April 1992
11 April 1993
3 April 1994
16. April 1995
7 April 1996
30 March 1997
12 April 1998
Sed makes heavy use of regular expressions, which let you describe many string patterns. The more of them you use, the more confusing complex statements become; sed scripts can often help here. Some characters act both as special shell characters and as regex operators. To use them literally, you need to "escape" them with the \ character (Table 1). The construct [ABC] means "contains A or B or C" and the construct /ABC/ means "contains exactly that string."
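The effect of escaping is easiest to see with a slash and a dot. The following sketch uses invented input strings; the variable names are only for illustration.

```shell
# A slash in the pattern must be escaped, because / is also the separator.
with_dash=$(echo "0548/9988776655" | sed 's/\//-/')

# An unescaped dot matches any character ...
any_char=$(echo "7.04x1985" | sed 's/./-/g')
# ... whereas an escaped dot matches only a literal dot.
literal_dot=$(echo "7.04x1985" | sed 's/\./-/g')

echo "$with_dash"
echo "$any_char"
echo "$literal_dot"
```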
Table 1
Special Characters
Character | Function |
---|---|
( | Opens statement |
) | Ends statement |
{ | Opens optional statement |
} | Closes optional statement |
[ | Opens a list of characters |
] | Closes a list of characters |
" | Masks a statement in which shell variables are resolved |
' | Masks a statement in which shell variables not resolved |
` | Encloses a statement block |
. | Any character other than a newline |
, | Separates parameters, such as line items |
: | Sets labels (t and b command) |
$ | End of document, end of line or last line |
& | Placeholder for search patterns, included in the replacement statement |
| | Or (regex separator) |
/ | Separator in editing commands |
^ | Beginning of line, or negation in a search pattern |
\ | Escape character |
! | After an address: apply the command to all lines that do not match |
* | Zero or more occurrences of the preceding pattern |
+ | One or more occurrences of the preceding pattern |
= | Output line number |
\n | Newline, line feed |
\t | Tab character |
Confusingly, both sed and the editing commands have their own options. Sed's own options are preceded by the - character, as is usual in Linux, whereas the editing commands take their options after them. Tables 2-4 provide an overview.
Table 2
Sed Options
Action | Option |
---|---|
Execute command (can usually be omitted) | -e |
Disable data buffering | -u |
Treat files separately | -s |
Use extended regexes | -r |
Edit file in place (keeps a backup if an extension is given) | -i[FILEEXTENSION] |
Read and execute script file | -f [SCRIPTFILE] |
Suppress automatic output of (unaffected) text areas | -n |
Show version | --version |
Table 3
Editing Commands
Action | Command |
---|---|
Add lines above this one | i |
Add lines below this one | a |
Output this line | p |
Output this line unambiguously, wrapped at a maximum length | l [LENGTH] |
Replace characters with others | y |
End sed | q |
Replace this line with new text | c |
Delete this line | d |
Search and replace | s |
Table 4
Editing Command Options
Action | Option |
---|---|
Output line number | = |
All occurrences | g |
Outputs modified line with the s editing command | p |
Write the edited line to a file | w [FILE] |
You can use the search function, among other things, to replace sections of text. The search pattern acts as the address, and you can use regexes in it. Table 5 shows some of the possibilities, and Table 6 provides some examples. In Table 6, sed is used both in a data stream and with direct access to the text file. With composite addressing (two patterns separated by a comma), the statement is applied to a range of lines: from the first line matching the first address up to and including the next line matching the second address. Character classes such as [:alnum:] only work inside a bracket expression, so you write them as [[:alnum:]].
Table 5
Patterns and Addressing
Action | Pattern |
---|---|
All lines | (null) |
Line 25 | 25 |
Not line 25 | 25! |
Lines 10 through 20 | 10,20 |
Last line | $ |
Not pattern | '/PATTERN/!' |
Character at beginning of line | ^CHAR |
String | /STRING/ |
Character set | [CHARS] |
Any letter | [:alpha:] |
Lowercase | [:lower:] |
Uppercase | [:upper:] |
Alphanumeric | [:alnum:] |
Digit | [:digit:] |
Hexadecimal digit | [:xdigit:] |
Tab and space | [:blank:] |
Whitespace (space, tab, newline) | [:space:] |
Control character | [:cntrl:] |
Printable characters (no control characters) | [:print:] |
Visible characters (without spaces) | [:graph:] |
Punctuation | [:punct:] |
Table 6
Sample Searches and Patterns
Search for | Pattern | Example | Figure |
---|---|---|---|
Term, Name | '/TERM/' | cat textdata.txt | sed -n '/Meier/p' | - |
All lines containing "man" or "Man" | '/[Mm]an/p' | sed -n '/[Mm]an/p' textdata.txt | Figure 2 |
All lines except 3 through 5 | '3,5!' | sed -n '3,5!'p textdata.txt | Figure 3 |
All lines except those containing "Man" | '/Man/!' | sed -n '/Man/!'p textdata.txt | Figure 4 |
Lines containing "H" or "G" | '/[HG]/' | sed -n '/[HG]/'p textdata.txt | - |
Lines not containing "H" or "G" | '/[H]\|[G]/!' | sed -n '/[H]\|[G]/!'p textdata.txt | Figure 5 |
Line 3 | 3 | cat textdata.txt | sed -n '3p' | - |
Last line | '$p' | cat textdata.txt | sed -n '$p' | - |
Multiple patterns: Do not output lines containing an "R" somewhere followed by an "M" somewhere else | '/[R]./,/[M]./!' | sed -n '/[R]./,/[M]./!'p textdata.txt | Figure 6 |
All lines containing some alphanumeric characters (not only space characters) | '/[[:alnum:]]/' | cat textdata.txt | sed -n '/[[:alnum:]]/'p | Figure 7 |
Note that in Figure 6, the j comes before J in the text file. In the first example in Table 6, none of the lines containing H and J are output, which works because the order in the command and text file are the same. The second example with the negated H and j shows, however, that a line containing H must first be found. That's why johann still appears in the output!
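The addressing forms from Tables 5 and 6 can be tried out on a few numbered lines. This sketch uses invented sample text in which every line is easy to tell apart.

```shell
# Hypothetical five-line sample text.
lines='one
two
three
four
five'

# Lines 2 through 4
range=$(printf '%s\n' "$lines" | sed -n '2,4p')
# Everything except lines 2 through 4
negated=$(printf '%s\n' "$lines" | sed -n '2,4!p')
# Last line only
last=$(printf '%s\n' "$lines" | sed -n '$p')
# From the first line matching "two" through the next line matching "four"
pattern_range=$(printf '%s\n' "$lines" | sed -n '/two/,/four/p')

echo "$negated"
```

Here the numeric range and the pattern range select the same three lines, because "two" is line 2 and "four" is line 4.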
If you want to be certain that sed is doing what you need it to do, you can combine several simple calls in a pipe. The following command suppresses empty lines and lines containing "Man" (see Figure 8):
cat textdata.txt | sed -n '/[[:alnum:]]/'p | sed -n '/Man/!'p
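A self-contained version of this two-stage filter might look as follows. The three records are borrowed from textdata.txt, and the position of the empty line is an assumption made for the sake of the example.

```shell
# A few records in the style of textdata.txt, including one empty line.
sample='Robert Downey - Iron Man

Tom Hiddleston - Loki
Paul Rudd - Ant Man'

# Stage 1 keeps lines with at least one alphanumeric character,
# stage 2 drops every line that contains "Man".
filtered=$(printf '%s\n' "$sample" | sed -n '/[[:alnum:]]/p' | sed -n '/Man/!p')

echo "$filtered"
```

Note the doubled brackets: the class name [:alnum:] sits inside a bracket expression.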
You use the s instruction to replace matched expressions. The length of search and replace strings is irrelevant. You can see the detailed syntax shown in Figure 9.
You can limit the search and replace statement to specific lines by preceding the command with the line number, as shown in the following example:
sed -n '5s/OLD/NEW/p' [TEXTFILE]
Or, for a range of lines:
sed -n '1,4s/OLD/NEW/p' [TEXTFILE]
You can also suppress changes to certain lines using the exclamation point:
sed -n '20,80!s/OLD/NEW/p' [TEXTFILE]
Furthermore, you can limit changes to lines which contain certain strings or patterns that are not the same as the search and replace statement:
sed -n '/[STRING|PATTERN]/s/OLD/NEW/gp' [TEXTFILE]
You can delete the matched string by replacing it with an empty string.
By default, only the first occurrence of the search string on a line is processed. To replace all instances, add the g (global) option at the end of the statement. If the -n option is set, sed outputs nothing on its own. So, if you want to see what's going on, add the p (print) option. You can also write results to an output file with the w (write) option. Table 7 shows some short examples.
Table 7
Sample Search and Replace Statements
Action | Example | Figure |
---|---|---|
Replace pattern at the first occurrence only | cat textdata.txt | sed -n 's/e/E/p' | Figure 10 |
Replace pattern at every occurrence | cat textdata.txt | sed -n 's/e/E/gp' | Figure 10 |
Delete the word "Man" | sed -n 's/Man//gp' textdata.txt | Figure 11 |
Replace "Iron" with "Tin" on line 4 | cat textdata.txt | sed -n '4s/Iron/Tin/gp' | Figure 12 |
Replace "0" with "089" on all lines containing "Man" or "man" | sed -n '/[Mm]an/s/0/089/gp' textdata.txt | Figure 13 |
Replace "0" with "089" on all lines except those containing "Man" or "man" | sed -n '/[Mm]an/!s/0/089/gp' textdata.txt | Figure 14 |
Delete all numbers, slashes (/), and hyphens (-) | cat textdata.txt | sed -n 's/[0-9\/-]//gp' | Figure 15 |
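The difference between the s options is easy to check on short strings. The following sketch uses invented input; the two records are modeled on textdata.txt.

```shell
# Without g, only the first match on the line is replaced.
first_only=$(echo "three fee" | sed 's/e/E/')
# With g, every match on the line is replaced.
every=$(echo "three fee" | sed 's/e/E/g')

# Two sample records in the style of textdata.txt.
records='Iron Man 0987
Loki 0878'

# The address /[Mm]an/ restricts the substitution to matching lines.
addressed=$(printf '%s\n' "$records" | sed '/[Mm]an/s/0/089/')
# With -n and the p option, only the modified lines are output.
changed_only=$(printf '%s\n' "$records" | sed -n '/[Mm]an/s/0/089/p')

echo "$first_only"
echo "$every"
echo "$addressed"
```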
You can see a more complex example in Listing 1. It converts the inconsistently formatted date syntax in testlist.txt to a common, unified, albeit European (DD/MM/YYYY ) format. Be sure to press the Enter key immediately after the \ at the line's end. Alternatively, you can omit the sign and use the pipe character to connect with the line that follows it; however, this results in a less clear screen display.
The list is read in line 1, and the pipes start in line 2. Line 2 replaces a leading space character, where present, with the number 0. Line 3 replaces any minus signs in dates with spaces. Lines 4 and 5 substitute any month written as a word with its numeric value followed by a dot. Line 6 takes any two-digit number at the beginning of a line (^), with the first digit being 1 through 3 and the second any digit, followed by a space character, and substitutes "itself" (&) followed by a dot.
To make the search pattern repeatable during the replacement, enclose it in parentheses – which you have to be sure to escape with \ . The sed statement in line 7 deletes all existing space characters (through the g option for s ).
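A pipeline along these lines could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the article's listing: it uses -e options instead of separate pipes, pads single-digit days with an escaped group rather than replacing a leading space, accepts 0 through 3 as the first digit of the day, and works on a handful of dates in the style of testlist.txt plus one made-up duplicate. The function name normalize_dates is hypothetical.

```shell
# Sample dates in the style of testlist.txt; the last line is an
# invented duplicate of the line before it once both are normalized.
dates='22 April 1984
7.04.1985
31-March-1991
16. April 1995
3 April 1994
03.04.1994'

normalize_dates() {
    # 1: pad a single-digit day with a leading 0 (groups \1 and \2)
    # 2: replace minus signs with spaces
    # 3/4: replace month names with their number followed by a dot
    # 5: append a dot to a two-digit day that is followed by a space
    # 6: delete all remaining spaces
    sed -e 's/^\([0-9]\)\([^0-9]\)/0\1\2/' \
        -e 's/-/ /g' \
        -e 's/March/03./' \
        -e 's/April/04./' \
        -e 's/^[0-3][0-9] /&./' \
        -e 's/ //g' | uniq
}

normalized=$(printf '%s\n' "$dates" | normalize_dates)
echo "$normalized"
```

The order matters here: the spaces can only be deleted after the day and month have received their dots.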
The uniq command on the last line collapses adjacent duplicate lines so that each date is output only once. Figure 16 shows the results. You can also "carry over" all or part of the original string into the replacement statement. Check out the following example:
echo "happy" | sed -n s'/happy/un&/'p
This example replaces happy with unhappy . You can also convert from lowercase to uppercase:
cat textdata.txt | sed -n s'/\([[:lower:]]\)/\U&/'pg
The \U before the & (a GNU sed extension) indicates that the output must be converted into uppercase. To convert from uppercase to lowercase, use \L instead:
cat textdata.txt | sed -n s'/\([[:upper:]]\)/\L&/'pg
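The following sketch collects the & placeholder and both case conversions in one place. It assumes GNU sed, because \U and \L are GNU extensions; the input strings are borrowed from the sample data.

```shell
# & stands for the complete matched string.
prefixed=$(echo "happy" | sed 's/happy/un&/')

# \U converts each matched lowercase letter to uppercase (GNU sed).
upper=$(echo "chris hemsworth" | sed 's/[[:lower:]]/\U&/g')

# \L converts each matched uppercase letter to lowercase (GNU sed).
lower=$(echo "Iron Man" | sed 's/[[:upper:]]/\L&/g')

echo "$prefixed"
echo "$upper"
echo "$lower"
```

The escaped parentheses from the examples above are not needed as long as you only refer to the whole match with &.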
Use the y command for character filtering and other applications. The pattern should contain all the characters that need to be replaced, and the replacement statement should have the same number of characters. The command structure resembles that of s, but without trailing options, and -n should be omitted:
sed 'y/[SEARCH CHARS]/[REPLACEMENT CHARS]/'
The example in Figure 17 replaces the lowercase c at the beginning of a line with an uppercase C, but only in those lines of textdata.txt that begin with c.
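A short sketch shows how y maps characters one to one. The input strings are invented, and note that y changes every occurrence of a listed character on the line, not just the first.

```shell
# c becomes C and e becomes E, wherever they occur on the line.
swapped=$(echo "chris evans" | sed 'y/ce/CE/')

# With an address, y only touches lines that begin with c.
only_c_lines=$(printf 'chris\nMark c\n' | sed '/^c/y/c/C/')

echo "$swapped"
echo "$only_c_lines"
```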
You use c to replace entire lines:
sed '/PATTERN/'c'REPLACEMENT'
You can also do it like this:
sed [LINE(n)]c'REPLACEMENT'
The example in Figure 18 deletes an empty line and replaces it with a series of dashes.
In place of a search pattern, line numbers can be used. Be aware that even if you specify a range of line numbers, the whole range is replaced by one single instance of the replacement string. So, if you pick three lines, it will seem as if the first line gets replaced and the second and third lines get deleted.
The example in Figure 19 deletes line 2 and substitutes a series of hash marks. The second example deletes through line 4 and replaces them with the given line.
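The behavior of c with a pattern and with a range can be sketched as follows. The example assumes GNU sed, which accepts the replacement text on the same line as the c command; the sample text is invented.

```shell
# Hypothetical four-line sample with an empty second line.
sample='first

third
fourth'

# Replace every empty line with a series of dashes (GNU one-line form).
dashes=$(printf '%s\n' "$sample" | sed '/^$/c-----')

# A range is replaced by a single instance of the replacement text.
hashes=$(printf '%s\n' "$sample" | sed '2,3c#####')

echo "$dashes"
echo "$hashes"
```

In the second call, two input lines turn into one output line.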
You also can delete lines with the d command, using a search pattern or line numbers:
sed '/PATTERN/'d
sed [LINE(n)]d
Using the commands in Figure 20, you can search and delete an empty line and then delete line 4.
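Both kinds of deletion can be tried on a small invented sample. When the two commands are combined in one call, the line number still refers to the input line, not to the line's position after the empty line has been removed.

```shell
# Hypothetical five-line sample with an empty second line.
sample='one

three
four
five'

# Delete all empty lines.
no_blank=$(printf '%s\n' "$sample" | sed '/^$/d')
# Delete line 4 of the input.
no_fourth=$(printf '%s\n' "$sample" | sed '4d')
# Both at once: input line 4 ("four") is deleted.
both=$(printf '%s\n' "$sample" | sed -e '/^$/d' -e '4d')

echo "$both"
```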
With a, you add lines beneath and, using i, you insert lines above the search pattern. To state where sed must insert the line, you indicate a search pattern or a line number. If you enter multiple line numbers or the pattern matches multiple times, the insertion occurs for each instance (Figure 21).
In the second line, a new line is added above the first line in the file; whereas in the next line, it's added at the end ($ ). The command at the next prompt adds a new line above the matched search pattern, and the next line adds it below.
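Inserting and appending can be sketched with a two-line sample. As with c, the one-line form used here is a GNU sed convenience; the sample text and the inserted strings are invented.

```shell
# Hypothetical two-line sample.
sample='alpha
beta'

# Insert a line above line 1.
with_header=$(printf '%s\n' "$sample" | sed '1iHEADER')
# Append a line below the last line ($).
with_footer=$(printf '%s\n' "$sample" | sed '$aFOOTER')
# Append a line below every line matching the pattern.
after_match=$(printf '%s\n' "$sample" | sed '/alpha/a-- after alpha --')

echo "$with_header"
echo "$with_footer"
echo "$after_match"
```

The single quotes keep the shell from interpreting $a as a variable.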
If a shell variable needs to be resolved, you need to enclose the statements in double quotes (" ) instead of single quotes (' ). The little shell script in Listing 3 shows how to handle variables. It searches through the sample file and outputs the matching lines. Figure 22 shows the result.
Listing 3
searchString.sh
#!/bin/sh
# Ask for a search string and output the matching lines of textdata.txt.
echo -n "Enter search string: "; read sstring
cat textdata.txt | sed -n "/$sstring/"p
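The effect of the quoting style can also be checked without interactive input. In this sketch, the variable value and the two records are assumptions chosen for illustration.

```shell
# Hypothetical search string and sample records.
sstring="Loki"
sample='Tom Hiddleston - Loki
Mark Ruffalo - Hulk'

# Double quotes: the shell replaces $sstring before sed sees the command.
expanded=$(printf '%s\n' "$sample" | sed -n "/$sstring/p")
# Single quotes: sed searches for the literal text $sstring and finds nothing.
literal=$(printf '%s\n' "$sample" | sed -n '/$sstring/p')

echo "expanded: $expanded"
echo "literal: $literal"
```

Keep in mind that a search string containing a / would break the address, because sed would take it for the closing separator.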
With sed, you can execute complex text manipulation without user intervention. Its cryptic syntax might seem cumbersome at first, which is why building scripts bit by bit is a great option.
Infos