Processing Text with Linux Shell - Part 1
Shamil
Posted on July 27, 2018
Into the world of sed
If you use any *nix system on a daily basis, chances are you are already familiar with, or have at least heard of, the sed command.
sed, short for Stream Editor, is a text transformation tool that comes bundled with every Unix system. What distinguishes sed from other text editors is the speed at which it manipulates text: sed makes only one pass over the input, which makes processing quite fast.
# Replace that ugly text
sed is a very powerful tool for replacing one piece of text with another. The text to replace can be matched using regular expressions.
sed 's/text_to_be_replaced/replacement_text/' file_name
However, this only prints the substituted text to the console; it does not change the file itself. To save the changes to the file, we can use the -i flag.
sed -i 's/text_to_be_replaced/replacement_text/' file_name
The above replaces only the first occurrence of the pattern on each line. To replace every occurrence, we can append the g flag to the end of the command.
sed 's/text_to_be_replaced/replacement_text/g' file_name
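To see the difference between the two forms without touching a file, we can pipe a line into sed instead. This is only a small sketch; the sample text and variable names are made up for illustration.

```shell
# Sample line (made up for this illustration) with the pattern appearing twice.
line='red apple, red cherry'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/red/green/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/red/green/g')

printf '%s\n' "$first_only"    # green apple, red cherry
printf '%s\n' "$all_matches"   # green apple, green cherry
```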
Note that the delimiter character / used in the above commands is not fixed; we can use almost any character as the delimiter in sed. For example,
sed 's:text_to_be_replaced:replacement_text:g' file_name
sed 's|text_to_be_replaced|replacement_text|g' file_name
Okay, but what if the delimiter character is itself a part of the pattern to be replaced? ¿ⓧ_ⓧﮌ
Well, we can escape that character with a backslash. For example, to replace the word following: with below - , we can do this:
sed 's:following\::below - :' file_name
Notice the \: in the pattern. The backslash makes that : a literal character, so it is not confused with the delimiter : that separates the pattern from its replacement.
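Here is a quick sketch of both ideas, picking a delimiter that does not appear in the text and escaping one that does. The input lines and variable names are hypothetical, chosen only to exercise the delimiters.

```shell
# Hypothetical input lines, chosen only to exercise the delimiters.
path_line='PATH=/usr/local/bin'
label_line='following:some text'

# Using | as the delimiter avoids escaping every / in the paths.
new_path=$(printf '%s\n' "$path_line" | sed 's|/usr/local/bin|/opt/bin|')

# Using : as the delimiter means a literal : in the pattern is written \:
new_label=$(printf '%s\n' "$label_line" | sed 's:following\::below - :')

printf '%s\n' "$new_path"    # PATH=/opt/bin
printf '%s\n' "$new_label"   # below - some text
```

When the text is full of slashes, as file paths are, switching the delimiter is usually easier to read than escaping each one.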
# Delete that scrap
sed also allows us to delete lines from a file. The d command indicates a delete operation. The generic syntax to delete a line is
sed 'Nd' file_name
Here N is the number of the line that we want to delete. To delete the 10th line of a file, N would be 10.
One of the most common uses of this command is deleting all blank lines in a file.
sed '/^$/d' file_name
The above deletes all the blank lines in the file. The regular expression ^$ matches an empty line, and the d command specifies that the line should be deleted.
That's not all. We can also specify a range of lines to delete.
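The two delete forms so far can be tried on a small piped input. This is a sketch with made-up data. The last command is a variation that is not part of the commands above: since ^$ matches only lines with no characters at all, a line holding just spaces is not deleted unless we widen the pattern.

```shell
# Made-up input: line 2 is empty, line 4 holds only spaces.
input=$(printf 'alpha\n\nbeta\n   \ngamma')

# Delete a single line by number (here line 3, "beta").
without_third=$(printf '%s\n' "$input" | sed '3d')

# /^$/ matches only truly empty lines, so the spaces-only line survives.
without_empty=$(printf '%s\n' "$input" | sed '/^$/d')

# Variation (an addition for illustration): also drop whitespace-only lines.
without_blank=$(printf '%s\n' "$input" | sed '/^[[:space:]]*$/d')

printf '%s\n' "$without_blank"
```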
sed 'm,nd' file_name
The above command deletes all the lines from the mth up to the nth, inclusive.
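As a concrete sketch of a range delete, here m is 2 and n is 4 on a six-line input. The second command uses $ as the end address, which sed treats as "the last line"; that form is an addition for illustration and is not used elsewhere in this post.

```shell
# Six numbered lines as made-up input.
numbers=$(printf '%s\n' 1 2 3 4 5 6)

# Delete lines 2 through 4, inclusive.
remaining=$(printf '%s\n' "$numbers" | sed '2,4d')

# $ as the end address means the last line: delete from line 3 to the end.
head_only=$(printf '%s\n' "$numbers" | sed '3,$d')

printf '%s\n' "$remaining"   # prints 1, 5 and 6 on separate lines
```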
# Pipelining is important
Now what about pipelining multiple sed commands?
We can chain as many sed commands as we wish, and they are processed in that order. Consider the following example.
echo Linux | sed 's/L/l/' | sed 's/n/N/' | sed 's/l/L/' | sed 's/x/X/'
This will output LiNuX.
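Each sed in that pipeline runs as a separate process. As an aside, a single sed invocation also accepts several -e expressions and applies them in order to each line, so for this input the two forms should give the same result. A small sketch for comparison (the variable names are just for illustration):

```shell
# Four separate sed processes chained with pipes.
piped=$(echo Linux | sed 's/L/l/' | sed 's/n/N/' | sed 's/l/L/' | sed 's/x/X/')

# One sed process running the same four substitutions in order.
single=$(echo Linux | sed -e 's/L/l/' -e 's/n/N/' -e 's/l/L/' -e 's/x/X/')

printf '%s\n' "$piped"    # LiNuX
printf '%s\n' "$single"   # LiNuX
```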
Finally, let's take a look at how we can use variables within a sed command. So far we have used ' ' (single quotes) in our commands. However, we can also use " " (double quotes) when we need the shell to expand a variable in our command. Take a look at the following example.
greet=hello
echo hello shamil | sed "s/$greet/hi/"
This will evaluate the value of $greet and replace hello with hi, printing hi shamil.
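To make the quoting difference concrete, here is a small sketch that runs the same substitution with both kinds of quotes. With single quotes the shell leaves $greet alone, so sed looks for the literal text $greet and finds nothing to replace.

```shell
greet=hello

# Double quotes: the shell expands $greet before sed sees the script.
expanded=$(echo 'hello shamil' | sed "s/$greet/hi/")

# Single quotes: sed receives the literal text $greet, which does not match.
literal=$(echo 'hello shamil' | sed 's/$greet/hi/')

printf '%s\n' "$expanded"   # hi shamil
printf '%s\n' "$literal"    # hello shamil
```

One caveat: the variable's value becomes part of the sed script, so a value containing the delimiter or regex metacharacters will change what the command does.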
# Better safe than sorry
When using -i in the sed command, we need to be careful, as it overwrites the actual content of the file. (Trust me, I have done this many times)
Therefore, it is good practice to first run the command without the -i flag and check that the replacements are correct. However, if the output is too long to check that way, you can use the following command to create a backup copy of the file before modifying its content.
sed -i.bak '12,30d' file_name
This will delete lines 12 through 30, but most importantly it will save the original content as file_name.bak in the same directory.
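The backup behaviour is easy to check on a throwaway file. The sketch below assumes mktemp is available and uses a hypothetical demo.txt inside a temporary directory, so no real file is touched.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf '%s\n' one two three four > "$workdir/demo.txt"

# Delete lines 2-3 in place, keeping the original as demo.txt.bak.
sed -i.bak '2,3d' "$workdir/demo.txt"

edited=$(cat "$workdir/demo.txt")
backup=$(cat "$workdir/demo.txt.bak")

printf 'edited:\n%s\n' "$edited"
printf 'backup:\n%s\n' "$backup"

# To undo a bad edit, the backup can simply be moved back over the file:
#   mv "$workdir/demo.txt.bak" "$workdir/demo.txt"
rm -r "$workdir"
```

A portability note: GNU sed accepts a bare -i, while the BSD sed shipped with macOS expects a suffix argument after -i (an empty one, as in -i '', means no backup). The attached form -i.bak used here works with both.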
Who knows, this might just end up saving your job (◠﹏◠)
(EDIT: See this comment for more info on -i usage)