Using the Linux sed Command

PART 1: sed Basics

Getting to Know sed

Reference: Sed Tutorial - Tutorialspoint.

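sed is a stream editor (a line-oriented editing tool): it processes its input one line at a time.

Strengths: it is good at line-by-line processing and at modifying or deleting file content.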

SED can be used in many different ways, such as:

  • Text substitution,
  • Selective printing of text files,
  • In-place editing of text files,
  • Non-interactive editing of text files, and many more

The sed Workflow

The sed workflow is as follows (the original diagram comes from Sed Tutorial - Tutorialspoint):

  • Read: SED reads a line from the input stream (file, pipe, or stdin) and stores it in its internal buffer called pattern buffer.
  • Execute: All SED commands are applied sequentially on the pattern buffer. By default, SED commands are applied on all lines (globally) unless line addressing is specified.
  • Display: Send the (modified) contents to the output stream. After sending the data, the pattern buffer will be empty.
  • The above process repeats until the file is exhausted.

Points to note:

  • Pattern buffer is a private, in-memory, volatile storage area used by the SED.
  • By default, all SED commands are applied on the pattern buffer, hence the input file remains unchanged. GNU SED provides a way to modify the input file in place; this is covered in later sections.
  • There is another memory area called the hold buffer, which is also a private, in-memory, volatile storage area. Data can be stored in the hold buffer for later retrieval. At the end of each cycle, SED removes the contents of the pattern buffer, but the contents of the hold buffer persist between SED cycles. However, SED commands cannot be executed directly on the hold buffer, so SED allows data to be moved between the hold buffer and the pattern buffer.
  • Initially both pattern and hold buffers are empty.
  • If no input files are provided, then SED accepts input from the standard input stream (stdin).
  • If no address range is provided, SED operates on every line by default.
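The cycle described above can be watched on a small input. The following is only an illustrative sketch, not part of the referenced tutorial: the three sample lines and the variable names are made up, and the input arrives on stdin because no input file is given.

```shell
# Three sample lines (invented for this illustration).
input=$(printf 'apple\nbanana\ncherry\n')

# No address: the command runs on every line, each during its own cycle.
all_lines=$(printf '%s\n' "$input" | sed 's/a/A/')

# Address 2: the command runs only while line 2 is in the pattern buffer.
only_second=$(printf '%s\n' "$input" | sed '2s/a/A/')

printf '%s\n---\n%s\n' "$all_lines" "$only_second"
```

Without the g flag only the first "a" of each processed line changes, which is why "banana" becomes "bAnana" in both runs.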

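General sed Syntax

sed [OPTION]... [SCRIPT] [input-file]...

[SCRIPT]: expressions can be separated with semicolons (;), and each expression consists of an address (the condition) plus a command. For example, '1d' deletes the first line and '1d;3d' deletes lines 1 and 3. In '1d', '1' is the address and 'd' is the delete command.

When quoting the whole SCRIPT, prefer single quotes ('). Inside double quotes (") the shell still interprets some characters before sed sees them, so some scripts break. For example, in an interactive bash session sed -n "/^/!p" fails because bash treats '!' as history expansion, whereas sed -n '/^/!p' works. The recommended form is therefore: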

sed [OPTION]... '[SCRIPT]' [input-file]...
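A second reason to prefer single quotes can be shown non-interactively. This sketch is an assumption-laden illustration: it uses the '$' address (the last line), which is not covered at this point, and it assumes that no shell variable named p is set. Inside double quotes the shell expands $p before sed ever sees the script.

```shell
unset p   # assumption: no shell variable named p should be set

# Single quotes: sed receives the script $p and prints the last line.
single=$(printf 'a\nb\nc\n' | sed -n '$p')

# Double quotes: the shell expands $p to an empty string, so sed receives
# an empty script and, because of -n, prints nothing.
double=$(printf 'a\nb\nc\n' | sed -n "$p")

printf 'single=[%s] double=[%s]\n' "$single" "$double"
```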

What Is the Pattern Space?

linux - The Concept of ‘Hold space’ and ‘Pattern space’ in sed - Stack Overflow:
When sed reads a file line by line, the line that has been currently read is inserted into the pattern buffer (pattern space). Pattern buffer is like the temporary buffer, the scratchpad where the current information is stored. When you tell sed to print, it prints the pattern buffer.

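The pattern space is also called the pattern buffer.

As the workflow above shows, each time sed reads a line from the input stream, that line is placed in the pattern buffer, i.e. the pattern space. The pattern space holds only the line currently being processed: at the end of each cycle its contents are printed (unless -n is given) and then cleared before the next line is read, so the file never sits in the pattern space all at once.

What Is the Hold Space?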

linux - The Concept of ‘Hold space’ and ‘Pattern space’ in sed - Stack Overflow:

Hold buffer / hold space is like a long-term storage, such that you can catch something, store it and reuse it later when sed is processing another line. You do not directly process the hold space, instead, you need to copy it or append to the pattern space if you want to do something with it. For example, the print command p prints the pattern space only. Likewise, s operates on the pattern space.

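The hold space is also called the hold buffer. The key points are that it can store data taken from the input, that this data can be used later while sed is processing another line, and that commands cannot operate on it directly: its contents must first be moved back into the pattern space.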

Unix Sed Tutorial : 7 Examples for Sed Hold and Pattern Buffer Operations:

As its name implies, sed hold buffer is used to save all or part of the sed pattern space for subsequent retrieval. The contents of the pattern space can be copied to the hold space, then back again. No operations are performed directly on the hold space. sed provides a set of hold and get functions to handle these movements.
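As a minimal sketch of how data moves between the two buffers, the well-known one-liner below reverses the order of its input lines. The sample lines and the variable name are made up for illustration; h and G are the standard sed commands for copying to and appending from the hold space.

```shell
# h    copy the pattern space into the hold space
# G    append the hold space to the pattern space (after a newline)
# 1!G  skip G on line 1, when the hold space is still empty
# $p   print only on the last line, when the pattern space holds every line
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

On each cycle the lines seen so far are pulled back from the hold space behind the current line, so the newest line always ends up first.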

PART 2: Using the sed Command

Reference: Linux sed命令完全攻略(超级详细).

Printing the Pattern Space with sed

Suppressing Automatic Printing of the Pattern Space ('-n')

Option '-n'

-n (--quiet, --silent) - suppress automatic printing of pattern space

Command 'p' (used inside the script, {script-only-if-no-other-script})

p - Print the current pattern space.

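As described in the pattern space section above, every input line passes through the pattern space, and sed prints the pattern space automatically at the end of each cycle. So without the -n option, every line is printed to the terminal.

Contents of test.txt:

101:Java,ABC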
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO

'1p' prints the first line of the input file:

$ sed '1p' test.txt
101:Java,ABC
101:Java,ABC
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO

The first line was printed, but so was the whole file. Here is what happens once the '-n' option is added:

$ sed -n '1p' test.txt
101:Java,ABC

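This shows that by default sed prints every line once it has processed it. To get only the lines we are interested in, we pass '-n' to suppress that behavior. In practice the p command is almost always used together with '-n'.

Selecting Single and Multiple Lines

The general syntax of the p command is:

sed [OPTIONS...] '[ADDRESS]p' INPUT_FILE
sed [OPTIONS...] '[ADDRESS]p;[ADDRESS]p;...' INPUT_FILE

  • '<n>p' - select line n of the input

$ sed -n '1p' test.txt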
101:Java,ABC

  • '<n>,<m>p' - select lines n through m of the input

$ sed -n '1,3p' test.txt
101:Java,ABC
102:Python,DEF
103:PHP,GHI

  • '<n1>p;<n2>p;…;<ni>p' - select several individual lines, chosen by line number

$ sed -n '1p;3p' test.txt
101:Java,ABC
103:PHP,GHI
$ sed -n '1p;3p;4p' test.txt
101:Java,ABC
103:PHP,GHI
104:C,JKL
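Line numbers are not the only kind of address. The sketch below recreates the contents of test.txt inline and tries two more forms: '$', the POSIX address for the last line, and 'first~step', which is a GNU sed extension and will not work in other sed implementations. The variable names are made up for the example.

```shell
data=$(printf '101:Java,ABC\n102:Python,DEF\n103:PHP,GHI\n104:C,JKL\n105:Linux,MNO\n')

last_line=$(printf '%s\n' "$data" | sed -n '$p')      # last line only
from_fourth=$(printf '%s\n' "$data" | sed -n '4,$p')  # line 4 through the end
odd_lines=$(printf '%s\n' "$data" | sed -n '1~2p')    # GNU only: lines 1, 3, 5

printf '%s\n---\n%s\n---\n%s\n' "$last_line" "$from_fourth" "$odd_lines"
```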

Selecting Lines That Match a Pattern

  • '/[match_text…]/p' - select the lines that match the given text ([match_text…])

# Find the lines containing "Java"
$ sed -n '/Java/p' test.txt
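101:Java,ABC

  • '/[match_text_1…]/p;/[match_text_2…]/p;…;/[match_text_i…]/p' - select the lines that match any of several texts (a line that matches more than one of them is printed once per match)

# Find the lines containing "Java" or "PHP"
$ sed -n '/Java/p;/PHP/p' test.txt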
101:Java,ABC
103:PHP,GHI
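An alternative worth knowing is a single address with alternation, which prints each matching line only once. This is a sketch, not the method used in the referenced article: it relies on extended regular expressions via -E (equivalent to GNU sed's -r), and the data is recreated inline from test.txt.

```shell
data=$(printf '101:Java,ABC\n102:Python,DEF\n103:PHP,GHI\n104:C,JKL\n105:Linux,MNO\n')

# One address, two alternatives: a line is printed at most once.
either=$(printf '%s\n' "$data" | sed -n -E '/Java|PHP/p')
printf '%s\n' "$either"
```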

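Adding Content with sed

i \
text - Insert text, which has each embedded newline preceded by a backslash.

a \
text - Append text, which has each embedded newline preceded by a backslash.

The one-line forms used in this section (text on the same line as i or a) are GNU sed extensions.

Adding Content Before or After Every Line

  • 'i\[text]' - insert [text] before every line

$ sed 'i\100:C++,QQQ' test.txt
# Or equivalently (less readable):
$ sed 'i100:C++,QQQ' test.txt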

Output:

100:C++,QQQ
101:Java,ABC
100:C++,QQQ
102:Python,DEF
100:C++,QQQ
103:PHP,GHI
100:C++,QQQ
104:C,JKL
100:C++,QQQ
105:Linux,MNO

In the same way, content can be added after every line:

  • 'a\[text]' - append [text] after every line

$ sed 'a\100:C++,QQQ' test.txt
# Or equivalently (less readable):
$ sed 'a100:C++,QQQ' test.txt

Adding Content Before or After a Specific Line

To insert content before a given line:

  • '<n>i\[text]' - insert [text] before line n

$ sed '1i\100:C++,QQQ' test.txt
# Or equivalently (less readable):
$ sed '1i100:C++,QQQ' test.txt

Output:

100:C++,QQQ
101:Java,ABC
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO

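In the same way, content can be inserted after a given line:

  • '<n>a\[text]' - append [text] after line n

$ sed '1a\100:C++,QQQ' test.txt
# Or equivalently (less readable):
$ sed '1a100:C++,QQQ' test.txt

Note that all of this work happens in sed's buffers and the result is only written to standard output, never back to the original file, so none of the sed commands above modify the original file.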
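The claim that the original file stays untouched is easy to check. The sketch below works in a scratch directory created with mktemp so that it does not interfere with any real test.txt; the directory and the variable names are assumptions made for the example.

```shell
workdir=$(mktemp -d)   # scratch directory for the demo
printf '101:Java,ABC\n102:Python,DEF\n' > "$workdir/test.txt"
before=$(cat "$workdir/test.txt")

# The edited stream goes to stdout (captured here); the file is not touched.
edited=$(sed '1a\100:C++,QQQ' "$workdir/test.txt")
after=$(cat "$workdir/test.txt")

[ "$before" = "$after" ] && echo "test.txt is unchanged"
rm -r "$workdir"
```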

Adding Content Before or After Lines That Match a Pattern

  • '/[match_text…]/i\[text]' - insert [text] before each line that matches [match_text…]

  • '/[match_text…]/a\[text]' - append [text] after each line that matches [match_text…]

$ sed '/Java/i\222:C#,GGG' test.txt
222:C#,GGG
101:Java,ABC
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO
$ sed '/Java/a\222:C#,GGG' test.txt
101:Java,ABC
222:C#,GGG
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO

Entering the Text Interactively

# Type sed '1i\ and press Enter; the shell prints its continuation prompt (>) and waits for the text
$ sed '1i\
> Hello world!' test.txt
Hello world!
101:Java,ABC
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO
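The same multi-line form works inside a script, and it is also how more than one line of text is inserted: every newline inside the text is preceded by a backslash. A minimal sketch, with made-up input and text:

```shell
# Each newline inside the inserted text is preceded by a backslash;
# the last text line has none, which ends the text.
result=$(printf '101:Java,ABC\n102:Python,DEF\n' | sed '1i\
Hello world!\
Second inserted line')
printf '%s\n' "$result"
```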

Deleting Content with sed

Deleting a Single Line

  • '<n>d' - delete line n

$ sed '1d' test.txt
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO

Deleting Multiple Lines

  • '<n>,<m>d' - delete lines n through m

$ sed '1,3d' test.txt
104:C,JKL
105:Linux,MNO

  • '<n1>d;<n2>d;…;<ni>d' - delete several individual lines, chosen by line number

$ sed '1d;3d;5d' test.txt
102:Python,DEF
104:C,JKL

Deleting Lines That Match a Pattern

  • '/[match_text…]/d' - delete the lines that match the text

# Delete the lines containing "Java"
$ sed '/Java/d' test.txt
102:Python,DEF
103:PHP,GHI
104:C,JKL
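105:Linux,MNO

  • '/[match_text_1…]/,/[match_text_2…]/d' - delete every line from the first line matching text 1 through the next line matching text 2, inclusive

# Delete the lines from "Java" through "PHP"
$ sed '/Java/,/PHP/d' test.txt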
104:C,JKL
105:Linux,MNO

  • '/[match_text_1…]/d;/[match_text_2…]/d;…;/[match_text_i…]/d' - delete the lines that match any of several texts

# Delete the lines containing "Java" or "PHP"
$ sed '/Java/d;/PHP/d' test.txt
102:Python,DEF
104:C,JKL
105:Linux,MNO

An important practical example: removing empty lines (see also: How to delete empty lines using sed command under Linux / UNIX - nixCraft)

$ cat test.txt
101:Java,ABC
102:Python,DEF
103:PHP,GHI
104:C,JKL

105:Linux,MNO
$ sed -n '/^$/!p' test.txt
101:Java,ABC
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO
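The same result can be written with the d command, which fits this section more directly. The sketch below also covers lines that contain only whitespace, which /^$/ does not match; the sample input and the variable names are made up for the example.

```shell
# Delete lines that are completely empty.
no_empty=$(printf 'a\n\nb\n   \nc\n' | sed '/^$/d')

# Delete lines that are empty or contain only whitespace.
no_blank=$(printf 'a\n\nb\n   \nc\n' | sed '/^[[:space:]]*$/d')

printf '%s\n---\n%s\n' "$no_empty" "$no_blank"
```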

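Find and Replace with sed (Without Writing to the Original File)

Find and Replace

The general syntax is:

sed [OPTIONS other than '-i'...] 's/SEARCH_REGEX/REPLACEMENT/g' INPUTFILE

'g' is the global replacement flag; without it, only the first match on each line is replaced.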
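The effect of the flag is easiest to see on a single line read from stdin. This is a small sketch with a made-up input line; the numeric flag in the third command (replace only the Nth match) is standard sed but goes slightly beyond what this section covers.

```shell
line='foo foo foo'

first_only=$(printf '%s\n' "$line" | sed 's/foo/bar/')    # no flag: first match only
every=$(printf '%s\n' "$line" | sed 's/foo/bar/g')        # g: every match
second_only=$(printf '%s\n' "$line" | sed 's/foo/bar/2')  # 2: second match only

printf '%s\n%s\n%s\n' "$first_only" "$every" "$second_only"
```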

Replacing Every Line

The general syntax is:

sed [OPTIONS other than '-i'...] 'c\REPLACEMENT' INPUTFILE

Example:

$ sed 'c\Hello world!' test.txt
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
# Or equivalently (less readable):
$ sed 'cHello world!' test.txt
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!

Replacing Specific Lines

  • '<n>c\[REPLACEMENT]' - replace the content of line n with REPLACEMENT

$ sed '1c\Hello world!' test.txt
Hello world!
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO
# Or equivalently (less readable):
$ sed '1cHello world!' test.txt
Hello world!
102:Python,DEF
103:PHP,GHI
104:C,JKL
105:Linux,MNO

  • '<n>,<m>c\[REPLACEMENT]' - replace lines n through m with REPLACEMENT (note: the whole range collapses into a single line)

$ sed '1,3c\Hello world!' test.txt
Hello world!
104:C,JKL
105:Linux,MNO

Finding and Replacing Strings in Files with sed

Reference: How to Use sed to Find and Replace String in Files | Linuxize

Syntax of the Substitute Command

The general form for searching and replacing text with sed is:

sed -i 's/SEARCH_REGEX/REPLACEMENT/g' INPUTFILE
  • -i - By default, sed writes its output to the standard output. This option tells sed to edit files in place. If an extension is supplied (ex -i.bak), a backup of the original file is created.
  • s - The substitute command, probably the most used command in sed.
  • / / / - Delimiter character. It can be any character but usually the slash (/) character is used.
  • SEARCH_REGEX - Normal string or a regular expression to search for.
  • REPLACEMENT - The replacement string.
  • g - Global replacement flag. By default, sed reads the file line by line and changes only the first occurrence of the SEARCH_REGEX on a line. When the replacement flag is provided, all occurrences are replaced.
  • INPUTFILE - The name of the file on which you want to run the command.

It is a good practice to put quotes around the argument so the shell meta-characters won’t expand.

Assume a file with the following content:

file.txt

123 Foo foo foo
foo /bin/bash Ubuntu foobar 456

Replacing the First Match or Every Match

Without the g flag, only the first matching string on each line is replaced:

sed -i 's/foo/linux/' file.txt

file.txt now contains:

123 Foo linux foo
linux /bin/bash Ubuntu foobar 456

Creating a Backup When Replacing

If a suffix (for example .bak) is appended to -i, sed saves a backup of the original file before editing it:

sed -i.bak 's/foo/linux/' file.txt

[root@localhost ~]# ls -l file*
-rw-r--r-- 1 root root 49 Jan 7 14:16 file.txt
-rw-r--r-- 1 root root 49 Jan 7 14:14 file.txt.bak
[root@localhost ~]# cat file.txt
123 Foo linux foo
linux /bin/bash Ubuntu foobar 456
[root@localhost ~]# cat file.txt.bak
123 Foo foo foo
foo /bin/bash Ubuntu foobar 456

With the g flag, every match on each line is replaced:

sed -i 's/foo/linux/g' file.txt

Output:

123 Foo linux linux
linux /bin/bash Ubuntu linuxbar 456

Notice that "foobar" was turned into "linuxbar". To avoid that, wrap SEARCH_REGEX in word-boundary expressions (\b, a GNU sed extension) so that only a complete word matches:

sed -i 's/\bfoo\b/linux/g' file.txt

Output:

123 Foo linux linux
linux /bin/bash Ubuntu foobar 456

To make the match case-insensitive, add the I flag (a GNU sed extension). The following example uses both g and I:

sed -i 's/foo/linux/gI' file.txt

Output:

123 linux linux linux 
linux /bin/bash Ubuntu linuxbar 456

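Matching Characters That Need Escaping

If the pattern or the replacement contains the delimiter character (/ here), each occurrence must be escaped with a backslash (\). The following example replaces "/bin/bash" with "/usr/bin/zsh":

sed -i 's/\/bin\/bash/\/usr\/bin\/zsh/g' file.txt

The delimiter does not have to be '/'. Choosing another character, such as '#' or '|', removes the need to escape slashes and makes the pattern and replacement easier to read:

sed -i 's|/bin/bash|/usr/bin/zsh|g' file.txt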

Output:

123 Foo linux foo 
linux /usr/bin/zsh Ubuntu foobar 456
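To confirm that the two spellings are interchangeable, the sketch below runs both on one line from file.txt, read from stdin so that no file is modified. The variable names are made up for the example.

```shell
line='foo /bin/bash Ubuntu foobar 456'

# '/' as delimiter: every slash in the pattern and replacement is escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/\/bin\/bash/\/usr\/bin\/zsh/')

# '|' as delimiter: the slashes need no escaping.
with_pipe=$(printf '%s\n' "$line" | sed 's|/bin/bash|/usr/bin/zsh|')

printf '%s\n%s\n' "$with_slash" "$with_pipe"
```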

Using Regular Expressions

SEARCH_REGEX can be a regular expression. For example, to replace every three-digit number such as "456" and "123" with "number":

sed -i 's/\b[0-9]\{3\}\b/number/g' file.txt

Output:

number Foo foo foo 
foo /bin/bash Ubuntu foobar number

In REPLACEMENT, & stands for the text matched by SEARCH_REGEX:

sed -i 's/\b[0-9]\{3\}\b/{&}/g' file.txt

Output:

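{123} Foo foo foo
foo /bin/bash Ubuntu foobar {456}

In this example '&' stands for each three-digit number matched by the regular expression "\b[0-9]\{3\}\b".

With the '-r' option (also spelled '-E'), sed interprets the script's regular expressions as extended regular expressions (ERE):

sed -r [OTHER_OPTIONS...] '[SCRIPT...]' [INPUT_FILE]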
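The practical difference between the two syntaxes is mostly in what needs a backslash. The following sketch, on a made-up input line, writes the same substitution in basic and extended syntax and then uses numbered groups (\1, \2, \3) to swap the first and last fields. The \b word boundary is a GNU sed extension.

```shell
line='123 Foo foo foo 456'

# Basic syntax: the interval braces are escaped.
bre=$(printf '%s\n' "$line" | sed 's/\b[0-9]\{3\}\b/{&}/g')

# Extended syntax (-r): braces and parentheses are special without a backslash.
ere=$(printf '%s\n' "$line" | sed -r 's/\b[0-9]{3}\b/{&}/g')

# Numbered groups: swap the leading and trailing numbers.
swapped=$(printf '%s\n' "$line" | sed -r 's/^([0-9]+) (.*) ([0-9]+)$/\3 \2 \1/')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$swapped"
```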

Examples

Example: Extracting the Host's IP Address

Extract the IPv4 (32-bit) address:

$ ip a s ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:26:0a:2d brd ff:ff:ff:ff:ff:ff
inet 192.168.10.4/24 brd 192.168.10.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::49f:e3c8:3439:abf6/64 scope link noprefixroute
valid_lft forever preferred_lft forever
$ ip a s ens33 | sed -n '3p'
inet 192.168.10.4/24 brd 192.168.10.255 scope global noprefixroute ens33
$ ip a s ens33 | sed -n '3p' | sed -rn 's/^.*inet (.*)/\1/gp'
192.168.10.4/24 brd 192.168.10.255 scope global noprefixroute ens33
$ ip a s ens33 | sed -n '3p' | sed -rn 's/^.*inet (.*)/\1/gp' | sed -rn 's/(.*)\/.*/\1/gp'
192.168.10.4
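# Combine the sed commands into one:
$ ip a s ens33 | sed -rn '3s/^.*inet (.*)\/.*/\1/gp'
192.168.10.4
# In 's/SEARCH_REGEX/REPLACEMENT/g' the '/' is only a delimiter; if it hurts readability, switch to another one such as '|'
# Back-reference: '\1' refers to the text captured by the first pair of parentheses, (.*)
$ ip a s ens33 | sed -rn '3s|^.*inet (.*)/.*|\1|gp'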
192.168.10.4
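Selecting line 3 works for this particular output, but the position of the inet line can differ between systems. A hedged alternative is to match on the content instead. The sketch below runs against two lines copied from the listing above rather than against a live interface, so the sample text and the variable names are assumptions of the example.

```shell
# Two lines copied from the ip output above; no network access is needed.
sample=$(printf '    inet 192.168.10.4/24 brd 192.168.10.255 scope global noprefixroute ens33\n    inet6 fe80::49f:e3c8:3439:abf6/64 scope link noprefixroute\n')

# "inet " followed by digits and dots matches the IPv4 line only;
# the inet6 line does not match, so -n leaves it unprinted.
ipv4=$(printf '%s\n' "$sample" | sed -rn 's|^ *inet ([0-9.]+)/.*|\1|p')
printf '%s\n' "$ipv4"
```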

Example: Renaming File Extensions in Bulk

Rename "violet01.txt" through "violet10.txt" to "violet01.jpg" through "violet10.jpg":

$ touch violet{01..10}.txt
$ ls violet??.txt
violet01.txt violet03.txt violet05.txt violet07.txt violet09.txt
violet02.txt violet04.txt violet06.txt violet08.txt violet10.txt
$ ls violet??.txt | sed -nr 's|(.*)\.txt$|\1.txt \1.jpg|gp'
violet01.txt violet01.jpg
violet02.txt violet02.jpg
violet03.txt violet03.jpg
violet04.txt violet04.jpg
violet05.txt violet05.jpg
violet06.txt violet06.jpg
violet07.txt violet07.jpg
violet08.txt violet08.jpg
violet09.txt violet09.jpg
violet10.txt violet10.jpg
$ ls violet??.txt | sed -nr 's|(.*)\.txt$|\1.txt \1.jpg|gp' | xargs -n2 mv
[root@localhost ~]# ll
total 32
...
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet01.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet02.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet03.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet04.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet05.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet06.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet07.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet08.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet09.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet10.jpg
...

Another Approach

# '&' stands for the whole matched text; it is similar to '{}' in xargs -i cp {} ~/, and close in spirit to a back-reference.
$ ls violet*.txt | sed -r 's|(.*)\.txt$|mv & \1.jpg|' | bash
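Because the generated text is executed by bash, it is worth printing the commands first as a dry run. The sketch below does that with two made-up file names fed in on stdin, so nothing is renamed. Note that this approach assumes file names without spaces or shell metacharacters.

```shell
# Dry run: build the mv commands but only print them.
cmds=$(printf 'violet01.txt\nviolet02.txt\n' | sed -r 's|(.*)\.txt$|mv & \1.jpg|')
printf '%s\n' "$cmds"
```

Once the printed commands look right, the same pipeline can be piped into bash as shown above.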

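For renaming, the rename command can also be used.

rename Syntax

rename [options] expression replacement file...

expression - the part of the original file name to change

replacement - the text to put in its place

file… - the files to rename

This is the util-linux rename syntax; the Perl-based rename found on Debian and Ubuntu takes a Perl expression instead.

$ rename .txt .jpg violet*.txt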
$ ll violet*.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet01.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet02.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet03.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet04.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet05.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet06.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet07.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet08.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet09.jpg
-rw-r--r-- 1 root root 0 Jan 17 15:10 violet10.jpg