The Problem of re.sub Parameter Order

The Question

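When writing re.sub or re.subn, I am often unsure of the order of the arguments and have to interrupt my work to check the hints or the help documentation.
For example, change input_string='trade war' to 'trade negotiation':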

In [17]: re.sub("war", "negotiation", "trade war")                                                                            
Out[17]: 'trade negotiation'
sub(pattern, repl, string, count=0, flags=0)
def sub(pattern, repl, string, count=0, flags=0):
    return _compile(pattern, flags).sub(repl, string, count)  # compile the regex pattern first
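As the source above shows, the module-level re.sub compiles the pattern and then calls the compiled object's sub method, so the compiled form takes repl and string in the same order. A minimal sketch checking that the two forms agree, and showing count and re.subn:

```python
import re

text = "trade war, trade war"

# Module-level call: pattern, repl, string (count defaults to 0 = replace all).
a = re.sub("war", "negotiation", text)

# Equivalent compiled form: the pattern is bound first, then repl and string.
pat = re.compile("war")
b = pat.sub("negotiation", text)

# count limits the number of replacements; re.subn also returns how many were made.
c = pat.sub("negotiation", text, count=1)
d, n = re.subn("war", "negotiation", text)

print(a)  # trade negotiation, trade negotiation
print(c)  # trade negotiation, trade war
print(n)  # 2
```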

Analysis

pattern is what gets matched in the source (input_string), and repl is the replacement content that goes to the destination. This order is the same as in str.replace.

replace(self, old, new, count=-1, /)

old comes from the source; new is written to the destination result.

In [16]: "trade war".replace("war", "negotiation")                                                                            
Out[16]: 'trade negotiation'
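For a literal (non-regex) pattern, the two calls line up argument for argument. A small sketch, assuming the old text may contain regex metacharacters such as `$`, in which case re.escape keeps re.sub matching it literally:

```python
import re

s = "price: $5 (was $5)"
old, new = "$5", "$4"

# str.replace(old, new): old comes from the source, new goes to the result.
via_replace = s.replace(old, new)

# re.sub(pattern, repl, string): same old/new order, with the data moved to the end.
# re.escape makes "$" match literally; a backslash in new would still be
# interpreted as an escape inside repl.
via_sub = re.sub(re.escape(old), new, s)

print(via_replace)  # price: $4 (was $4)
```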

sed follows the same pattern.

s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with
replacement. The replacement may contain the special character & to refer to that portion of the pattern
space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-
expressions in the regexp.

The regex pattern matches content in the source data, and the replacement is what gets written to the destination result.

$ echo 'trade war' | sed "s/war/negotiation/g"
trade negotiation
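The sed man page's `&` (the whole match) and `\1` through `\9` (groups) have direct counterparts in re.sub's repl: `\g<0>` for the whole match and `\1` (or `\g<1>`) for groups. A sketch mirroring two sed calls; the sed commands in the comments are illustrative:

```python
import re

s = "trade war"

# sed 's/war/[&]/'  ->  & is the whole match; Python spells it \g<0>.
bracketed = re.sub(r"war", r"[\g<0>]", s)

# sed -E 's/(\w+) (\w+)/\2 \1/'  ->  \1 and \2 refer to groups in both tools.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", s)

print(bracketed)  # trade [war]
print(swapped)    # war trade
```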

Other Text Processing

$ echo "trade-war" | tr "-" "\n"
trade
war
tr [OPTION]... SET1 [SET2]

SET1 comes from the source; SET2 is what ends up in the destination after processing.
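Python's str.maketrans follows the same SET1/SET2 order as tr. A minimal sketch of the `tr "-" "\n"` call above:

```python
# str.maketrans(from_chars, to_chars): each character in the first argument (source)
# maps, position by position, to the character in the second (destination),
# like tr SET1 SET2.
table = str.maketrans("-", "\n")
out = "trade-war".translate(table)

print(out)
# trade
# war
```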

To summarize this pattern and mental convention:

function source destination

That is how text processing works, and the file-handling utilities follow the same pattern.

mv [OPTION]... SOURCE... DIRECTORY
cp [OPTION]... [-T] SOURCE DEST
ln [OPTION]... Source... DIRECTORY
rsync [OPTION...] SRC... [DEST]
scp SRC... DEST
dd if=/dev/{{source_drive}} of=/dev/{{dest_drive}}
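Python's standard-library file helpers keep the same source-then-destination order. A sketch run inside a temporary directory; the file names are hypothetical:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "a.txt")
    with open(src, "w") as f:
        f.write("trade war\n")

    # cp SOURCE DEST  ->  shutil.copy(src, dst), returns the new file's path
    copied = shutil.copy(src, os.path.join(d, "b.txt"))
    # mv SOURCE DEST  ->  shutil.move(src, dst), returns the destination
    moved = shutil.move(copied, os.path.join(d, "c.txt"))

    with open(moved) as f:
        content = f.read()
    names = sorted(os.listdir(d))

print(names)  # ['a.txt', 'c.txt']
```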

The exception is tar.

tar -c [-f ARCHIVE] [OPTIONS] [FILE...]
tar -czvf dest_archive.tar.gz source_dir
tar -cvf backup.tar /home/me/

tar puts the destination first.
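Python's tarfile module mirrors tar's order: the archive (destination) is opened first, and sources are then added to it. A sketch with hypothetical paths in a temporary directory; mode `"w:gz"` is what makes the output actually gzip-compressed, the counterpart of tar's `-z` option:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as d:
    source_dir = os.path.join(d, "source_dir")
    os.mkdir(source_dir)
    with open(os.path.join(source_dir, "note.txt"), "w") as f:
        f.write("trade war\n")

    archive = os.path.join(d, "dest_archive.tar.gz")
    # Destination first (like tar -czf ARCHIVE), then the sources to add.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname="source_dir")  # adds the directory recursively

    with tarfile.open(archive, "r:gz") as tar:
        members = sorted(tar.getnames())

print(members)  # ['source_dir', 'source_dir/note.txt']
```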

Back to re.sub

re.sub(pattern, repl, string)
# expanded
re.sub(pattern_from_source, replacement_to_result, source_data)
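Because re.sub's definition shown earlier takes pattern, repl and string as ordinary parameters, one way to sidestep the ordering question entirely is to pass them by keyword. A sketch:

```python
import re

# Keyword arguments make the source/destination roles explicit at the call site,
# and the order in which they are written no longer matters.
result = re.sub(string="trade war", pattern="war", repl="negotiation")

print(result)  # trade negotiation
```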

Of the three parameters pattern_from_source, replacement_to_result and source_data, the last one is source_data: the source is placed at the end.
grep and sed follow the same pattern:

sed 's/{{regex}}/{{replace}}/' {{filename}}
grep [OPTIONS] -e PATTERN ... [FILE...] #grep regex source
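Python's own re.search(pattern, string) already puts the pattern first and the data last, so a grep-like helper keeps the same shape. A minimal sketch; the function name grep and its signature are hypothetical:

```python
import re
from typing import Iterable, List


def grep(pattern: str, lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the lines containing a match (like `grep PATTERN FILE`)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


data = ["trade war", "trade negotiation", "price war"]
print(grep(r"war$", data))  # ['trade war', 'price war']
```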

The exception is find:

find [-H] [-L] [-P] [-D debugopts] [-Olevel] [starting-point...] [expression]
find [Option] source pattern
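pathlib shows the same find-style order: the starting point is fixed first and the name pattern comes second. A sketch in a temporary directory; the file names are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "sub").mkdir()
    (root / "a.py").write_text("")
    (root / "sub" / "b.py").write_text("")
    (root / "sub" / "c.txt").write_text("")

    # find START -name '*.py'  ->  Path(START).rglob('*.py'): source first, pattern second
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

print(found)  # ['a.py', 'sub/b.py']
```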

Summary:

Data stream processing and file handling follow the "subroutine src dst" pattern. The two exceptions are tar and find.

This question is worth exploring because it touches on the underlying methodology and working habits.