I want to extract all strings matching a regex from a text. I came up with the following awk code.
I am wondering whether there is a more efficient and more concise way to capture all strings matching a regex in a text.
[0-9] is just a test; it can be an arbitrary regex. The input can also be arbitrary text. Any Linux command-line solution is welcome, including other tools such as sed, as I am looking for something better than my current code. Languages like Python or bash are also fine, as long as the code is short enough to be used inline.
awk -v regex='[0-9]' -e '{
line = $0
while(match(line, regex)) {
res[substr(line, RSTART, RLENGTH)] = ""
line = substr(line, RSTART + RLENGTH)
}
}
END {
for(k in res) print k
}' <<< 'a1b2c3'
CodePudding user response:
With GNU awk and its FPAT variable/feature:
$ awk -v regex='[0-9]' 'BEGIN {FPAT=regex} {for (i=1;i<=NF;i++) print $i}' <<< 'a1b2c3'
1
2
3
FPAT=... defines the actual field(s) (as opposed to FS, which defines the field delimiter).
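To make the FPAT/FS distinction concrete, here is a rough Python analogy (not awk, and only a sketch): re.findall treats the regex as a description of the fields themselves, while re.split treats it as the separator between fields. The variable names are made up for illustration.

```python
import re

text = "a1b2c3"

# FPAT-style: the regex describes the fields themselves.
fields_by_content = re.findall(r"[0-9]", text)

# FS-style: the regex describes what separates the fields.
fields_by_delimiter = re.split(r"[0-9]", text)

print(fields_by_content)    # ['1', '2', '3']
print(fields_by_delimiter)  # ['a', 'b', 'c', '']
```

The trailing empty string in the second result appears because the text ends with a delimiter.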
CodePudding user response:
A Python solution:
python -c "import re, sys; print('\n'.join(re.compile(sys.argv[1]).findall(sys.argv[2])))" "__regx_pattern__" "__text__"
example: python -c "import re, sys; print('\n'.join(re.compile(sys.argv[1]).findall(sys.argv[2])))" "hi_[0-9]*" "hi_1 hi_2"
output:
hi_1
hi_2
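One caveat, since the regex is meant to be arbitrary: when the pattern contains a capture group, re.findall returns the group contents rather than the whole match. A minimal sketch of a workaround using re.finditer follows; the helper name all_matches is hypothetical, not part of the re module.

```python
import re

def all_matches(pattern, text):
    """Return every full match of pattern in text, ignoring capture groups."""
    # group(0) is always the whole match, whatever groups the pattern defines
    return [m.group(0) for m in re.finditer(pattern, text)]

# With a capture group, findall yields only the captured part:
print(re.findall(r"hi_([0-9]*)", "hi_1 hi_2"))   # ['1', '2']
# finditer + group(0) yields the whole match:
print(all_matches(r"hi_([0-9]*)", "hi_1 hi_2"))  # ['hi_1', 'hi_2']
```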
CodePudding user response:
If all you need to do is print every match, then grep should be enough.
$ echo a1b2c3 | rg '[0-9]' --only-matching
1
2
3
rg is ripgrep, an "improved" version of standard grep. You can do the same task with grep as well, but I find rg's syntax more ergonomic than grep's.
CodePudding user response:
How about a Perl solution:
perl -lne 'print for /(\d)/g' <<< 'a1b2c3'
The -ne options are mostly equivalent to those of sed.
The -l option appends a newline to each output of print.
print for /(regex)/g constructs a loop to print all matched substrings.
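For readers who do not know Perl, the same control flow can be spelled out in Python. This is only a sketch of what the one-liner does, not a replacement for it; the function name print_matches and its return value are added here purely for illustration.

```python
import re

def print_matches(lines, pattern=r"\d"):
    """Print each match on its own line and return the matches as a list."""
    found = []
    for line in lines:                        # like -n: loop over input lines
        for m in re.finditer(pattern, line):  # like /.../g: every match in the line
            print(m.group(0))                 # like -l: one newline per print
            found.append(m.group(0))
    return found

print_matches(["a1b2c3"])
```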
CodePudding user response:
It sounds like grep -o regexp file is all you need.
e.g.:
$ grep -o '[0-9]' <<< 'a1b2c3'
1
2
3
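One behavioral difference to be aware of: the awk code in the question stores matches as array keys, so repeated matches are collapsed, whereas grep -o prints every occurrence. If unique matches are what you want, here is a minimal Python sketch, assuming first-seen order is acceptable; unique_matches is a hypothetical helper name.

```python
import re

def unique_matches(pattern, text):
    """Return each distinct match once, in order of first appearance."""
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(m.group(0) for m in re.finditer(pattern, text)))

print(unique_matches(r"[0-9]", "a1b2c1"))  # ['1', '2']
```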