Trying to pipe a string into a grep/perl regex to pull out overlapping matches. Currently, the results only contain consecutive, non-overlapping matches, without any "lookback":
Attempt using egrep (both on GNU and BSD):
$ echo "bob mary mike bill kim jim john" | egrep -io "[a-z]+ [a-z]+"
bob mary
mike bill
kim jim
Attempt using Perl-style grep (-P):
$ echo "bob mary mike bill kim jim john" | grep -oP "()[a-z]+ [a-z]+"
bob mary
mike bill
kim jim
Attempt using awk, which shows only the first match:
$ echo "bob mary mike bill kim jim john" | awk 'match($0, /[a-z]+ [a-z]+/) {print substr($0, RSTART, RLENGTH)}'
bob mary
The overlapping results I'd like to see from a simple working bash pipe command are:
bob mary
mary mike
mike bill
bill kim
kim jim
jim john
Any ideas?
Answer:
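Lookahead is your friend here: (?=(\w+)) captures the next word without consuming it, so that word can start the following match.
echo "bob mary mike bill kim jim john" |
perl -wnE'say "$1 $2" while /(\w+)\s+(?=(\w+))/g'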
Answer:
You can also use awk:
awk '{for(i=1;i<NF;i++) print $i,$(i+1)}' <<< 'bob mary mike bill kim jim john'
See the online demo. This solution iterates over all whitespace-separated fields and prints the current field ($i), the output field separator OFS (a space by default), and the next field ($(i+1)). The loop stops at i < NF because the last field has no successor.
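For comparison, the same field-pairing idea can be sketched in pure bash with an array. This is a swapped-in technique rather than part of the answer above; pairs is a hypothetical helper name, and the sketch assumes bash (for arrays and read -a).

```shell
#!/usr/bin/env bash
# Pure-bash version of the field-pairing loop: split the line into words,
# then print each word followed by the next one.
pairs() {
  local -a words
  local i
  read -ra words <<< "$1"   # split on whitespace using the default IFS
  # Stop one short of the end: the last word has no successor.
  for ((i = 0; i < ${#words[@]} - 1; i++)); do
    printf '%s %s\n' "${words[i]}" "${words[i+1]}"
  done
}

pairs 'bob mary mike bill kim jim john'
```

Bash arrays start at index 0 while awk fields start at $1, which is why the loop bounds differ from the awk version.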
Or, another Perl solution:
perl -lane 'while (/(?=\b(\p{L}+\s+\p{L}+))/g) {print $1}' <<< 'bob mary mike bill kim jim john'
See the online demo. Details:
(?= - start of a positive lookahead
\b - a word boundary
(\p{L}+\s+\p{L}+) - capturing group 1: one or more letters, one or more whitespace characters, one or more letters
) - end of the lookahead.
Because the whole pattern sits inside the lookahead, each match is zero-width, so only the Group 1 values are printed ({print $1}).