In Bash, I want to get the Nth word of a string after a matching pattern with awk.
Example text:
hadf asdfi daf PATTERN asdf dsjk PRINT_THIS asdf adas
asdf sdf PATTERN asdf dasdf PRINT_THIS ads asdf PATTERN ads da PRINT_THIS
ads PATTERN ads da PRINT_THIS
Expected output:
PRINT_THIS
PRINT_THIS
PRINT_THIS
PRINT_THIS
So whenever the pattern is found, the third word after the match (that is, skipping two words) should be printed.
How can I do this?
CodePudding user response:
With GNU grep:
grep -oP '.*?\bPATTERN(?:\h+\H+){2}\h+\K\S+' file
Perl:
perl -lnE 'while (/.*?\bPATTERN(?:\h+\H+){2}\h+(\S+)/g) { say $1; }' file
Or with awk:
awk '/PATTERN[[:blank:]]/{for(i=1;i<=NF-3;i++) if ($i ~ /^PATTERN$/) print $(i+3)}' file
All print:
PRINT_THIS
PRINT_THIS
PRINT_THIS
PRINT_THIS
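Since the question asks for the Nth word in general, the awk approach can be sketched with the pattern and the offset passed in as variables. This is only a minimal sketch, not part of the answer above: the function name nth_after_awk and the awk variables pat and n are made up for illustration, and it assumes whitespace-separated words where the pattern must equal a whole word.

```shell
# Print the n-th word after every word equal to pat (reads stdin).
# nth_after_awk, pat and n are illustrative names, not from the thread.
nth_after_awk() {
    awk -v pat="$1" -v n="$2" '{
        # Stop at NF - n so that $(i + n) always refers to an existing field.
        for (i = 1; i + n <= NF; i++)
            if ($i == pat) print $(i + n)
    }'
}

# Usage with two of the sample lines from the question:
printf '%s\n' \
    'hadf asdfi daf PATTERN asdf dsjk PRINT_THIS asdf adas' \
    'ads PATTERN ads da PRINT_THIS' | nth_after_awk PATTERN 3
```

Comparing with == instead of a regex match avoids surprises if the pattern ever contains regex metacharacters.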
CodePudding user response:
So, should it be in Bash, or with awk or grep? In Bash you can do the following:
while read -ra tokens; do
    for idx in "${!tokens[@]}"; do
        [[ "${tokens[idx]}" = 'PATTERN' ]] && printf '%s\n' "${tokens[idx+3]}"
    done
done
In case the tokens between PATTERN and PRINT_THIS cannot contain another PATTERN, you could make it a bit more wannabe-efficient (and uglier), like this:
while read -ra tokens; do
    for ((idx = 0; idx < ${#tokens[@]}; idx++)); do
        [[ "${tokens[idx]}" = 'PATTERN' ]] && printf '%s\n' "${tokens[idx += 3]}"
    done
done
Notice the += instead of +, as in “making loops hard to read 101”.
Last but not least, declare -i idx step would make it (even) a tiny bit more efficient.
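To see what the difference between + and += means in practice, here is a small sketch that wraps both loops in functions and feeds them a line where a second PATTERN sits inside the skipped words. The function names scan_all and scan_skip are hypothetical, the && shortcut is written out as an if, and local -i is used as the in-function counterpart of declare -i.

```shell
#!/usr/bin/env bash
# scan_all and scan_skip are illustrative names, not from the answer.

# Variant with idx+3: every PATTERN is checked, including skipped ones.
scan_all() {
    local -a tokens
    local idx
    while read -ra tokens; do
        for idx in "${!tokens[@]}"; do
            if [[ ${tokens[idx]} = 'PATTERN' ]]; then
                printf '%s\n' "${tokens[idx+3]}"
            fi
        done
    done
}

# Variant with idx += 3: the subscript moves idx forward, so the
# words between PATTERN and the printed word are never examined.
scan_skip() {
    local -a tokens
    local -i idx
    while read -ra tokens; do
        for ((idx = 0; idx < ${#tokens[@]}; idx++)); do
            if [[ ${tokens[idx]} = 'PATTERN' ]]; then
                printf '%s\n' "${tokens[idx += 3]}"
            fi
        done
    done
}

printf '%s\n' 'PATTERN a PATTERN b c d' | scan_all    # prints b, then d
printf '%s\n' 'PATTERN a PATTERN b c d' | scan_skip   # prints only b
```

The second variant is therefore only equivalent when no PATTERN can occur among the skipped words, which is exactly the condition stated above.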