In Bash, I want to get the Nth word of a string after a matching pattern with awk or grep

Time:09-22

In Bash, I want to get the Nth word of a string after a matching pattern with awk.

Example text:

hadf asdfi daf PATTERN asdf dsjk PRINT_THIS asdf adas
asdf sdf PATTERN asdf dasdf PRINT_THIS ads asdf PATTERN ads da PRINT_THIS
ads PATTERN ads da PRINT_THIS

Expected output:

PRINT_THIS
PRINT_THIS
PRINT_THIS
PRINT_THIS

So if the pattern is found, the third word after the match (skipping the two words in between) should be output.

How can I do this?

Answer:

With GNU grep:

grep -oP '.*?\bPATTERN(?:\h+\H+){2}\h+\K\S+' file
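Since the question asks for the Nth word in general, the count in {2} can be computed from a shell variable. This is a minimal sketch, assuming GNU grep built with PCRE support (-P); the function name nth_word_after_grep is hypothetical, and the pattern is interpolated as a regex, so any regex metacharacters in it would need escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Nth blank-separated word after each whole-word
# match of a pattern, reading standard input. Assumes GNU grep with -P (PCRE).
# Usage: nth_word_after_grep PATTERN N   (N >= 1; PATTERN is used as a regex)
nth_word_after_grep() {
  local pat=$1 n=$2
  # (?:\h+\H+){N-1} skips N-1 words, \h+ eats the blanks before the Nth word,
  # and \K drops everything matched so far, so -o prints only the \S+ part.
  grep -oP ".*?\b${pat}(?:\h+\H+){$((n - 1))}\h+\K\S+"
}

# Example: N = 3 builds the same regex as the one-liner above ({2}).
nth_word_after_grep PATTERN 3 <<'EOF'
hadf asdfi daf PATTERN asdf dsjk PRINT_THIS asdf adas
asdf sdf PATTERN asdf dasdf PRINT_THIS ads asdf PATTERN ads da PRINT_THIS
ads PATTERN ads da PRINT_THIS
EOF
```

The regex is double-quoted so that $((n - 1)) expands; backslash sequences such as \h and \K are left untouched inside double quotes.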

Perl:

perl -lnE 'while (/.*?\bPATTERN(?:\h+\H+){2}\h+(\S+)/g) { say $1; }' file

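The regex lazily matches up to a whole-word PATTERN, skips the next two blank-separated words with (?:\h+\H+){2}, and then matches the third word with \S+. In grep, \K discards everything matched before it, so -o prints only that word; in Perl the word is captured as $1. Because grep -o and the /g loop keep matching after each hit, several PATTERNs on the same line are all handled.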

Or with awk:

awk '/PATTERN[[:blank:]]/{for(i=1;i<=NF-3;i++) if ($i ~ /^PATTERN$/) print $(i+3)}' file

All three print:

PRINT_THIS
PRINT_THIS
PRINT_THIS
PRINT_THIS
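The awk command hard-codes both the pattern and the offset 3. A minimal sketch of a parameterized version passes them in with -v; the function name nth_word_after_awk is hypothetical, and the test is an exact string comparison on a whole field (equivalent to $i ~ /^PATTERN$/ when the pattern has no regex metacharacters). Note that awk -v interprets backslash escapes in the assigned value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field i + N whenever field i equals the pattern
# exactly. Reads the given files, or standard input if none are given.
# Usage: nth_word_after_awk PATTERN N [FILE...]
nth_word_after_awk() {
  local pat=$1 n=$2
  shift 2
  awk -v pat="$pat" -v n="$n" '
    {
      # Stop at NF - n so that field i + n always exists on this line.
      for (i = 1; i <= NF - n; i++)
        if ($i == pat)
          print $(i + n)
    }
  ' "$@"
}

nth_word_after_awk PATTERN 3 <<'EOF'
hadf asdfi daf PATTERN asdf dsjk PRINT_THIS asdf adas
asdf sdf PATTERN asdf dasdf PRINT_THIS ads asdf PATTERN ads da PRINT_THIS
ads PATTERN ads da PRINT_THIS
EOF
```

Dropping the /PATTERN[[:blank:]]/ prefilter keeps the sketch short; it only saves work on lines that cannot match.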

Answer:

So, should it be in Bash, or with awk or grep? In pure Bash, reading the text from standard input, you can do the following:

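while read -ra tokens; do          # split each input line into words
  for idx in "${!tokens[@]}"; do
    # print the 3rd word after every token that is exactly PATTERN
    [[ "${tokens[idx]}" = 'PATTERN' ]] && printf '%s\n' "${tokens[idx + 3]}"
  done
done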

In case the tokens between PATTERN and PRINT_THIS cannot contain another PATTERN, you could make it a bit more wannabe-efficient (and uglier), like this:

while read -ra tokens; do
  for ((idx = 0; idx < ${#tokens[@]}; ++idx)); do
    # idx += 3 inside the subscript both selects PRINT_THIS and skips past it
    [[ "${tokens[idx]}" = 'PATTERN' ]] && printf '%s\n' "${tokens[idx += 3]}"
  done
done

Notice the += instead of +, as in “making loops hard to read 101”.

Last but not least, declaring the index as an integer (declare -i idx) would make it (even) a tiny bit more efficient.
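Both loops print an empty line when PATTERN has fewer than three words after it, because ${tokens[idx + 3]} is then unset. A minimal sketch of a parameterized pure-Bash variant that stops before running off the end of the line and uses integer attributes (local -i, the function-scoped form of declare -i); the name nth_word_after_bash is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pure-Bash version of the loops above, reading stdin.
# It never indexes past the end of a line, so a PATTERN with fewer than N
# words after it prints nothing instead of an empty line.
# Usage: nth_word_after_bash PATTERN N
nth_word_after_bash() {
  local pat=$1
  local -i n=$2 idx          # integer attributes for the counters
  local -a tokens
  while read -ra tokens; do
    for ((idx = 0; idx + n < ${#tokens[@]}; ++idx)); do
      # Quoted right-hand side: compare literally, not as a glob pattern.
      if [[ ${tokens[idx]} == "$pat" ]]; then
        printf '%s\n' "${tokens[idx + n]}"
      fi
    done
  done
}

nth_word_after_bash PATTERN 3 <<'EOF'
hadf asdfi daf PATTERN asdf dsjk PRINT_THIS asdf adas
asdf sdf PATTERN asdf dasdf PRINT_THIS ads asdf PATTERN ads da PRINT_THIS
ads PATTERN ads da PRINT_THIS
EOF
```

Using if instead of && keeps the function's exit status at 0 when the last token checked is not a match, which matters under set -e.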
