How to exclude awk text block that has specific condition of strings in the block

Time:09-28

I am trying to exclude text blocks when a certain condition occurs.

The files have this layout:

- name: Sedan
  tags:
  - DIGIT
  - ABC
  - DEF
  - YES
- name: Combi
  tags:
  - DIGIT
  - ABC
  - DEF
  - NO
- nane: SUV
  tags:
  - DIGIT
  - DEF
  - YES
- nane: OTHER
  tags:
  - DIGIT
  - ABC
  - YES

The condition is ABC && !DEF: print only the text blocks that contain ABC but not DEF.

It should give me this printout:

- nane: OTHER
  tags:
  - DIGIT
  - ABC
  - YES

My first attempt was something like this:

awk '/^- name:/ { if (found && value) {print value} found=value="" } { value=(value?value ORS:"")$0 } /ABC/ && !/DEF/ { found=1 } END { if (found && value) { print value } }' file

But the attempt above also prints the text blocks that contain both patterns!

Thanks

CodePudding user response:

Using GNU awk, you can split the file into records on the leading "- " of each block:

awk -v RS='(^|\n)- ' '/- ABC/ && !/- DEF/ {sub(/\n$/, ""); print "- " $0}' file

- nane: OTHER
  tags:
  - DIGIT
  - ABC
  - YES

Or, to be more precise and match only complete tag lines:

awk -v RS='(^|\n)- ' '
# Match whole tag lines only; strip the trailing newline kept in the last record.
/- ABC(\n|$)/ && !/- DEF(\n|$)/ {sub(/\n$/, ""); print "- " $0}
' file

CodePudding user response:

I'm normally not a fan of multiple instances of awk/sed/grep in a pipeline, but this problem seems suited to it. First, insert blank lines as record separators. Then filter. Then remove the blank lines:

awk '/^-/{print ""} 1' input | awk '/ABC/ && !/DEF/' RS= | sed '/^$/d'

Some versions of awk allow a multi-character RS, but this pipeline is simple enough to use with implementations of awk that do not support that extension.
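
For implementations without a multi-character RS, the same filter can also be sketched as a single awk process. The attempt in the question tests /ABC/ && !/DEF/ against one line at a time, so the "- ABC" line alone satisfies it and every block containing ABC gets flagged. The sketch below instead keeps one flag per tag and decides only when the block ends. The names filter_blocks and emit are hypothetical, and it assumes that every block starts with "- " in column one and that tags are indented list items:

```shell
# filter_blocks FILE...: print each block that has tag ABC but not tag DEF.
# Hypothetical helper name; assumes blocks start with "- " in column one.
filter_blocks() {
  awk '
    function emit() {
      # Print the buffered block only if ABC was seen and DEF was not.
      if (block != "" && has_abc && !has_def) print block
      block = ""; has_abc = has_def = 0
    }
    /^- / { emit() }                      # a new block starts: decide on the previous one
    { block = (block == "" ? $0 : block ORS $0) }
    /^ +- ABC *$/ { has_abc = 1 }         # whole-line tag match
    /^ +- DEF *$/ { has_def = 1 }
    END { emit() }                        # decide on the final block
  ' "$@"
}
```

Calling filter_blocks file on the layout from the question should print only the OTHER block. Because the flags are reset at every block start, several matching blocks are printed one after another, each on its own lines.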

But it seems that a better solution would be to convert the YAML to JSON, filter it with jq, and then convert the result back to YAML.
