How to exclude a text block in awk when the block contains specific strings

I am trying to exclude text blocks when a certain condition occurs.

The files have this layout:

- name: Sedan
  tags:
  - DIGIT
  - ABC
  - DEF
  - YES
- name: Combi
  tags:
  - DIGIT
  - ABC
  - DEF
  - NO
- nane: SUV
  tags:
  - DIGIT
  - DEF
  - YES
- nane: OTHER
  tags:
  - DIGIT
  - ABC
  - YES

The condition is: ABC && !DEF. That is, print only the text blocks that contain ABC but not DEF.

It should give me this printout:

- nane: OTHER
  tags:
  - DIGIT
  - ABC
  - YES

My first try was something like this:

awk '/^- name:/ { if (found && value) {print value} found=value="" } { value=(value?value ORS:"")$0 } /ABC/ && !/DEF/ { found=1 } END { if (found && value) { print value } }' file

But this attempt also prints the text blocks that contain both patterns!

Thanks

CodePudding user response：

Using GNU awk, you can split the file into records at the leading "- " that starts each block:

awk -v RS='(^|\n)- ' '/- ABC/ && !/- DEF/ {printf "- %s", $0}' file

- nane: OTHER
  tags:
  - DIGIT
  - ABC
  - YES

Or, to be more precise, anchor each tag at the end of its line:

awk -v RS='(^|\n)- ' '
/- ABC(\n|$)/ && !/- DEF(\n|$)/ {printf "- %s", $0}
' file
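
If GNU awk is not available, the same filter can be sketched in a single POSIX awk process. This is only an illustrative variant, not part of the answer above. The attempt in the question tests /ABC/ && !/DEF/ against one line at a time, so any line containing ABC sets the flag even when a DEF line appears elsewhere in the block. The sketch instead records two flags per block and decides when the block ends. The names filter_blocks and emit are made up for this example, and it assumes every block starts with a line beginning "- " in column 0.

```shell
#!/bin/sh
# filter_blocks is a hypothetical helper name, not from the answer above.
# Assumption: every block starts with a line beginning "- " in column 0.
filter_blocks() {
  awk '
    # Print the buffered block if it had ABC and no DEF, then reset state.
    function emit() {
      if (has_abc && !has_def) printf "%s", block
      block = ""
      has_abc = has_def = 0
    }
    /^- /                 { emit() }               # a new block starts here
                          { block = block $0 ORS } # buffer every line
    /^[ \t]+- ABC[ \t]*$/ { has_abc = 1 }          # tag is exactly ABC
    /^[ \t]+- DEF[ \t]*$/ { has_def = 1 }          # tag is exactly DEF
    END                   { emit() }
  ' "$@"
}

# Shortened sample modelled on the data in the question.
filter_blocks <<'EOF'
- name: Combi
  tags:
  - ABC
  - DEF
- name: OTHER
  tags:
  - ABC
  - YES
EOF
```

With no file argument the function reads standard input, as in the demo; otherwise it can be called as filter_blocks file.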

CodePudding user response：

I'm normally not a fan of multiple instances of awk/sed/grep in a pipeline, but this problem seems suited to it. First, insert blank lines as record separators. Then filter. Then remove the blank lines:

 awk '/^-/{print ""} 1' input | awk '/ABC/ && !/DEF/' RS= | sed '/^$/d'
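
To see why the middle stage works, it can help to look at what paragraph mode does with the separated blocks. The sketch below is only an illustration (the split_blocks name and the two-block sample are made up here): with RS set to the empty string, awk treats each blank-line-delimited block as one record, so a pattern is tested against the whole block rather than against a single line.

```shell
#!/bin/sh
# split_blocks is a hypothetical name for stage 1 of the pipeline:
# it prints a blank line before every line that starts a block.
split_blocks() {
  awk '/^-/ { print "" } 1' "$@"
}

# Stage 2 in paragraph mode: report, per record, whether the filter keeps it.
split_blocks <<'EOF' | awk -v RS= '{ verdict = (/ABC/ && !/DEF/) ? "kept" : "dropped"; printf "record %d: %s\n", NR, verdict }'
- name: Combi
  tags:
  - ABC
  - DEF
- name: OTHER
  tags:
  - ABC
  - YES
EOF
```

The first record (Combi) contains both ABC and DEF and is reported as dropped; the second (OTHER) is reported as kept.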

Some versions of awk allow multi-character RS, but this pipeline seems simple enough to use with those implementations of awk that do not support that extension.

But it seems that a better solution would be to convert the YAML to JSON, filter with jq, and then convert back to YAML.