How to remove multi-line with multi-pattern, awk pcre2grep sed

Time:03-23

I have this text file

tittleofthis123
<Bunlde ver=5.0>
 <Packages>    
  <Package Type="app" FileName="Package_ARM64_beta.msix" Offset="79" Size="5791033">
   <Resources>
    rescode11
   </Resources>
   <b4:Dependencies>
     depcode12
   </b4:Dependencies>
  </Package>
  <Package Type="app" FileName="Package_x64_beta.msix" Offset="580113" Size="7195285">
   <Resources>
    rescode21
    rescode22
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-cy.msix" Offset="579" Size="15">
   <Resources>
    rescode31
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-af.msix" Offset="5791" Size="1578">
   <Resources>
    rescode41
   </Resources>
  </Package>
 </Packages>
</Bundle>

I need the output to be

tittleofthis123
<Bunlde ver=5.0>
 <Packages>    
  <Package Type="app" FileName="Package_x64_beta.msix" Offset="580113" Size="7195285">
   <Resources>
    rescode21
    rescode22
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-af.msix" Offset="5791" Size="1578">
   <Resources>
    rescode41
   </Resources>
  </Package>
 </Packages>
</Bundle>

I have tried this

pcre2grep -M -v 'ARM64.*(\n|.)*</Package>|lang-cy.*(\n|.)*</Package>' 123.txt

But of course the result isn't right: every package ends with the same </Package>, so instead of filtering out only the ARM64 package, the pattern matches all the way down to the last </Package>. I also have more packages to exclude, so I probably shouldn't use the -v inverse match, but I have no idea how else to retain the title, <Bundle>, and <Packages> lines.

I also tried this

awk '/ARM64/,/<\/Package>/ {next} {print}' 123.txt

It actually works well, but I don't understand how to make it filter more than one package, for example both /ARM64/,/<\/Package>/ and /lang-cy/,/<\/Package>/. And again, I need to exclude a lot of packages, so maybe {next} is not the way to go. I still have no idea how to retain the title, <Bundle>, and <Packages> lines.

I think this is pretty close to what I need

sed -n '/<Package/{:a;N;/\n*<\/Package>/!ba; /x64/p}' 123.txt

But I am stuck in the same way: I don't know how to combine more filters, such as x64 and lang-af, and again I don't know how to keep the title, <Bundle>, and <Packages> lines.

Actually, this is pretty much the same case, but I don't understand the answer at all.

CodePudding user response:

awk '/ARM64/,/<\/Package>/ {next} {print}' 123.txt

It actually works well. But I don't understand how to make it filter more than one Package like '/ARM64/,/<\/Package>/ and /lang-cy/,/<\/Package>/

As both ending conditions are identical, you can use || (logical OR) to build a starting condition that triggers for both ARM64 and lang-cy, like this:

awk '/ARM64/||/lang-cy/,/<\/Package>/ {next} {print}' 123.txt

Use || again to add another exclusion. For example, to also remove lang-af, you might do

awk '/ARM64/||/lang-cy/||/lang-af/,/<\/Package>/ {next} {print}' 123.txt

and so on.
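
When the exclusion list gets long, chaining || becomes hard to read. A minimal sketch of one alternative, assuming it is acceptable to list the names in a single regular expression passed in as an awk variable (the variable name drop and the reduced sample file are hypothetical, not from the question):

```shell
# Hypothetical sample, reduced from the question's input file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Packages>
  <Package Type="app" FileName="Package_ARM64_beta.msix">
   <Resources>
    rescode11
   </Resources>
  </Package>
  <Package Type="app" FileName="Package_x64_beta.msix">
   <Resources>
    rescode21
   </Resources>
  </Package>
  <Package Type="res" FileName="Package_lang-cy.msix">
   <Resources>
    rescode31
   </Resources>
  </Package>
</Packages>
EOF

# drop: one extended regex naming every package to exclude.
drop='ARM64|lang-cy'

# The range starts on any line matching drop and ends at the next </Package>.
# Lines inside a range are skipped; everything else is printed unchanged.
result=$(awk -v drop="$drop" '$0 ~ drop, /<\/Package>/ {next} {print}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Adding another exclusion then only means extending the drop string, for example drop='ARM64|lang-cy|lang-af'.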

Warning: what you have looks like XML. Be aware that GNU AWK is best suited to input that can be described by regular expressions. XML in general cannot be described that way, because nested elements form a context-free (Chomsky Type-2) language rather than a regular one. For such input you would need a tool that actually parses the structure, NOT regular expressions in the strictest sense.
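
To make the warning concrete, here is a small sketch of how the range pattern can go wrong. It assumes a hypothetical self-closing <Package .../> element, which does not appear in the question's file, so treat it purely as an illustration:

```shell
# Hypothetical input: the ARM64 package is self-closing, so it has
# no </Package> line of its own.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Packages>
  <Package Type="res" FileName="Package_ARM64_beta.msix"/>
  <Package Type="app" FileName="Package_x64_beta.msix">
  </Package>
</Packages>
EOF

# The range opens on the ARM64 line and only closes at the NEXT
# </Package>, which belongs to the x64 package, so x64 is dropped too.
result=$(awk '/ARM64/,/<\/Package>/ {next} {print}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this input only the <Packages> and </Packages> lines survive, even though the x64 package was meant to be kept.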

CodePudding user response:

This might work for you (GNU sed):

sed '/<Package Type/{:a;N;/<\/Package>/!ba;/_x64_\|_lang-af/!d}' file

Gather the lines from <Package Type through </Package> into the pattern space, then delete the collected block unless it contains _x64_ or _lang-af.
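
The same keep-list idea can be sketched in awk for readers more comfortable with it than with sed loops. This is only an approximation of the sed command above, not a replacement for it; the variable names keep, buf, and inside, and the reduced sample file, are hypothetical:

```shell
# Hypothetical sample, reduced from the question's input file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
tittleofthis123
<Packages>
  <Package Type="app" FileName="Package_ARM64_beta.msix">
    rescode11
  </Package>
  <Package Type="app" FileName="Package_x64_beta.msix">
    rescode21
  </Package>
  <Package Type="res" FileName="Package_lang-cy.msix">
    rescode31
  </Package>
  <Package Type="res" FileName="Package_lang-af.msix">
    rescode41
  </Package>
</Packages>
EOF

# keep: one extended regex naming every package to retain.
keep='_x64_|_lang-af'

result=$(awk -v keep="$keep" '
  /<Package Type/ { buf = ""; inside = 1 }   # a new block starts: reset the buffer
  inside {
    buf = buf $0 "\n"                        # collect the current line
    if (/<\/Package>/) {                     # block is complete
      if (buf ~ keep) printf "%s", buf       # print it only if it is on the keep list
      inside = 0
    }
    next                                     # never print block lines directly
  }
  { print }                                  # lines outside any block pass through
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because lines outside a <Package Type ... </Package> block are printed as they are, the title and the <Packages> wrapper lines are retained without any extra rules.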
