I have this text file
tittleofthis123
<Bundle ver=5.0>
<Packages>
<Package Type="app" FileName="Package_ARM64_beta.msix" Offset="79" Size="5791033">
<Resources>
rescode11
</Resources>
<b4:Dependencies>
depcode12
</b4:Dependencies>
</Package>
<Package Type="app" FileName="Package_x64_beta.msix" Offset="580113" Size="7195285">
<Resources>
rescode21
rescode22
</Resources>
</Package>
<Package Type="res" FileName="Package_lang-cy.msix" Offset="579" Size="15">
<Resources>
rescode31
</Resources>
</Package>
<Package Type="res" FileName="Package_lang-af.msix" Offset="5791" Size="1578">
<Resources>
rescode41
</Resources>
</Package>
</Packages>
</Bundle>
I need the output to be
tittleofthis123
<Bundle ver=5.0>
<Packages>
<Package Type="app" FileName="Package_x64_beta.msix" Offset="580113" Size="7195285">
<Resources>
rescode21
rescode22
</Resources>
</Package>
<Package Type="res" FileName="Package_lang-af.msix" Offset="5791" Size="1578">
<Resources>
rescode41
</Resources>
</Package>
</Packages>
</Bundle>
I have tried this
pcre2grep -M -v 'ARM64.*(\n|.)*</Package>|lang-cy.*(\n|.)*</Package>' 123.txt
But of course the result isn't right, because every package ends with the same </Package>, so instead of filtering out only ARM64, it filters out everything down to the last Package. And I have more Packages to exclude, so I probably shouldn't use -v (inverse), but I have no idea how to retain the title, <Bundle>, and <Packages>.
awk '/ARM64/,/<\/Package>/ {next} {print}' 123.txt
It actually works well, but I don't understand how to make it filter more than one Package, like both '/ARM64/,/<\/Package>/' and '/lang-cy/,/<\/Package>/'. And again, I need to exclude a lot of Packages, so maybe the {next} approach isn't the way to go; I still have no idea how to retain the title, <Bundle>, and <Packages>.
I think this is pretty close to what I need
sed -n '/<Package/{:a;N;/\n*<\/Package>/!ba; /x64/p}' 123.txt
But I am stuck in the same way: I don't know how to combine more filters, like x64 and lang-af. And the same goes for the title, <Bundle>, and <Packages>.
Actually, this is pretty much the same case, but I don't understand the answer at all.
CodePudding user response:
awk '/ARM64/,/<\/Package>/ {next} {print}' 123.txt
It actually works well. But I don't understand how to make it filter more than one Package like '/ARM64/,/<\/Package>/' and '/lang-cy/,/<\/Package>/'
As both ending conditions are the same, you can just use || (logical OR) to build a starting condition that triggers for both ARM64 and lang-cy, in the following way:
awk '/ARM64/||/lang-cy/,/<\/Package>/ {next} {print}' 123.txt
and use || again to add another exclusion. For example, to also remove lang-af you might do
awk '/ARM64/||/lang-cy/||/lang-af/,/<\/Package>/ {next} {print}' 123.txt
and so on.
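When the exclusion list grows, chaining || by hand gets unwieldy. As a minimal sketch, assuming a POSIX-compatible awk, the exclusions can be passed in as a single alternation regex. The function name drop_packages and the variable skip are made up for illustration and are not part of the question. The range only starts on a line that contains "<Package " and matches the regex, so a match on some other line does not trigger it.

```shell
# drop_packages REGEX FILE
# Prints FILE without the <Package ...> ... </Package> blocks whose opening
# line matches REGEX (an extended regex such as 'ARM64|lang-cy').
# The function name and the "skip" variable are hypothetical helpers.
drop_packages() {
    awk -v skip="$1" '
        # Range: from a <Package line matching skip to the next </Package>.
        /<Package / && $0 ~ skip, /<\/Package>/ { next }
        # Everything outside a skipped block is printed unchanged.
        { print }
    ' "$2"
}

# Usage: drop_packages 'ARM64|lang-cy' 123.txt
```

Lines outside any skipped block, including the title line, <Bundle>, and <Packages>, fall through to the print rule, so nothing extra is needed to retain them.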
Warning: what you have seems to be something akin to XML. Be aware that GNU AWK is best suited to working with things that can be described by regular expressions. If your data cannot be described that way, as is the case with XML, then you need a tool for working with Chomsky Type-2 (context-free) languages, not regular expressions in the strictest sense.
CodePudding user response:
This might work for you (GNU sed):
sed '/<Package Type/{:a;N;/<\/Package>/!ba;/_x64_\|_lang-af/!d}' file
Gather up the lines between <Package Type and </Package>, and do not delete the collection if it contains _x64_ or _lang-af.
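The same keep-list idea can be sketched in awk, which may be easier to extend when many packages are involved. This is only an illustration under assumptions: keep_packages and the variable keep are hypothetical names, and, like the sed command above, the regex is tested against the whole collected block, not only its opening line.

```shell
# keep_packages REGEX FILE
# Prints FILE, keeping only the <Package ...> ... </Package> blocks that
# match REGEX (an extended regex such as '_x64_|_lang-af').
# The function name and the "keep" variable are hypothetical helpers.
keep_packages() {
    awk -v keep="$1" '
        # A new block starts: reset the buffer and begin collecting.
        /<Package / { inblock = 1; buf = "" }
        inblock {
            buf = buf $0 "\n"
            if (/<\/Package>/) {
                # Block complete: print it only if it matches the keep list.
                if (buf ~ keep) printf "%s", buf
                inblock = 0
            }
            next
        }
        # Lines outside any block (title, <Bundle>, <Packages>) pass through.
        { print }
    ' "$2"
}

# Usage: keep_packages '_x64_|_lang-af' 123.txt
```

Adding another package to keep then only means extending the regex, for example '_x64_|_lang-af|_lang-de' (the last name being a made-up example).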