Delete/Remove duplicate lines in XML but prioritise all entries with certain ID value

Time:06-04

I have two XML files which I managed to merge using a PowerShell command in a batch file. This creates a combined XML file with the following structure:

<Supplier SupplierN="617428" ID="0002" Name1="John Doe" VAT="0123456789" />
<Supplier SupplierN="953434" ID="0002" Name1="Jane Doe" VAT="9876543210" />
<Supplier SupplierN="871007" ID="0002" Name1="Anna Smith" VAT="6355928947" />
<Supplier SupplierN="1067428" ID="0003" Name1="John Doe" VAT="0123456789" />
<Supplier SupplierN="1034" ID="0003" Name1="Jane Doe" VAT="9876543210" />
<Supplier SupplierN="60379" ID="0003" Name1="Peter Meyer" VAT="7478490345" />

Now I would like to remove every line with ID="0003" whose VAT is duplicated, that is, whose VAT already appears on a line with ID="0002".

Can someone provide me with assistance on how to achieve this using a batch script that I can run in the Windows Task Scheduler?

CodePudding user response:

@ECHO Off
SETLOCAL
rem The following settings for the source directory and filenames are names
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.

SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q72496621.txt"
SET "outfile=%sourcedir%\newfile.txt"
:: remove variables starting #
FOR  /F "delims==" %%b In ('set # 2^>Nul') DO SET "%%b="

FOR /f "usebackqdelims=" %%b IN ("%filename1%") DO (
 FOR %%e IN (id two vat) DO SET "%%e="
 FOR %%e IN (%%b) DO (
  IF DEFINED vat SET "#%%~e=Y"&SET "vat="
  IF DEFINED two IF /i "%%~e"=="VAT" SET vat=Y
  IF DEFINED id SET "id="&IF /i "%%~e"=="0002" SET two=Y
  IF /i "%%~e"=="id" SET "id=Y"
 )
)
(
FOR /f "usebackqdelims=" %%b IN ("%filename1%") DO (
 FOR %%e IN (id three vat skipme) DO SET "%%e="
 FOR %%e IN (%%b) DO (
  IF DEFINED vat IF DEFINED #%%~e SET "skipme=Y"&SET "vat="
  IF DEFINED three IF /i "%%~e"=="VAT" SET vat=Y
  IF DEFINED id SET "id="&IF /i "%%~e"=="0003" SET three=Y
  IF /i "%%~e"=="id" SET "id=Y"
 )
 IF NOT DEFINED skipme ECHO %%b
)
)>"%outfile%"

TYPE "%outfile%"

GOTO :EOF

Normally, I'd not tackle any problem if no attempt was shown, but I'm BORED..

Although the data sample provided has all the 0002 items appearing before the 0003s, there is no indication that this is a normal situation, or the result of the data-generation method. Consequently, I've designed this response assuming that the data can appear in any order.

Initially, clear out all variables starting # so that variables starting # can be used as flags. Normally, no variables starting # are likely to exist so the 2^>nul suppresses the error report that no such variables were found.

Next step: read each line of the source file into %%b, clear a set of flags, and iterate through the line's tokens using %%e. Set the flags (id, two, vat) in that order as each field is found, then set the variable #vatnumber (# followed by the VAT value itself) to "Y", but only if the line contained "id", "0002" and "vat", in that order.
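For readers who find the flag juggling hard to follow, the idea behind this first pass can be sketched in Python. This is only an illustration, not part of the batch solution: it swaps the token-by-token flags for a regular expression, and unlike the batch code it is case-sensitive and does not care about the order of attributes within a line. The names ATTR and vats_for_id are made up for this sketch.

```python
import re

# Matches name="value" attribute pairs on one <Supplier ... /> line.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def vats_for_id(lines, wanted_id):
    """Return the set of VAT values found on lines whose ID equals wanted_id."""
    found = set()
    for line in lines:
        attrs = dict(ATTR.findall(line))
        if attrs.get("ID") == wanted_id and "VAT" in attrs:
            # Plays the role of SET "#<vatnumber>=Y" in the batch script.
            found.add(attrs["VAT"])
    return found

sample = [
    '<Supplier SupplierN="617428" ID="0002" Name1="John Doe" VAT="0123456789" />',
    '<Supplier SupplierN="1067428" ID="0003" Name1="John Doe" VAT="0123456789" />',
    '<Supplier SupplierN="60379" ID="0003" Name1="Peter Meyer" VAT="7478490345" />',
]
known = vats_for_id(sample, "0002")
print(known)
```

The set here does the same job as the # variables in the batch environment: membership means "this VAT was seen on a 0002 line".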

Then: repeat the recipe, this time detecting 0003 and setting the skipme flag only if "id", "0003" and "vat" appear in the line, in that order, and #vatnumber was set in the previous step. Any line that does not end up with skipme set is echoed to the output file.
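The whole two-pass approach, including the assumption that 0002 and 0003 lines may arrive in any order, can be sketched in Python as a cross-check of the expected result. Again, this is an illustration under assumptions rather than a replacement for the batch script: the function name drop_duplicate_vats and its parameters are hypothetical, and the regular expression assumes every attribute is written as name="value" with double quotes.

```python
import re

# Matches name="value" attribute pairs on one <Supplier ... /> line.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def drop_duplicate_vats(lines, keep_id="0002", drop_id="0003"):
    """Remove drop_id lines whose VAT also occurs on some keep_id line."""
    parsed = [(line, dict(ATTR.findall(line))) for line in lines]
    # Pass 1: collect every VAT seen on a keep_id line, wherever it sits.
    known = {a["VAT"] for _, a in parsed
             if a.get("ID") == keep_id and "VAT" in a}
    # Pass 2: keep everything except drop_id lines with an already-known VAT.
    return [line for line, a in parsed
            if not (a.get("ID") == drop_id and a.get("VAT") in known)]

# The sample data from the question, deliberately shuffled.
lines = [
    '<Supplier SupplierN="1067428" ID="0003" Name1="John Doe" VAT="0123456789" />',
    '<Supplier SupplierN="617428" ID="0002" Name1="John Doe" VAT="0123456789" />',
    '<Supplier SupplierN="60379" ID="0003" Name1="Peter Meyer" VAT="7478490345" />',
    '<Supplier SupplierN="953434" ID="0002" Name1="Jane Doe" VAT="9876543210" />',
    '<Supplier SupplierN="1034" ID="0003" Name1="Jane Doe" VAT="9876543210" />',
    '<Supplier SupplierN="871007" ID="0002" Name1="Anna Smith" VAT="6355928947" />',
]
result = drop_duplicate_vats(lines)
for line in result:
    print(line)
```

Because the first pass finishes before any line is filtered, a 0003 line is dropped even when its matching 0002 line comes later in the file, which is the reason for reading the file twice.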
