I have two XML files which I managed to merge using a PowerShell command in a batch file, which creates a combined XML file with the following structure:
<Supplier SupplierN="617428" ID="0002" Name1="John Doe" VAT="0123456789" />
<Supplier SupplierN="953434" ID="0002" Name1="Jane Doe" VAT="9876543210" />
<Supplier SupplierN="871007" ID="0002" Name1="Anna Smith" VAT="6355928947" />
<Supplier SupplierN="1067428" ID="0003" Name1="John Doe" VAT="0123456789" />
<Supplier SupplierN="1034" ID="0003" Name1="Jane Doe" VAT="9876543210" />
<Supplier SupplierN="60379" ID="0003" Name1="Peter Meyer" VAT="7478490345" />
Now, I would like to remove all lines with ID="0003" where the VAT is duplicated (already available as VAT in ID="0002").
Can someone provide me with assistance on how to achieve this using a batch script that I can run in the Windows Task Scheduler?
CodePudding user response:
@ECHO Off
SETLOCAL
rem The following settings for the source directory and filenames are names
rem that I use for testing and deliberately include names which include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q72496621.txt"
SET "outfile=%sourcedir%\newfile.txt"

:: remove variables starting #
FOR /F "delims==" %%b In ('set # 2^>Nul') DO SET "%%b="

FOR /f "usebackqdelims=" %%b IN ("%filename1%") DO (
 FOR %%e IN (id two vat) DO SET "%%e="
 FOR %%e IN (%%b) DO (
  IF DEFINED vat SET "#%%~e=Y"&SET "vat="
  IF DEFINED two IF /i "%%~e"=="VAT" SET vat=Y
  IF DEFINED id SET "id="&IF /i "%%~e"=="0002" SET two=Y
  IF /i "%%~e"=="id" SET "id=Y"
 )
)

(
 FOR /f "usebackqdelims=" %%b IN ("%filename1%") DO (
  FOR %%e IN (id three vat skipme) DO SET "%%e="
  FOR %%e IN (%%b) DO (
   IF DEFINED vat IF DEFINED #%%~e SET "skipme=Y"&SET "vat="
   IF DEFINED three IF /i "%%~e"=="VAT" SET vat=Y
   IF DEFINED id SET "id="&IF /i "%%~e"=="0003" SET three=Y
   IF /i "%%~e"=="id" SET "id=Y"
  )
  IF NOT DEFINED skipme ECHO %%b
 )
)>"%outfile%"

TYPE "%outfile%"

GOTO :EOF
Normally, I'd not tackle any problem if no attempt was shown, but I'm BORED..
Although the data sample provided has all the 0002 items appearing before the 0003s, there is no indication that this is a normal situation, or the result of the data-generation method. Consequently, I've designed this response assuming that the data can appear in any order.
Initially, clear out all variables starting # so that variables starting # can be used as flags. Normally, no variables starting # are likely to exist, so the 2^>nul suppresses the error report that no such variables were found.
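The clearing step can be tried in isolation. The following is a minimal sketch, assuming two made-up flag variables that exist only for this demonstration; it lists them, clears them with the same FOR /F line as above, and then checks that none remain.

```batch
@ECHO OFF
SETLOCAL
rem Hypothetical flag variables, created only for this demonstration.
SET "#0123456789=Y"
SET "#9876543210=Y"

ECHO Before clearing:
SET #

rem SET # lists NAME=VALUE pairs; "delims==" keeps only the name in %%b,
rem and SET "name=" removes that variable.
FOR /F "delims==" %%b IN ('set # 2^>Nul') DO SET "%%b="

ECHO After clearing:
rem SET # fails when nothing matches, so the message after the double pipe is shown.
SET # 2>NUL || ECHO (no variables starting with # remain)
```

Because SETLOCAL is in effect, none of this should leak into the calling environment.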
Next step: read each line of the source file to %%b, then clear a set of flags and iterate through the tokens of that line using %%e. Set the flags (id, two, vat) in that order as each field is found, and then set the variable named # followed by the VAT number (for example #0123456789) to "Y", but only if the line contained "id", "0002" and "vat", in that order.
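To see why the flags can be set token by token, it may help to look at how a plain FOR splits one line. This is a small sketch, assuming a temporary one-line file (tokendemo.txt in %TEMP%, a name chosen only for this demonstration) that holds the first sample line from the question.

```batch
@ECHO OFF
SETLOCAL
rem Hypothetical one-line sample file, created only for this demonstration.
SET "demofile=%TEMP%\tokendemo.txt"
>"%demofile%" ECHO ^<Supplier SupplierN="617428" ID="0002" Name1="John Doe" VAT="0123456789" /^>

rem Read the line to %%b, then let a plain FOR split it on spaces and = signs.
rem %%~e strips the surrounding quotes from each token.
FOR /f "usebackqdelims=" %%b IN ("%demofile%") DO (
 FOR %%e IN (%%b) DO ECHO [%%~e]
)

DEL "%demofile%"
```

Each token should appear on its own line in brackets, with ID and 0002 arriving as consecutive tokens and the quoted "John Doe" staying together as one token. That ordering is what lets the id, two and vat flags be set one after another.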
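Then: repeat the recipe, this time detecting 0003 and setting the skipme flag only if "id", "0003" and "vat" appear in the line, in that order, and the # variable for that VAT number was set in the previous step. Any line that does not end up with skipme set is echoed to the output file.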
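The duplicate test itself comes down to an IF DEFINED lookup on a variable whose name is # plus the VAT number. Here is a minimal sketch, assuming the first pass has already recorded the two VAT numbers shared by the sample data; the flags are set by hand here purely for illustration.

```batch
@ECHO OFF
SETLOCAL
rem Pretend the first pass already recorded these VAT numbers from ID="0002" lines.
SET "#0123456789=Y"
SET "#9876543210=Y"

rem Check one VAT number that was recorded and one that was not.
FOR %%v IN (0123456789 7478490345) DO (
 IF DEFINED #%%v (ECHO %%v: duplicate, line would be skipped) ELSE (ECHO %%v: not seen, line would be kept)
)
```

With the sample data from the question, this is why the ID="0003" lines for John Doe and Jane Doe are dropped while the Peter Meyer line is kept.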