Why is the XML created using powershell scripting not in the right format?

Time:12-28

I'm executing a PowerShell script that reads the contents of an XML file, updates a few tag values and stores the contents in multiple XML files. I'm able to achieve all this, but the XML files it creates are not read properly by the messaging queue they are passed to. BUT the same XML file works in the queue when I open it and click Save without making any changes to the data. I compared the two files (1: as created by the script, 2: after I open it and click Save) and they are identical! I cannot for the life of me figure out what is going wrong and how to fix it.

How can I create output XML files in a format the queue can read? I'm not sure what changes when I click 'Save' on the XML files. Please help.

input CASH.XML:

<?xml version="1.0" encoding="UTF-8"?>
<ns:POSTransaction xmlns:ns="http://schema.xyz.com/Commerce/Customer/Transaction/v1">
<ns:tranHeader>
<ns:transactionId>96846836238236142669</ns:transactionId>
<ns:businessDateTime>2021-12-25T01:10:00</ns:businessDateTime>
<ns:emailId>[email protected]</ns:emailId>
</ns:tranHeader>
</ns:POSTransaction>

PS:

$log="H:\logs.txt"
[xml]$loadXML = Get-Content "H:\Q_This\CASH.XML"

try
{
   $tranID = $loadXML.POSTransaction.tranHeader.transactionId.substring(17,3)
   $tranIntID = [int]$tranID   
   $tranc = $loadXML.POSTransaction.tranHeader.transactionId.substring(0,17)    
   $uname = $loadXML.POSTransaction.tranHeader.emailId.substring(0,11)
   $mailcnt = [int]$loadXML.POSTransaction.tranHeader.emailId.substring(11,3)
   $mailend = $loadXML.POSTransaction.tranHeader.emailId.Split("@")[1]

   for ($mailcnt; $mailcnt -lt 10; $mailcnt++)
   {
        for ([int]$i = 1; $i -le 5; $i++)
        {
            $mailupd = ([string]($mailcnt + 1)).PadLeft(3,'0')
            $tranIntID = $tranIntID + 1
            $loadXML.POSTransaction.tranHeader.transactionId = $tranc + [string]$tranIntID
            $loadXML.POSTransaction.tranHeader.emailId = $uname + $mailupd + '@' + $mailend
            $fileName = "CASH_" + $tranIntID + "_" + $mailupd + ".XML"
            $loadXML.Save("H:\Q_This\" + $fileName)
        }
   }
}
catch
{
    Write-Host $_.Exception.Message
    Add-content $log -value ([string](Get-Date) + ' ' + $_.Exception.Message)
}

The above code created 40 output XML files: 5 transaction files for each email ID from Performancetest003-010@ymail.com. However, none of them were recognised by the messaging queue until I opened each one and clicked Save (with no data change).
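A text comparison (in an editor or a diff tool that decodes both files first) can report two files as identical even when their bytes differ, for instance when only one of them starts with a UTF-8 BOM. The following is a minimal PowerShell sketch of a byte-level comparison. `Compare-FileBytes` is a hypothetical helper, not a built-in cmdlet, and the two sample files it creates in the temp directory merely stand in for the generated file and the re-saved copy.

```powershell
# Hypothetical helper (not a built-in cmdlet): finds the first byte offset at
# which two files differ and shows the first 4 bytes of each file in hex,
# which is where a BOM (EF BB BF for UTF-8) would appear.
function Compare-FileBytes {
    param([string] $PathA, [string] $PathB)
    $a = [System.IO.File]::ReadAllBytes($PathA)
    $b = [System.IO.File]::ReadAllBytes($PathB)
    $len = [Math]::Min($a.Length, $b.Length)
    $offset = -1  # -1 means the files are byte-identical
    for ($i = 0; $i -lt $len; $i++) {
        if ($a[$i] -ne $b[$i]) { $offset = $i; break }
    }
    # Same prefix but different lengths: they differ where the shorter one ends.
    if ($offset -eq -1 -and $a.Length -ne $b.Length) { $offset = $len }
    [pscustomobject]@{
        FirstDifference = $offset
        HeadA = ($a | Select-Object -First 4 | ForEach-Object { $_.ToString('X2') }) -join ' '
        HeadB = ($b | Select-Object -First 4 | ForEach-Object { $_.ToString('X2') }) -join ' '
    }
}

# Two sample files with the same text: one UTF-8 with a BOM, one without.
$text       = '<?xml version="1.0" encoding="UTF-8"?><root/>'
$withBom    = Join-Path ([System.IO.Path]::GetTempPath()) 'with-bom.xml'
$withoutBom = Join-Path ([System.IO.Path]::GetTempPath()) 'without-bom.xml'
[System.IO.File]::WriteAllText($withBom, $text, [System.Text.UTF8Encoding]::new($true))
[System.IO.File]::WriteAllText($withoutBom, $text, [System.Text.UTF8Encoding]::new($false))

$result = Compare-FileBytes $withBom $withoutBom
$result | Format-List
```

Pointed at a generated file and its re-saved copy, a `FirstDifference` of 0 together with `EF BB BF` in only one of the two heads would indicate that the editor changed nothing but the BOM.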

Answer:

XML APIs have built-in support for character encodings: if an XML document specifies an encoding explicitly in its XML declaration (e.g. <?xml version="1.0" encoding="utf-8"?>), that encoding is respected both when reading from and when writing to files.

Therefore, the robust way to read and write XML files is to use a dedicated XML API - the [xml] (System.Xml.XmlDocument) type's .Load() and .Save() methods in this case - rather than plain-text processing cmdlets such as Get-Content and Set-Content / Out-File.
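As a minimal sketch of .Save() and .Load() respecting the declared encoding, the following declares encoding="utf-16" in a document, saves it, inspects the file's first two bytes, and reads it back. The temp-directory file name is an assumption, and 'ä' is written as [char]0xE4 so that the script itself stays ASCII-only.

```powershell
# Assumption: a scratch file in the temp directory; any full path works.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf16-demo.xml'

# Build a document whose XML declaration asks for UTF-16.
# [char]0xE4 is the non-ASCII letter a-umlaut; using its code keeps this file ASCII-only.
$xml = [xml]::new()
$xml.LoadXml('<?xml version="1.0" encoding="utf-16"?><foo>b' + [char]0xE4 + 'r</foo>')

# .Save(<path>) chooses the file encoding from the declaration.
$xml.Save($path)
$bytes = [System.IO.File]::ReadAllBytes($path)
'First two bytes: {0:X2} {1:X2}' -f $bytes[0], $bytes[1]   # a UTF-16 LE BOM is FF FE

# .Load() detects the encoding on reading, so the non-ASCII text round-trips.
$roundTrip = [xml]::new()
$roundTrip.Load($path)
$roundTrip.DocumentElement.InnerText
```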

Caveat:

  • As of .NET 6.0 / PowerShell 7.2, the .Save() method unexpectedly saves an XML document with an explicit encoding attribute of "utf-8" to a UTF-8 file with a BOM (byte-order mark), which causes problems for some XML consumers (even though it shouldn't). The workaround is to remove the explicit encoding attribute (set it to $null); see this answer for details.
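The caveat and its workaround can be checked with a minimal sketch like the following, which uses a trimmed-down version of the question's CASH.XML. It saves the document once as-is and once after clearing the encoding attribute, then compares the first three bytes; the output file names in the temp directory are assumptions. Clearing the attribute also drops encoding="UTF-8" from the saved declaration, which is harmless because UTF-8 is what XML assumes in the absence of an encoding declaration and a BOM.

```powershell
$dir = [System.IO.Path]::GetTempPath()

# Trimmed-down version of the question's input document.
$xml = [xml]::new()
$xml.LoadXml(@'
<?xml version="1.0" encoding="UTF-8"?>
<ns:POSTransaction xmlns:ns="http://schema.xyz.com/Commerce/Customer/Transaction/v1">
  <ns:tranHeader>
    <ns:transactionId>96846836238236142669</ns:transactionId>
  </ns:tranHeader>
</ns:POSTransaction>
'@)

# Update a value the same way the question's script does.
$xml.POSTransaction.tranHeader.transactionId = '96846836238236142670'

# 1) Save with the explicit encoding="UTF-8" attribute: the file gets a BOM.
$withAttr = Join-Path $dir 'CASH_with_attr.XML'
$xml.Save($withAttr)

# 2) Workaround: clear the declaration's encoding attribute, then save: no BOM.
if ($xml.FirstChild -is [System.Xml.XmlDeclaration]) { $xml.FirstChild.Encoding = $null }
$noAttr = Join-Path $dir 'CASH_no_attr.XML'
$xml.Save($noAttr)

$headWith = [System.IO.File]::ReadAllBytes($withAttr)[0..2]
$headNo   = [System.IO.File]::ReadAllBytes($noAttr)[0..2]
'With attribute:    ' + (($headWith | ForEach-Object { $_.ToString('X2') }) -join ' ')
'Without attribute: ' + (($headNo   | ForEach-Object { $_.ToString('X2') }) -join ' ')
```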

Your later feedback indicates that you're looking for ANSI-encoded output XML files, i.e. that your goal is to transcode the input XML from UTF-8 to ANSI.

The following is a simplified, self-contained example of such transcoding. It assumes that your system's active ANSI code page is Windows-1252.

# In- and output files.
# IMPORTANT:
#   Always use *full, file-system-native paths* when calling .NET methods.
$inFile =   Join-Path $PWD.ProviderPath in.xml
$outFile =  Join-Path $PWD.ProviderPath out.xml

# Create a UTF-8-encoded sample input file,
# for simplicity with plain-text processing.
# Note the non-ASCII character in the element text ('ä')
'<?xml version="1.0" encoding="utf-8"?><foo>bär</foo>' | Set-Content -Encoding utf8 $inFile

# Read the file using the XML-processing API provided via the [xml] type.
$xml = [xml]::new()
$xml.Load($inFile)

# Now change the character-encoding attribute to the desired new encoding.
# An XML declaration - if present - is always the *first child node* 
# of the [xml] instance.
$xml.ChildNodes[0].encoding = 'windows-1252'

# Save the document.
# The .Save() method will automatically respect the specified encoding.
$xml.Save($outFile)

To verify that the output file was correctly Windows-1252-encoded, use the following command:

  • PowerShell (Core) 7
# PowerShell (Core) defaults to UTF-8 in the absence of a BOM.
Get-Content -Encoding 1252 $outFile
  • Windows PowerShell
# Windows PowerShell *defaults* to the 
# system's active ANSI code page in the absence of a BOM.
Get-Content $outFile

You should see the following output - note the correct rendering of the non-ASCII character, ä:

<?xml version="1.0" encoding="windows-1252"?>
<foo>bär</foo>
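Because both commands above depend on the encoding Get-Content assumes, a byte-level check is less ambiguous: in Windows-1252, ä is the single byte 0xE4, whereas in UTF-8 it is the two bytes 0xC3 0xA4. The following self-contained sketch repeats the transcoding in the temp directory (the file names are assumptions) and inspects the raw bytes. Like the example above, it relies on the windows-1252 encoding being available.

```powershell
$dir     = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $dir 'in-1252-demo.xml'
$outFile = Join-Path $dir 'out-1252-demo.xml'

# BOM-less UTF-8 input; [char]0xE4 keeps this script file ASCII-only.
$text = '<?xml version="1.0" encoding="utf-8"?><foo>b' + [char]0xE4 + 'r</foo>'
[System.IO.File]::WriteAllText($inFile, $text, [System.Text.UTF8Encoding]::new($false))

# Transcode via the XML API, as in the example above.
$xml = [xml]::new()
$xml.Load($inFile)
$xml.FirstChild.Encoding = 'windows-1252'
$xml.Save($outFile)

# Inspect the raw bytes: expect the single byte E4, not the UTF-8 pair C3 A4.
$inBytes  = [System.IO.File]::ReadAllBytes($inFile)
$outBytes = [System.IO.File]::ReadAllBytes($outFile)
"Input contains C3 A4:  $([System.BitConverter]::ToString($inBytes) -match 'C3-A4')"
"Output contains E4:    $($outBytes -contains 0xE4)"
"Output contains C3 A4: $([System.BitConverter]::ToString($outBytes) -match 'C3-A4')"
```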

Note:

  • Do not try to perform transcoding via plain-text processing, such as a combination of Get-Content and Set-Content: if the input XML has an explicit encoding attribute, you'll create self-contradictory XML files, i.e. files whose actual encoding doesn't match the encoding their XML declaration claims. This may not always matter (if the consumer, too, performs plain-text processing instead of proper XML parsing), but it should be avoided for conceptual clarity alone.
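To make that mismatch concrete, the sketch below simulates plain-text transcoding: it keeps a declaration that says encoding="utf-8" but writes the bytes as Windows-1252. It uses [System.IO.File]::WriteAllText rather than Set-Content so that it behaves the same in both PowerShell editions; the file name is an assumption, and, like the example above, it relies on code page 1252 being available. How a parser reacts to such a file is parser-specific, so the sketch only reports whether .Load() succeeds rather than predicting the outcome.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'mismatch-demo.xml'

# The declaration claims UTF-8 ...
$text = '<?xml version="1.0" encoding="utf-8"?><foo>b' + [char]0xE4 + 'r</foo>'
# ... but the bytes are written as Windows-1252, as plain-text transcoding would do.
[System.IO.File]::WriteAllText($path, $text, [System.Text.Encoding]::GetEncoding(1252))

# Every character is single-byte here, so the char index equals the byte index.
$bytes = [System.IO.File]::ReadAllBytes($path)
$pos   = $text.IndexOf([char]0xE4)
"Declared: utf-8; actual byte for the non-ASCII character: $('{0:X2}' -f $bytes[$pos])"

# What a parser makes of this is parser-specific; just report the outcome.
$xml = [xml]::new()
try {
    $xml.Load($path)
    "Loaded; element text: $($xml.DocumentElement.InnerText)"
} catch {
    "Load failed: $($_.Exception.Message)"
}
```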