On converting UTF-8 XML to Unicode in PowerShell, the encoding attribute value shows bigEndianUnicode

Time:12-23

After converting a file from UTF-8 to Unicode, I get this line in the output file:

<?xml version="1.0" encoding="bigEndianUnicode"?>

But I need the following line in the XML:

<?xml version="1.0" encoding="UTF-16"?>

CodePudding user response:

Assuming you're working with the [xml] type, you can set the encoding of an XML file as follows:

[xml] $xmlData = '<example>XML</example>'

$fileName = 'C:\test.xml'

$settings = New-Object System.Xml.XmlWriterSettings

# Set encoding to UTF-16LE (called "Unicode" in .NET)
$settings.Encoding = [System.Text.Encoding]::Unicode

$xmlWriter = [System.Xml.XmlWriter]::Create($fileName, $settings)

$xmlData.Save($xmlWriter)

$xmlWriter.Close()
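
If the XML already lives in a UTF-8 file rather than in a string, the same approach should carry over: load the file into an XmlDocument and save it through the writer. The following is only a minimal sketch. Both paths are hypothetical placeholders, and it assumes that saving through the XmlWriter lets the writer's settings, not the input file's declaration, determine the declared encoding. Full paths are used because .NET methods do not follow PowerShell's current location.

```powershell
# Hypothetical paths; adjust to your own files.
$inFile  = 'C:\input-utf8.xml'
$outFile = 'C:\output-utf16.xml'

# Load the existing document; the parser detects the input encoding itself.
$doc = New-Object System.Xml.XmlDocument
$doc.PreserveWhitespace = $true   # keep the original whitespace and line breaks
$doc.Load($inFile)

$settings = New-Object System.Xml.XmlWriterSettings
$settings.Encoding = [System.Text.Encoding]::Unicode   # UTF-16LE

$writer = [System.Xml.XmlWriter]::Create($outFile, $settings)
try {
  $doc.Save($writer)
}
finally {
  # Close the writer even if Save() throws, so the file handle is released.
  $writer.Close()
}

# Show the first line of the result to check the declaration.
Get-Content -LiteralPath $outFile -TotalCount 1
```

The declaration written this way names the encoding in lowercase (utf-16); XML encoding names are case-insensitive, so parsers treat it the same as UTF-16.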

CodePudding user response:

Giorgi Chakhidze's helpful answer shows a proper, XML API-based way to produce an XML file with a given encoding that is also reflected in the output file's XML declaration.

However, it sounds like you've used plain-text processing to transcode files from UTF-8 to "Unicode" (UTF-16LE), and must now adapt these files' XML declarations to match the new encoding.

The following shows a solution for a single file, file.xml (it assumes that file.xml has a "Unicode" (UTF-16LE) BOM, so that Get-Content interprets its encoding correctly):

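# -Encoding Unicode keeps the file UTF-16LE; without it, Set-Content would use its default encoding.
(Get-Content -Raw -LiteralPath file.xml) -replace '(?<=^.+ encoding=")[^"]+', 'utf-16' |
  Set-Content -NoNewline -Encoding Unicode -LiteralPath file.xml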
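
To apply the same replacement to several files, the single-file command can be wrapped in a loop. This is a rough sketch under stated assumptions: the folder C:\xml-files is a hypothetical placeholder, and every *.xml file in it is assumed to be UTF-16LE with a BOM already.

```powershell
# Hypothetical folder; all *.xml files in it are assumed to be UTF-16LE with a BOM.
Get-ChildItem -LiteralPath 'C:\xml-files' -Filter *.xml -File | ForEach-Object {
  $path = $_.FullName

  # Read the whole file as one string; the BOM tells Get-Content the encoding.
  $text = Get-Content -Raw -LiteralPath $path

  # Replace only the encoding value in the XML declaration on the first line.
  $text -replace '(?<=^.+ encoding=")[^"]+', 'utf-16' |
    Set-Content -NoNewline -Encoding Unicode -LiteralPath $path
}
```

Because the files are rewritten in place, it is worth trying this on copies first.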

However, it's unclear how your transcoded-from-UTF-8 files ever ended up with encoding="bigEndianUnicode" in their XML declaration.
