I'm trying to convert ANSI and UTF-8 BOM files to UTF-8 without BOM. I found code that does this, but in my files the word "président" from an ANSI file, for example, is converted to "prxE9sident" or "pr?sident" in UTF-8 (a problem with the accented é).
The PowerShell script that I run in my parent folder:
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$source = "path"
$destination = "some_folder"
foreach ($i in Get-ChildItem -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }
    $path = $i.DirectoryName -replace $source, $destination
    $name = $i.Fullname -replace $source, $destination
    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }
    $content = Get-Content $i.Fullname
    if ( $content -ne $null ) {
        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"
    }
}
Is there a way to preserve accents when converting ANSI and other files?
CodePudding user response:
There are actually two PowerShell gotchas in the condition:
if ( $content -ne $null ) { ...
- $Null should be on the left-hand side of the equality comparison operator. With an array on the left-hand side, -ne filters the array instead of returning a single Boolean.
- If your file is closed with a newline, the last item in the Get-Content results array is $Null.
This might cause the condition to unexpectedly evaluate to $False, in which case your script doesn't even update the required files.
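To make the operand-order gotcha concrete, here is a minimal sketch. The $lines array is a hypothetical sample value, not something taken from the script above; it only stands in for a Get-Content result that happens to be an array.

```powershell
# Hypothetical sample: an array whose only element is an empty string.
$lines = @('')

# Array on the left: -ne returns the elements that are not $null, here @('').
# A one-element array is truthy only if its element is, so this is $false.
$filtered = [bool]($lines -ne $null)

# $null on the left: a plain scalar comparison that yields a Boolean ($true).
$direct = [bool]($null -ne $lines)

"filtered: $filtered, direct: $direct"
```

Writing the test as `if ( $null -ne $content )` avoids the filtering behavior regardless of what Get-Content returned.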
Based on the additional comments, to save your files as ANSI, you should use the Windows-1252 encoding:
[System.IO.File]::WriteAllLines($name, $content, ([System.Text.Encoding]::GetEncoding(1252)))
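For the opposite direction asked about in the question (ANSI in, UTF-8 without BOM out), the same encoding object can be passed when reading. The following is only a sketch under assumptions: the input files are assumed to be Windows-1252, and the temporary file names are hypothetical. It uses [System.IO.File]::ReadAllLines with an explicit encoding rather than Get-Content, so the result does not depend on the default encoding of the PowerShell version in use.

```powershell
# Assumption: the ANSI input is Windows-1252; file names below are hypothetical.
$ansi      = [System.Text.Encoding]::GetEncoding(1252)
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)

$tmp = [System.IO.Path]::GetTempPath()
$in  = Join-Path $tmp 'ansi-sample.txt'
$out = Join-Path $tmp 'utf8-sample.txt'

# Build "président" from a code point so this script's own encoding is irrelevant,
# then write it as an ANSI sample file.
$word = "pr$([char]0xE9)sident"
[System.IO.File]::WriteAllLines($in, [string[]]@($word), $ansi)

# Read with the explicit ANSI encoding, then write as UTF-8 without BOM.
$lines = [System.IO.File]::ReadAllLines($in, $ansi)
[System.IO.File]::WriteAllLines($out, $lines, $utf8NoBom)

# Inspect the result: the file starts with 0x70 ('p'), not the BOM byte 0xEF,
# and é is stored as the two bytes 0xC3 0xA9.
$bytes = [System.IO.File]::ReadAllBytes($out)
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```

In the original loop this would amount to replacing the Get-Content call with a ReadAllLines call that passes the source encoding, assuming every non-BOM input really is Windows-1252.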