Powershell Script to replace accented letters in multiple files not working

Time:10-26

I'm trying to replace ALL accented letters and some strings in multiple files located in one folder. The string replacement works, but the accented letters are not replaced.

I have multiple files located in "C:\FilePath".

I've created a Batch file with the following code:

@echo off
Powershell.exe -executionpolicy remotesigned -File  C:\Users\User\Desktop\IFCParser.ps1
pause

And IFCParser.ps1 contains all the following lines, one after the other:

Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern 'IFCBuilding') {(Get-Content $_ | ForEach {$_ -replace 'IFCBuilding', 'IFCBuildingElementProxy'}) | Set-Content $_  }}
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern 'IFCAnotherWord') {(Get-Content $_ | ForEach {$_ -replace 'IFCAnotherWord', 'IFCBuildingElementProxy'}) | Set-Content $_  }}

The above code DOES the job when I run the bat file, but I can't get the following part to work:

Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'á' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'á', 'a'}) | Set-Content $_  }}
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'é' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'é', 'e'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'í' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'í', 'i'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'ó' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'ó', 'o'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'ú' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'ú', 'u'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Á' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Á', 'A'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'É' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'É', 'E'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Í' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Í', 'I'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Ó' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Ó', 'O'}) | Set-Content $_  }} 
Get-ChildItem  -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Ú' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Ú', 'U'}) | Set-Content $_  }}

I'm testing this on a file like this:

áéíóúÁÉÍÓÚÑñáéíóúÁ

ÉÍÓÚÑñáéíóúÁÉÍÓÚÑñá

éíóúÁÉÍÓÚÑñáéíóúÁÉÍÓÚÑñáéíó

úÁÉÍÓÚÑñáéíóúÁÉÍÓÚÑñ

The file stays the same; no accents are removed. I suspect something is wrong with the encoding. I've tried passing -Encoding only to the first Get-Content, only to the second one, and to neither.

By the way, I'm sure there are more efficient ways of doing this, but I'm just starting out and haven't found one that works.

CodePudding user response:

Running a single line of code on a single file like this works as expected:

Get-ChildItem  -Path C:\temp\testdata.txt | ForEach-Object {
    If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'á' -AllMatches) {
        (Get-Content $_ -Encoding UTF8 | ForEach-Object { $_ -creplace 'á', 'a' }) | Set-Content $_  }
}

Given this, your code must be failing in the file recursion or in the execution process.

Run the script in an editor before trying to run as a batch and try adding error trapping. You can also add some logging to track down what's happening when running as batch:

Start-Transcript -Path 'c:\temp\outputlog.txt'
Try {
    Get-ChildItem -Path C:\temp\testdata.txt -Recurse -ErrorAction Stop | ForEach-Object {
        # Use FullName so the path does not depend on the current directory
        $file = $_.FullName
        Write-Host "Processing $file"
        If (Get-Content $file -Encoding UTF8 -ErrorAction Stop | Select-String 'á' -AllMatches) {
            Write-Host "Found match for á, replacing...."
            (Get-Content $file -Encoding UTF8 -ErrorAction Stop | ForEach-Object { $_ -creplace 'á', 'a' }) |
                Set-Content $file -ErrorAction Stop
        }
    }
}
Catch {
    # Output the error record so it ends up in the transcript
    $_
}
Finally {
    # Stop the transcript exactly once, whether or not an error occurred
    Stop-Transcript
}
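
One more thing worth ruling out, offered as an assumption rather than a confirmed diagnosis of your case: Windows PowerShell 5.1 reads a .ps1 file that has no byte order mark using the system ANSI code page, so a literal 'á' in a script saved as UTF-8 without BOM may not reach the regex as the character you typed. A minimal sketch that sidesteps this by building the letter from its code point (the variable names and the sample string are made up for illustration):

```powershell
# U+00E1 is the accented letter a-acute. Building it from the code point means
# the encoding of the script file itself cannot change the character.
$aAcute = [string][char]0x00E1

# Hypothetical sample text: "c" + a-acute + "sa"
$sample = 'c' + $aAcute + 'sa'

# -creplace is the case-sensitive variant of -replace
$result = $sample -creplace $aAcute, 'a'
$result   # casa
```

If this works from the batch file while the literal version does not, saving IFCParser.ps1 as UTF-8 with BOM is the likely fix.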

CodePudding user response:

As for replacing the contents of the files in your folder, you should be able to do that using just one Get-ChildItem call.

Put this helper function at the top of your script; it replaces all the accented letters in the text it is given:

function Replace-Diacritics {
    Param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string] $Text
    )
    ($Text.Normalize( [Text.NormalizationForm]::FormD ).ToCharArray() |
     Where-Object {[Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
                   [Globalization.UnicodeCategory]::NonSpacingMark }) -join ''
}
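
To see why the function works, here is a small sketch of the two steps on a single letter. The variable names are illustrative, and the character is built from its code point so the example does not depend on how the script file is encoded:

```powershell
# U+00E9 is the precomposed letter e-acute
$precomposed = [string][char]0x00E9

# FormD splits it into a base 'e' followed by a combining acute accent (U+0301)
$decomposed = $precomposed.Normalize([Text.NormalizationForm]::FormD)

$precomposed.Length   # 1
$decomposed.Length    # 2

# Dropping every character in the NonSpacingMark category leaves the base letter
$stripped = -join ($decomposed.ToCharArray() | Where-Object {
    [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
    [Globalization.UnicodeCategory]::NonSpacingMark
})
$stripped             # e
```

Because the accent is removed by category rather than by listing letters, the same function also handles characters such as Ñ and ñ from your test file.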

With that in place, the rest of the code can be simplified to:

Get-ChildItem -Path 'C:\FilePath\*.*' -File -Recurse | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw -Encoding UTF8 | Replace-Diacritics
    $content -replace '\b(IFCBuilding|IFCAnotherWord)\b', 'IFCBuildingElementProxy' | Set-Content -Path $_.FullName -Encoding UTF8
}

Using your example file, the new content after calling `Replace-Diacritics` will be:

aeiouAEIOUNnaeiouA

EIOUNnaeiouAEIOUNna

eiouAEIOUNnaeiouAEIOUNnaeio

uAEIOUNnaeiouAEIOUNn

The -replace operator uses regex. The pattern '\b(IFCBuilding|IFCAnotherWord)\b' finds the words 'IFCBuilding' OR 'IFCAnotherWord' as whole words (\b is a word boundary) and replaces them with 'IFCBuildingElementProxy'.
If you also need this to be case-sensitive, use -creplace instead of -replace.
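
A short sketch of how the word boundary and case sensitivity change the result. The sample line and the replacement 'X' are made up for illustration:

```powershell
# Hypothetical line with a bare word, a longer identifier, and a lowercase variant
$line = 'IFCBuilding, IFCBuildingStorey, ifcbuilding'

# Without \b the pattern also matches inside the longer identifier
$plain = $line -replace 'IFCBuilding', 'X'
$plain   # X, XStorey, X

# With \b only whole words match; -replace still ignores case
$whole = $line -replace '\bIFCBuilding\b', 'X'
$whole   # X, IFCBuildingStorey, X

# -creplace is case-sensitive, so the lowercase variant is left alone
$exact = $line -creplace '\bIFCBuilding\b', 'X'
$exact   # X, IFCBuildingStorey, ifcbuilding
```

The \b anchors matter if the files may already contain 'IFCBuildingElementProxy', since a plain 'IFCBuilding' pattern would match inside it as well.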
