An example file name is PO 2171 Cresco REVISED.pdf. Many of these file names are not standardized, and the positions of the spaces are not fixed. The "spaces" in the middle are actually characters with codes greater than 128, and I want to replace all characters with codes greater than 128 with "_" in one go.
I haven't learned PowerShell yet. Thank you very much.
Answer:
For this you are going to need a regex.
Below I'm using the code-point range 32-129 (in hex \x20-\x81), so that any control characters (codes below 32) are replaced as well:
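(Get-ChildItem -Path 'X:\TheFolderWhereTheFilesAre' -File) |    # collect all files in the folder first
Where-Object { $_.Name -match '[^\x20-\x81]' } |                  # keep only names with out-of-range characters
Rename-Item -NewName { $_.Name -replace '[^\x20-\x81]+', '_' }  # replace each run of them with a single '_'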
Regex details:
[^\x20-\x81]    Match a single character NOT in the range between character 0x20 (32 decimal) and character 0x81 (129 decimal)
   +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
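To see what the trailing + quantifier does without touching any files, the pattern can be tried on a plain string first. This is a minimal sketch: the sample name is hypothetical, and [char]0xA0 (NO-BREAK SPACE, code 160) is only an assumed stand-in for the unidentified non-ASCII "space" characters in the real file names.

```powershell
# Hypothetical sample name: [char]0xA0 (NO-BREAK SPACE, code 160) stands in for
# the unidentified non-ASCII "space" in the real file names.
$nbsp = [char]0xA0
$name = "PO${nbsp}2171${nbsp}${nbsp}Cresco${nbsp}REVISED.pdf"

# With '+', each run of consecutive out-of-range characters becomes ONE '_'.
$collapsed = $name -replace '[^\x20-\x81]+', '_'

# Without '+', every out-of-range character is replaced individually.
$perChar = $name -replace '[^\x20-\x81]', '_'

$collapsed   # PO_2171_Cresco_REVISED.pdf
$perChar     # PO_2171__Cresco_REVISED.pdf
```

So with the + in place, two adjacent odd characters collapse into a single underscore rather than producing a double one.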
Answer:
Theo's answer is effective, but there's a simpler, more direct solution using the .NET regex named Unicode block \p{IsBasicLatin}, which directly matches any ASCII-range Unicode character (all .NET strings are Unicode strings, internally composed of UTF-16 code units).
Its negation, \P{IsBasicLatin} (note the uppercase P), matches any character outside the ASCII range, so you can use the following, with the help of the regex-based -replace operator, to replace all non-ASCII-range characters with _:
(Get-ChildItem -File) | # Get all files in the current dir.
Rename-Item -NewName { $_.Name -replace '\P{IsBasicLatin}', '_' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
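The \P{IsBasicLatin} pattern can likewise be checked on a string before it is run against real files. A minimal sketch; the sample name is hypothetical and mixes a no-break space (U+00A0) with an ideographic space (U+3000). Unlike Theo's [^\x20-\x81]+ pattern, this replaces each non-ASCII character with its own _, and it leaves ASCII control characters alone, because those belong to the Basic Latin block (U+0000-U+007F).

```powershell
# Hypothetical sample: U+00A0 (no-break space) and U+3000 (ideographic space)
# stand in for the non-ASCII "spaces" in the real file names.
$name = "PO$([char]0xA0)2171$([char]0x3000)Cresco REVISED.pdf"

# \P{IsBasicLatin} matches any UTF-16 code unit outside U+0000-U+007F;
# each match is replaced individually. The regular ASCII space is kept.
$newName = $name -replace '\P{IsBasicLatin}', '_'
$newName   # PO_2171_Cresco REVISED.pdf

# Control characters such as U+0001 are inside Basic Latin, so they are NOT replaced.
$ctrl = "A$([char]0x01)B" -replace '\P{IsBasicLatin}', '_'
$ctrl -ceq "A$([char]0x01)B"   # True
```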
Note:
- Enclosing the Get-ChildItem call in (...) ensures that all matching files are collected first, before renaming is performed. This prevents problems that could arise from already-renamed files re-entering the enumeration of files.
- Since only files (-File) are to be renamed, you needn't worry about file names that do not contain non-ASCII-range characters: Rename-Item quietly ignores attempts to rename files to the name they already have.
  - Unfortunately, this is not true for directories, where such an attempt causes an error; this discrepancy, present as of PowerShell 7.2.4, is the subject of GitHub issue #14903.
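The directory caveat above can be sidestepped by passing on to Rename-Item only those items whose names actually contain non-ASCII characters. A minimal sketch, assuming you want to rename both files and directories at the top level of one folder; the scratch folder and item names are hypothetical and are created under the temp directory purely for demonstration. It deliberately avoids -Recurse, because renaming a directory changes the paths of everything beneath it.

```powershell
# Hypothetical scratch folder with one file and one directory whose names
# contain a no-break space (U+00A0), plus one pure-ASCII file.
$root = Join-Path ([IO.Path]::GetTempPath()) ('nonascii-demo-' + [guid]::NewGuid())
$nbsp = [char]0xA0
$null = New-Item -ItemType Directory -Path $root
$null = New-Item -ItemType File      -Path (Join-Path $root "PO${nbsp}2171.pdf")
$null = New-Item -ItemType Directory -Path (Join-Path $root "Old${nbsp}Orders")
$null = New-Item -ItemType File      -Path (Join-Path $root 'ascii-only.txt')

# (...) collects all items first; Where-Object passes on only names that
# actually contain non-ASCII characters, so Rename-Item never receives a
# directory whose name would stay the same.
(Get-ChildItem -LiteralPath $root) |
  Where-Object Name -match '\P{IsBasicLatin}' |
  Rename-Item -NewName { $_.Name -replace '\P{IsBasicLatin}', '_' }

# List the resulting names, then remove the scratch folder again.
$names = (Get-ChildItem -LiteralPath $root).Name | Sort-Object
Remove-Item -LiteralPath $root -Recurse -Force
$names   # ascii-only.txt, Old_Orders, PO_2171.pdf (one per line)
```

The Where-Object filter is cheap insurance for files too: the pure-ASCII file never reaches Rename-Item at all.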
Strictly speaking, .NET characters ([char] (System.Char) instances) are 16-bit Unicode code units (UTF-16), which can individually only represent a complete Unicode character in the so-called BMP (Basic Multilingual Plane), i.e. in the code-point range 0x0-0xFFFF. Unicode characters beyond that range, notably emoji such as