Retrieving $_.Extension with double file extensions

Time:12-16

It seems that PowerShell's .Extension property doesn't recognize double extensions. As a result, something like this:

Get-ChildItem -Path:. | Rename-Item -NewName {
  '{0}{1}' -f ($_.BaseName -replace '\.', '_'), $_.Extension
}

ends up renaming file.1.tar.gz to file_1_tar.gz, while I want file_1.tar.gz.

Just an oversight? Or can it be mitigated?

CodePudding user response:

This is the intended behavior.

Here's a quote from the Microsoft documentation for the [System.IO.Path]::GetExtension method, which appears to follow the same logic as other similar functions and PowerShell cmdlets.

This method obtains the extension of path by searching path for a period (.), starting with the last character in path and continuing toward the first character. If a period is found before a DirectorySeparatorChar or AltDirectorySeparatorChar character, the returned string contains the period and the characters after it; otherwise, String.Empty is returned.
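To see the rule from the quote in action, here is a minimal sketch that calls the method directly on the file name from the question. The variable names are only for illustration.

```powershell
# Only the text after the LAST period counts as the extension.
$name = 'file.1.tar.gz'

$extension = [System.IO.Path]::GetExtension($name)                  # '.gz'
$baseName  = [System.IO.Path]::GetFileNameWithoutExtension($name)   # 'file.1.tar'

"$baseName | $extension"
```

This is why replacing the periods in the base name also touches the .tar part.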

You can of course mitigate this by treating .tar.gz and other double extensions as special cases, using a dictionary to define what should count as a double extension. Here is an example of such a mitigation.

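Get-ChildItem -Path 'C:\temp' | % {
    $BaseName = $_.BaseName
    $Extension = $_.Extension
    # -like compares case-insensitively, so '.TAR.GZ' is caught as well
    if ($_.Name -like '*.tar.gz') {
        # Drop the trailing '.tar' (4 characters) from the base name
        $BaseName = $_.BaseName.SubString(0, $_.BaseName.Length - 4)
        # I used the full name as reference to preserve the case on the filenames
        $Extension = $_.FullName.Substring($_.FullName.Length - 7)
    }

    # The period must be escaped: -replace takes a regex, and a bare '.' matches every character
    Rename-Item -Path $_.FullName -NewName (
        '{0}{1}' -f ($BaseName -replace '\.', '_'), $Extension
    )

}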

Note that since I only had .tar.gz to deal with, I did not actually use a dictionary. Should you have several double extension types, the best approach would be to loop over each extension and check whether it matches.

Take this, for instance, which loops through an array to check against multiple double extensions:


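# Declared outside the main loop to avoid redeclaring it for every file.
$DoubleExtensionDict = @('.tar.gz', '.abc.def')

Get-ChildItem -Path 'C:\temp' | % {
    $BaseName = $_.BaseName
    $Extension = $_.Extension

    Foreach ($ext in $DoubleExtensionDict) {
        if ($_.FullName.EndsWith($ext)) {
            # Length of the first part of the double extension, e.g. 'tar' in '.tar.gz'
            $FirstExtLength = ($ext.split('.')[1]).Length
            # Strip that part, plus its leading period, from the end of the base name
            $BaseName = $_.BaseName.SubString(0, $_.BaseName.Length - $FirstExtLength - 1)
            # Use the length of the matched extension rather than a fixed 7 characters
            $Extension = $_.FullName.Substring($_.FullName.Length - $ext.Length)
            break
        }
    }

    # The period must be escaped: -replace takes a regex, and a bare '.' matches every character
    Rename-Item -Path $_.FullName -NewName (
        '{0}{1}' -f ($BaseName -replace '\.', '_'), $Extension
    )

}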
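As a variation, the lookup can be separated from the renaming so that it can be tried on plain strings first. The sketch below is only an illustration: Split-KnownExtension is a hypothetical helper name, not a built-in cmdlet, and the case-insensitive comparison is an assumption about what is wanted.

```powershell
# Hypothetical helper (not a built-in cmdlet): splits a file name into base name
# and extension, honoring a list of known multi-part extensions.
function Split-KnownExtension {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [string[]] $KnownExtensions = @('.tar.gz', '.abc.def')
    )

    foreach ($ext in $KnownExtensions) {
        # OrdinalIgnoreCase (an assumption) also matches names such as 'ARCHIVE.TAR.GZ'.
        if ($Name.Length -gt $ext.Length -and
            $Name.EndsWith($ext, [System.StringComparison]::OrdinalIgnoreCase)) {
            $cut = $Name.Length - $ext.Length
            return [pscustomobject]@{
                BaseName  = $Name.Substring(0, $cut)
                Extension = $Name.Substring($cut)   # taken from the name to preserve its case
            }
        }
    }

    # No known multi-part extension: fall back to the usual single-extension rule.
    [pscustomobject]@{
        BaseName  = [System.IO.Path]::GetFileNameWithoutExtension($Name)
        Extension = [System.IO.Path]::GetExtension($Name)
    }
}

$parts = Split-KnownExtension -Name 'file.1.TAR.GZ'
'{0}{1}' -f ($parts.BaseName -replace '\.', '_'), $parts.Extension
```

For the sample name this builds file_1.TAR.GZ; the resulting string can then be passed to Rename-Item -NewName as in the loop above.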

References

Path.GetExtension Method

CodePudding user response:

As Sage Pourpre's helpful answer notes, by design only the last .-separated token is considered the filename extension.

Short of creating a hashtable (dictionary) of known "multi-extensions", as Sage proposes - which would have to anticipate all of them - you can try a heuristic based on the following rules (since you need to rule out considering .1 part of the multi-extension):

  • Any nonempty run of .-separated tokens that do not start with a digit, at the end of a file name, is considered the multi-extension, including the . chars:
@{ Name = 'file.1.tar.gz'},
@{ Name = 'file.1.tar.zip'},
@{ Name = 'file.1.html'},
@{ Name = 'file.1.ps1'},
@{ Name = 'other.1'} | 
  ForEach-Object {
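    # Base name (matched lazily), then a trailing run of tokens that each start with a non-digit
    if ($_.Name -match '^(.+?)((?:\.[^\d.][^.]*)+)$') {
      $basename = $Matches[1]
      $exts = $Matches[2]
    } else {
      $basename = $_.Name
      $exts = ''
    }
    ($basename -replace '\.', '_') + $exts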
  }

The above yields:

file_1.tar.gz
file_1.tar.zip
file_1.html
file_1.ps1
other_1
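To apply the same rule to real files, the logic can be wrapped in a function and fed to Rename-Item. This is a sketch under assumptions: Get-HeuristicName is a hypothetical name, the regex is one way of encoding the rule stated above, and -WhatIf is used so that nothing is renamed until the preview looks right.

```powershell
# Hypothetical helper (the name is an assumption): returns the new file name,
# treating a trailing run of tokens that do not start with a digit as the multi-extension.
function Get-HeuristicName {
    param([Parameter(Mandatory)] [string] $Name)

    if ($Name -match '^(.+?)((?:\.[^\d.][^.]*)+)$') {
        ($Matches[1] -replace '\.', '_') + $Matches[2]
    } else {
        # No extension-like suffix at all: replace every period.
        $Name -replace '\.', '_'
    }
}

# Preview the renames for files whose name would change; remove -WhatIf to rename for real.
Get-ChildItem -File -Path . |
    Where-Object { (Get-HeuristicName -Name $_.Name) -cne $_.Name } |
    Rename-Item -NewName { Get-HeuristicName -Name $_.Name } -WhatIf
```

Being a heuristic, it treats any trailing token that starts with a non-digit as part of the extension, so a name such as notes.final.txt would be left unchanged.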