Split and regex match with Powershell

Time:12-21

Say I have a filename string, something like:

test_ABC_19000101_010101.987.txt,

where "test" could be any combination of whitespace, letters, digits, etc. I want to extract the 19000101_010101 part (date and time) with PowerShell. Currently I assign the result of -split "_ABC_" to a variable, take the second element of the array, and then split that string several more times. Is there a way to accomplish this in one go?

PS

"_ABC_" is constant, occurring unchanged in all instances of filename(s).

CodePudding user response:

A more concise - albeit perhaps more obscure - alternative to Santiago Squarzon's helpful answer:

# Construct a regex that consumes the entire file name while
# using capture groups for the parts of interest.
$re = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})\..+'

[datetime] (
  # In the replacement string, use $1, $2, ... to refer to what the
  # first, second, ... capture group captured.
  'test_ABC_19000101_010101.987.txt' -replace $re, '$1-$2-$3T$4:$5:$6.$7'
)

Output:

Monday, January 1, 1900 1:01:01 AM

The -replace operation results in string '1900-01-01T01:01:01.987', which is a (culture-invariant) format that you can use as-is with a [datetime] cast.

Note that with a Get-ChildItem call as input you could slightly simplify the regex by providing $_.BaseName rather than $_.Name as the -replace LHS, which obviates the need to also match the extension (\..+) in the regex.
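As a minimal sketch of that BaseName variant: the regex below is the one from this answer with the trailing extension match dropped and anchors added. The names $reBase and ConvertTo-AbcTimestamp are hypothetical, introduced only for illustration, and the pipeline assumes the files of interest are in the current directory.

```powershell
# Hypothetical variant of the regex: no trailing '\..+' is needed,
# because BaseName has already dropped the extension.
$reBase = '^.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})$'

# Hypothetical helper: returns a [datetime], or nothing if the name does not match.
function ConvertTo-AbcTimestamp {
  param([string] $BaseName)
  # Guard with -match first: on a non-matching name, -replace would return
  # the input unchanged and the [datetime] cast would throw.
  if ($BaseName -match $reBase) {
    [datetime] ($BaseName -replace $reBase, '$1-$2-$3T$4:$5:$6.$7')
  }
}

# Assumption: the files of interest are in the current directory.
Get-ChildItem -File | ForEach-Object { ConvertTo-AbcTimestamp $_.BaseName }
```

The -match guard is a design choice rather than a requirement; without it, any file that does not follow the naming scheme would cause a conversion error instead of being skipped.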

CodePudding user response:

This regex may be overkill, but it should work as long as _ABC_ is constant, a _ separates the date from the time, and a . separates the time from the milliseconds:

$re = [regex]'(?<=_ABC_)(?<date>\d*)_(?<time>\d*)\.(?<millisec>\d*)(?=\.)'

@'
test_ABC_19000101_010101.987.txt
t' az@ 0est_ABC_20000101_090101.123.txt
tes8as712t_ABC_21000101_080101.456.txt
te098d $st_ABC_22000101_070101.789.txt
[test]_ABC_23000101_060101.101.txt
t?\est_ABC_24000101_050101.112.txt
'@ -split '\r?\n' | ForEach-Object {

    $groups = $re.Match($_).Groups
    $date = $groups['date']
    $time = $groups['time']
    $msec = $groups['millisec']

    [datetime]::ParseExact(
        "$date $time $msec",
        "yyyyMMdd HHmmss fff",
        [cultureinfo]::InvariantCulture
    )
}

See https://regex101.com/r/8oSpqf/1 for details.

CodePudding user response:

If the filename will never contain more than one sequence that looks like the timestamp (8 digits, _, 6 digits), then you can match on that pattern of digits.

PS C:\> 'test_ABC_19000101_010101.987.txt' -match '^.*ABC_(\d{8}_\d{6})\..*'
True
PS C:\> $Matches

Name                           Value
----                           -----
1                              19000101_010101
0                              test_ABC_19000101_010101.987.txt

PS C:\> $Matches[1]
19000101_010101

You would use the filename instead of the explicit string.

If you want to get a [System.DateTime] from it:

PS C:\> [datetime]::ParseExact($Matches[1], 'yyyyMMdd_HHmmss', $null)

Monday, January 1, 1900 01:01:01
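
If the milliseconds are needed as well, the same match-then-ParseExact approach can probably be extended by capturing the .987 part and adding .fff to the format string. The following is a sketch under that assumption, not part of the original answer; the variable names $name and $stamp are illustrative only.

```powershell
$name = 'test_ABC_19000101_010101.987.txt'

# Capture date, time, and milliseconds as one group; the trailing '\.'
# requires the dot that starts the extension.
if ($name -match '_ABC_(\d{8}_\d{6}\.\d{3})\.') {
    # '_' and '.' are copied literally by the custom format string.
    $stamp = [datetime]::ParseExact(
        $Matches[1],
        'yyyyMMdd_HHmmss.fff',
        [cultureinfo]::InvariantCulture
    )
    # Round-trip format, to make the milliseconds visible.
    $stamp.ToString('o')   # 1900-01-01T01:01:01.9870000
}
```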