Search by the number of times the same character is used

Time:03-10

I would like to use a search to determine whether a character occurs more or fewer times than a defined number. For example:

ABC_2019_02_01_blabla_05.pdf <- valid

ABC_DEF_192_1111_oaoaoa.pdf <- invalid

For me, the decisive factor is the number of "_" characters used. For example, the character _ may be used only 5 times.

Get-ChildItem -af -recurse | Where-Object { $_.Name -notmatch '_*_*_*_*_' } | % { $_.FullName }

This doesn't work.

CodePudding user response:

You can use the .Split() method to count the number of instances of a specific character.

$Files = Get-ChildItem -af -recurse
$Files | Where-Object {$_.Name.Split('_').Length-1 -gt 5}
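As a quick check of the Split approach, here is a minimal sketch that runs the count against the two sample names from the question instead of real files. The Get-CharCount helper and the $names, $counts and $wrong variables are hypothetical names introduced for illustration only, and the -ne 5 filter assumes that "exactly 5" is the intended rule.

```powershell
# Hypothetical helper (not part of the original answer): counts how often
# a single character occurs in a string by splitting on it.
function Get-CharCount {
    param(
        [string] $Text,
        [char]   $Char
    )
    # Splitting on the character yields one more piece than there are occurrences.
    $Text.Split($Char).Length - 1
}

# Sample names taken from the question.
$names = 'ABC_2019_02_01_blabla_05.pdf', 'ABC_DEF_192_1111_oaoaoa.pdf'

# Underscore count for each name.
$counts = $names | ForEach-Object { Get-CharCount -Text $_ -Char '_' }

# Assumption: a name is acceptable only with exactly 5 underscores.
$wrong = $names | Where-Object { (Get-CharCount -Text $_ -Char '_') -ne 5 }
```

Switching -ne to -gt or -lt covers the "more often" and "less often" cases from the question.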

Alternatively...

# Using regex... careful with special characters
$Files | Where-Object { [regex]::matches($_.Name, "_").count -gt 5}

# Grouping the char array representation of the string
$Files | Where-Object { ($_.Name.ToCharArray() | Group-Object | Where-Object Name -eq '_').Count -gt 5 }
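
On the "careful with special characters" note for the regex variant: if the character to count has a special meaning in a regular expression (for example .), it can be escaped with [regex]::Escape() first. A minimal sketch, using a made-up sample name and hypothetical variable names:

```powershell
# Made-up sample name, for illustration only.
$name = 'report.v1.final.pdf'

# Unescaped, '.' is a regex wildcard that matches every character here,
# so the result is the length of the string rather than the number of dots.
$unescaped = [regex]::Matches($name, '.').Count

# Escaped, '.' is matched literally, so only the dots are counted.
$escaped = [regex]::Matches($name, [regex]::Escape('.')).Count
```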


And just for fun...

If you want the valid and the invalid items in separate arrays, you can use the .Where() method, which accepts an additional parameter that further defines the search.

In the sample below, invalid items (those with more than 5 _ characters) end up in the first array ($Invalid), while valid items (those not picked up by the condition) end up in the second array ($Valid).

$Invalid,$Valid = $Files.Where({$_.Name.Split('_').Length-1 -gt 5},'split')
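
To try the split mode without touching the file system, the following sketch uses in-memory stand-ins for the file objects. The [pscustomobject] items and the third, made-up name are assumptions for illustration; only the Name property is used.

```powershell
# Stand-ins for the file objects returned by Get-ChildItem; only the Name
# property matters here. The third name is made up to exceed the limit.
$Files = @(
    [pscustomobject]@{ Name = 'ABC_2019_02_01_blabla_05.pdf' }   # 5 underscores
    [pscustomobject]@{ Name = 'ABC_DEF_192_1111_oaoaoa.pdf' }    # 4 underscores
    [pscustomobject]@{ Name = 'a_b_c_d_e_f_g.pdf' }              # 6 underscores
)

# 'Split' mode returns two collections: the items that satisfy the condition,
# followed by the items that do not.
$Invalid, $Valid = $Files.Where({ $_.Name.Split('_').Length - 1 -gt 5 }, 'Split')

$Invalid.Name   # names with more than 5 underscores
$Valid.Name     # all remaining names
```

Note that with -gt 5 a name with fewer than 5 underscores also lands in $Valid; if exactly 5 are required, -ne 5 is the condition to use.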

CodePudding user response:

To complement Sage Pourpre's helpful solutions:

Two more ways to count the number of occurrences of a given character in a string:

# -> 5, 4
@{ Name = 'ABC_2019_02_01_blabla_05.pdf' },
@{ Name = 'ABC_DEF_192_1111_oaoaoa.pdf' } | 
  ForEach-Object {
    ($_.Name.ToCharArray() -eq '_').Count
  }

This relies on the -eq operator acting as a filter when the LHS is an array: in effect, it returns the subarray of characters that are _, so its element count is the number of _ characters in the input.
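
The filtering behavior can be seen in isolation with a short string. This is only a sketch; the $matching variable and the 'a_b_c' input are made up for illustration.

```powershell
# With an array on the left-hand side, -eq returns the matching elements
# instead of a single Boolean.
$matching = 'a_b_c'.ToCharArray() -eq '_'

# The number of returned elements is the number of '_' characters in 'a_b_c'.
$matching.Count
```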

# -> 5, 4
@{ Name = 'ABC_2019_02_01_blabla_05.pdf' }, # Sample input
@{ Name = 'ABC_DEF_192_1111_oaoaoa.pdf' } | 
  ForEach-Object {
    ($_.Name -replace '[^_]').Length
  }

This relies on the regex-based -replace operator to remove all characters other than _ ([^_]) from the input string, which leaves a string composed only of _ characters; its length is therefore the number of _ characters in the input.


As for what you tried:

Your attempt, if corrected, has the potential to perform additional validation, such as requiring that _ characters be surrounded by at least one other character, such that, say, 'a_b_c_d_e_f.pdf' is valid, but 'abcdef_____.pdf' is not.

# -> $true, $false, $false
@{ Name = 'ABC_2019_02_01_blabla_05.pdf' }, # OK
@{ Name = 'abcdef_____.pdf' }, # Correct number of "_", but in the wrong places
@{ Name = 'ABC_DEF_192_1111_oaoaoa.pdf' } | # Not enough "_"
  ForEach-Object {
    $_.Name -match '^([^_]+_){5}[^_]+$'
  }

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
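
As a rough sketch of both points, the snippet below first shows why the pattern from the question cannot distinguish the sample names, then builds an anchored pattern with + quantifiers for a configurable count. The $required, $pattern and other variable names are hypothetical and introduced only for this example.

```powershell
# '_*' means "zero or more underscores", so the pattern from the question is
# satisfied by any name that contains at least one '_'.
$original = '_*_*_*_*_'
$matchesFirst  = 'ABC_2019_02_01_blabla_05.pdf' -match $original
$matchesSecond = 'ABC_DEF_192_1111_oaoaoa.pdf'  -match $original

# Hypothetical variable holding the required number of underscores.
$required = 5

# Build the anchored pattern: $required groups of "one or more non-underscores
# followed by '_'", then a final run of one or more non-underscores.
$pattern = '^([^_]+_){' + $required + '}[^_]+$'

# One Boolean per sample name.
$results = 'ABC_2019_02_01_blabla_05.pdf',
           'abcdef_____.pdf',
           'ABC_DEF_192_1111_oaoaoa.pdf' |
    ForEach-Object { $_ -match $pattern }
```

Because both sample names match the original pattern, -notmatch filters out neither of them; the anchored pattern accepts only the first name.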
