$lower = Import-Csv "C:\\Users\\X\\Desktop\\U\\cvv.csv"
$lower | ForEach-Object {
src['A']=src['A'].str[:20].str.lower()
}
$lower |
Export-Csv -Path "C:\\Users\\X\\Desktop\\U\\cvv2.csv"
I tried this method, but it does not work.
I want if is over 20 charters to delete and match is to maximum 20
CodePudding user response:
It looks like you're mixing Python and PowerShell syntax.
You're probably looking for this:
$lower = Import-Csv 'C:\Users\X\Desktop\U\cvv.csv'
$lower | ForEach-Object {
$_.A = $_.A.Substring(0, 20).ToLower()
}
However, if there's a chance that some property values have fewer than 20 characters, more work is needed, namely to avoid the exception that the .Substring()
method would otherwise throw.
$lower = Import-Csv 'C:\Users\X\Desktop\U\cvv.csv'
$lower | ForEach-Object {
$val = if ($_.A.Length -gt 20) { $_.A.Substring(0, 20) } else { $_.A }
$_.A = $val.ToLower()
}
The following is a shorter alternative, but will perform poorly if many of the input strings are shorter than 20 characters, because exception handling is expensive in terms of performance:
try { $_.A.Substring(0, 20) } catch { $_.A }
In PowerShell (Core) 7 , you can shorten the
if
statement to:$_.A.Length -gt 20 ? $_.A.Substring(0, 20) : $_.A
Optional reading: comparing the performance of various substring-extraction approaches.
There are several approaches to extracting substrings in PowerShell, and they vary widely with respect to verbosity and performance:
The two aspects aren't related, however, and, in fact, the most verbose approach is fastest in this case.
Broadly speaking, the approaches can be classified as:
- Use of the .NET
.Substring()
method - Use of array slicing, i.e. treating a string as a character array to extract a sub-array from and
- Use of regex operations via the
-replace
operator
- Use of the .NET
Below are the results of benchmarks, which give a rough sense of relative performance:
Performance measurements in PowerShell aren't an exact science, and the results depend on many factors - not least the host hardware; the benchmarks below average 50 runs to get a better sense, and it is the relative performance, reflected in the
Factor
column that is of interest (1.00
reflecting the fastest time, all other values being multiples of that).Substring extraction of (up to) 20 chars. is performed on
1,000
strings, half of which are longer than that, half of which are shorter.
Important: The benchmarks juxtapose conditional solutions for
.Substring()
calls with unconditional-replace
and array-slicing solutions, which skews the results - to compare true substring-extraction performance, the latter two approaches need to be modified to use conditionals too.- The reason for using conditional processing only for the
.Substring()
approach is that it is a necessity there - in order to avoid costly exceptions - whereas the appeal of the other approaches is concision, i.e. not having to use conditionals.
- The reason for using conditional processing only for the
Benchmark results:
- Results from running in Windows PowerShell v5.1 on a Windows 10 machine:
Factor Secs (50-run avg.) Command TimeSpan
------ ------------------ ------- --------
1.00 0.001 # .Substring if... 00:00:00.0005147
3.26 0.002 # -replace capture group... 00:00:00.0016768
9.14 0.005 # -replace lookbehind... 00:00:00.0047054
179.29 0.092 # .Substring try... 00:00:00.0922835
230.68 0.119 # array slicing [string]::new()... 00:00:00.1187340
268.38 0.138 # array slicing -join ... 00:00:00.1381349
- Results from running in PowerShell (Core) 7.3.0-preview.3 on the same Windows 10 machine:
Factor Secs (50-run avg.) Command TimeSpan
------ ------------------ ------- --------
1.00 0.000 # .Substring if… 00:00:00.0004693
1.40 0.001 # .Substring ternary conditional… 00:00:00.0006553
3.90 0.002 # -replace capture group… 00:00:00.0018304
7.10 0.003 # -replace lookbehind… 00:00:00.0033339
184.87 0.087 # .Substring try… 00:00:00.0867657
253.87 0.119 # array slicing [string]::new()… 00:00:00.1191510
267.48 0.126 # array slicing -join … 00:00:00.1255387
- Summary:
- The
.Substring()
-based approaches are by far the fastest - except if combined withtry
/catch
(exception handling is expensive).- Surprisingly, the ternary conditional (
? :
) turned out to be a bit slower than the equivalentif
statement.
- Surprisingly, the ternary conditional (
- The
-replace
-based ones are slower by a factor of 3-4 with the capture-group variant, with the variant that uses a look-behind assertion being about twice as slow as that. - By far the slowest are the array-slicing approaches, by two orders of magnitude.
- The
Benchmark source code:
To run these benchmarks yourself, you must download function
Time-Command
from this Gist.Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
# Create 1000 strings, half of which longer than 20 chars., and half shorter.
$strs = , ('x' * 30) * 500 , ('y' * 10) * 500
# Construct an array of script blocks with the various
# substring-extraction methods.
$cmds =
{ # -replace capture group
foreach ($s in $strs) {
$s -replace '^(.{20}). ', '$1'
}
},
{ # -replace lookbehind
foreach ($s in $strs) {
$s -replace '(?<=^.{20}). '
}
},
{ # .Substring try
foreach ($s in $strs) {
try { $s.Substring(0, 20) } catch { $_}
}
},
{ # .Substring if
foreach ($s in $strs) {
if ($s.Length -gt 20) { $s.Substring(0, 20) } else { $s }
}
},
{ # array slicing -join
foreach ($s in $strs) {
-join $s[0..19]
}
},
{ # array slicing [string]::new()
foreach ($s in $strs) {
[string]::new($s[0..19])
}
}
# PowerShell (Core): add variant with ternary conditional.
if ($IsCoreClr) {
# Note: The script block must be constructed *as a string*,
# to avoid breaking the parsing stage of the script in Windows PowerShell.
$cmds = [scriptblock]::Create(@'
# .Substring ternary conditional
foreach ($s in $strs) {
$s.Length -gt 20 ? $s.Substring(0, 20) : $s
}
'@)
}
# Compare the performance of various substring extraction methods,
# averaged over 50 runs.
Time-Command -Count 50 $cmds
CodePudding user response:
I would personally use the index operator [ ]
in combination with the range operator ..
:
Import-Csv "C:\\Users\\X\\Desktop\\U\\cvv.csv" | ForEach-Object {
$_.A = [string]::new($_.A[0..19]).ToLower() # Update the the `A` value
$_ # Output the object
} | Export-Csv -Path "C:\\Users\\X\\Desktop\\U\\cvv2.csv"
It would handle strings that are below or above the desired Length:
PS /> 'HELLO WORLD', 'ONLY 20 CHARS LENGTH ALLOWED' | ForEach-Object {
[string]::new($_[0..19]).ToLower()
}
hello world
only 20 chars length