Home > OS >  powershell | csv file edit each line in Colum A to set maximum characters
powershell | csv file edit each line in Colum A to set maximum characters

Time:04-06

$lower = Import-Csv "C:\\Users\\X\\Desktop\\U\\cvv.csv"  
$lower | ForEach-Object {

       src['A']=src['A'].str[:20].str.lower()
    
     } 

$lower |
Export-Csv -Path "C:\\Users\\X\\Desktop\\U\\cvv2.csv"

I tried this method, but it does not work.

I want if is over 20 charters to delete and match is to maximum 20

CodePudding user response:

It looks like you're mixing Python and PowerShell syntax.

You're probably looking for this:

$lower = Import-Csv 'C:\Users\X\Desktop\U\cvv.csv'
$lower | ForEach-Object {
  $_.A = $_.A.Substring(0, 20).ToLower() 
} 

However, if there's a chance that some property values have fewer than 20 characters, more work is needed, namely to avoid the exception that the .Substring() method would otherwise throw.

$lower = Import-Csv 'C:\Users\X\Desktop\U\cvv.csv'
$lower | ForEach-Object {
  $val = if ($_.A.Length -gt 20) { $_.A.Substring(0, 20) } else { $_.A }
  $_.A = $val.ToLower() 
}
  • The following is a shorter alternative, but will perform poorly if many of the input strings are shorter than 20 characters, because exception handling is expensive in terms of performance:

    • try { $_.A.Substring(0, 20) } catch { $_.A }
  • In PowerShell (Core) 7 , you can shorten the if statement to:

    • $_.A.Length -gt 20 ? $_.A.Substring(0, 20) : $_.A

Optional reading: comparing the performance of various substring-extraction approaches.

  • There are several approaches to extracting substrings in PowerShell, and they vary widely with respect to verbosity and performance:

    • The two aspects aren't related, however, and, in fact, the most verbose approach is fastest in this case.

    • Broadly speaking, the approaches can be classified as:

      • Use of the .NET .Substring() method
      • Use of array slicing, i.e. treating a string as a character array to extract a sub-array from and
      • Use of regex operations via the -replace operator
  • Below are the results of benchmarks, which give a rough sense of relative performance:

    • Performance measurements in PowerShell aren't an exact science, and the results depend on many factors - not least the host hardware; the benchmarks below average 50 runs to get a better sense, and it is the relative performance, reflected in the Factor column that is of interest (1.00 reflecting the fastest time, all other values being multiples of that).

    • Substring extraction of (up to) 20 chars. is performed on 1,000 strings, half of which are longer than that, half of which are shorter.

  • Important: The benchmarks juxtapose conditional solutions for .Substring() calls with unconditional -replace and array-slicing solutions, which skews the results - to compare true substring-extraction performance, the latter two approaches need to be modified to use conditionals too.

    • The reason for using conditional processing only for the .Substring() approach is that it is a necessity there - in order to avoid costly exceptions - whereas the appeal of the other approaches is concision, i.e. not having to use conditionals.

Benchmark results:

  • Results from running in Windows PowerShell v5.1 on a Windows 10 machine:
Factor Secs (50-run avg.) Command                                                                                                          TimeSpan
------ ------------------ -------                                                                                                          --------
1.00   0.001              # .Substring   if...                                                                                             00:00:00.0005147
3.26   0.002              # -replace   capture group...                                                                                    00:00:00.0016768
9.14   0.005              # -replace   lookbehind...                                                                                       00:00:00.0047054
179.29 0.092              # .Substring   try...                                                                                            00:00:00.0922835
230.68 0.119              # array slicing   [string]::new()...                                                                             00:00:00.1187340
268.38 0.138              # array slicing   -join ...                                                                                      00:00:00.1381349
  • Results from running in PowerShell (Core) 7.3.0-preview.3 on the same Windows 10 machine:
Factor Secs (50-run avg.) Command                                                                                                             TimeSpan
------ ------------------ -------                                                                                                             --------
1.00   0.000              # .Substring   if…                                                                                                  00:00:00.0004693
1.40   0.001              # .Substring   ternary conditional…                                                                                 00:00:00.0006553
3.90   0.002              # -replace   capture group…                                                                                         00:00:00.0018304
7.10   0.003              # -replace   lookbehind…                                                                                            00:00:00.0033339
184.87 0.087              # .Substring   try…                                                                                                 00:00:00.0867657
253.87 0.119              # array slicing   [string]::new()…                                                                                  00:00:00.1191510
267.48 0.126              # array slicing   -join …                                                                                           00:00:00.1255387
  • Summary:
    • The .Substring()-based approaches are by far the fastest - except if combined with try / catch (exception handling is expensive).
      • Surprisingly, the ternary conditional (? :) turned out to be a bit slower than the equivalent if statement.
    • The -replace-based ones are slower by a factor of 3-4 with the capture-group variant, with the variant that uses a look-behind assertion being about twice as slow as that.
    • By far the slowest are the array-slicing approaches, by two orders of magnitude.

Benchmark source code:

  • To run these benchmarks yourself, you must download function Time-Command from this Gist.

    • Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:

      irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
      
# Create 1000 strings, half of which longer than 20 chars., and half shorter.
$strs = , ('x' * 30) * 500   , ('y' * 10) * 500

# Construct an array of script blocks with the various
# substring-extraction methods.
$cmds = 
{ # -replace   capture group
  foreach ($s in $strs) {
    $s -replace '^(.{20}). ', '$1'
  }
}, 
{ # -replace   lookbehind
  foreach ($s in $strs) {
    $s -replace '(?<=^.{20}). '
  }
},
{ # .Substring   try
  foreach ($s in $strs) {
    try { $s.Substring(0, 20) } catch { $_}
  }
},
{ # .Substring   if
  foreach ($s in $strs) {
    if ($s.Length -gt 20) { $s.Substring(0, 20) } else { $s }
  }
},
{ # array slicing   -join 
  foreach ($s in $strs) {
    -join $s[0..19]
  }
},
{ # array slicing   [string]::new()
  foreach ($s in $strs) {
    [string]::new($s[0..19])
  }
}

# PowerShell (Core): add variant with ternary conditional.
if ($IsCoreClr) {
  # Note: The script block must be constructed *as a string*,
  #       to avoid breaking the parsing stage of the script in Windows PowerShell.
  $cmds  = [scriptblock]::Create(@'
  # .Substring   ternary conditional
  foreach ($s in $strs) {
    $s.Length -gt 20 ? $s.Substring(0, 20) : $s
  }  
'@)
}

# Compare the performance of various substring extraction methods,
# averaged over 50 runs.
Time-Command -Count 50 $cmds

CodePudding user response:

I would personally use the index operator [ ] in combination with the range operator ..:

Import-Csv "C:\\Users\\X\\Desktop\\U\\cvv.csv" | ForEach-Object {
    $_.A = [string]::new($_.A[0..19]).ToLower() # Update the the `A` value
    $_ # Output the object
} | Export-Csv -Path "C:\\Users\\X\\Desktop\\U\\cvv2.csv"

It would handle strings that are below or above the desired Length:

PS /> 'HELLO WORLD', 'ONLY 20 CHARS LENGTH ALLOWED' | ForEach-Object {
    [string]::new($_[0..19]).ToLower()
}


hello world
only 20 chars length
  • Related