Read and write to same txt file in loop with StreamReader

Time:03-20

I have a working script in PowerShell:

$file = Get-Content -Path HKEY_USERS.txt -Raw

foreach($line in [System.IO.File]::ReadLines("EXCLUDE_HKEY_USERS.txt"))
{
    $escapedLine = [Regex]::Escape($line)
    $pattern = $("(?sm)^$escapedLine.*?(?=^\[HKEY)")
    
    $file -replace $pattern, ' ' | Set-Content HKEY_USERS-filtered.txt
    $file = Get-Content -Path HKEY_USERS-filtered.txt -Raw
}

For each line in EXCLUDE_HKEY_USERS.txt, it makes some changes to the content of HKEY_USERS.txt. So on every loop iteration it writes to HKEY_USERS-filtered.txt and re-reads that same file to pick up the changes. However, Get-Content is notorious for memory leaks, so I wanted to refactor it to use StreamReader and StreamWriter, but I'm having a hard time making it work.

As soon as I do:

$filePath = 'HKEY_USERS-filtered.txt';
$sr = New-Object IO.StreamReader($filePath);
$sw = New-Object IO.StreamWriter($filePath);

I get:

New-Object : Exception calling ".ctor" with "1" argument(s): "The process cannot access the file 
'HKEY_USERS-filtered.txt' because it is being used by another process."

So it looks like I cannot use StreamReader and StreamWriter on same file simultaneously. Or can I?

CodePudding user response:

tl;dr

  • Get-Content -Raw reads a file as a whole into a single string; it is fast and has little memory overhead.

  • [System.IO.File]::ReadLines() is a faster and more memory-efficient alternative to line-by-line reading with Get-Content (without -Raw), but you need to ensure that the input file is passed as a full path, because .NET's working directory usually differs from PowerShell's.

    • Convert-Path resolves a given relative path to a full, file-system-native one.

    • A PowerShell-native alternative to using [System.IO.File]::ReadLines() is the switch statement with the -File parameter, which performs similarly well while avoiding the working-directory discrepancy pitfall, and offers additional features.

  • There is no need to save the modified file content to disk after each iteration - just update the variable that holds the content ($fileContent in the code below), and, after exiting the loop, save its value to the output file.

$fileContent = Get-Content -Path HKEY_USERS.txt -Raw

# Be sure to specify a *full* path.
$excludeFile = Convert-Path -LiteralPath 'EXCLUDE_HKEY_USERS.txt'

foreach($line in [System.IO.File]::ReadLines($excludeFile)) {
    $escapedLine = [Regex]::Escape($line)
    $pattern = "(?sm)^$escapedLine.*?(?=^\[HKEY)"
    # Modify the content and save the result back to variable $fileContent
    $fileContent = $fileContent -replace $pattern, ' '
}

# After all modifications have been performed, save to the output file
$fileContent | Set-Content HKEY_USERS-filtered.txt
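
The switch -File alternative mentioned in the tl;dr can be sketched as follows. This is a minimal illustration rather than a drop-in replacement: the temporary directory and the sample registry keys are made up so that the snippet runs on its own, and the filtering logic is the same as in the code above.

```powershell
# Hypothetical sample data in a throwaway directory (not part of the original question).
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
$null = New-Item -ItemType Directory -Path $dir
Push-Location $dir
Set-Content -Path HKEY_USERS.txt -Value @(
    '[HKEY_USERS\A]', '"x"=1', '',
    '[HKEY_USERS\B]', '"y"=2', '',
    '[HKEY_USERS\C]', '"z"=3'
)
Set-Content -Path EXCLUDE_HKEY_USERS.txt -Value '[HKEY_USERS\B]'

$fileContent = Get-Content -Path HKEY_USERS.txt -Raw

# switch -File reads the file line by line and resolves the relative path
# against PowerShell's current location, so no Convert-Path call is needed.
switch -File EXCLUDE_HKEY_USERS.txt {
    default {
        # $_ holds the current line of the exclusion file.
        $escapedLine = [Regex]::Escape($_)
        $pattern = "(?sm)^$escapedLine.*?(?=^\[HKEY)"
        # The switch body runs in the caller's scope, so this updates $fileContent directly.
        $fileContent = $fileContent -replace $pattern, ' '
    }
}

# Save once, after all modifications have been performed.
$fileContent | Set-Content HKEY_USERS-filtered.txt
Pop-Location
```

With the sample data, the block for [HKEY_USERS\B] is replaced by a single space, while the A and C blocks are left in place.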

Building on Santiago Squarzon's helpful comments:

  • Get-Content does not cause memory leaks, but it can consume a lot of memory that isn't garbage-collected until an unpredictable later point in time.
    • The reason is that - unless the -Raw switch is used - it decorates each line read with PowerShell ETS (Extended Type System) properties containing metadata about the file of origin, such as its path (.PSPath) and the line number (.ReadCount).
    • This both consumes extra memory and slows the command down - GitHub issue #7537 asks for a way to opt out of this wasteful decoration, as it typically isn't needed.
    • However, reading with -Raw is efficient, because the entire file content is read into a single, multi-line string, which means that the decoration is only performed once.
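
To see the decoration described above, the following sketch compares lines read by Get-Content with lines read directly through .NET. The file name ets-demo.txt and its two-line content are made up for illustration.

```powershell
# Hypothetical two-line sample file in the temp directory.
$path = Join-Path ([IO.Path]::GetTempPath()) 'ets-demo.txt'
Set-Content -LiteralPath $path -Value 'first', 'second'

# Line-by-line reading: every line is a string decorated with ETS properties.
$lines = Get-Content -LiteralPath $path
$lines[1].ReadCount   # line number of the second line
$lines[1].PSPath      # provider-qualified path of the file of origin

# Reading via .NET returns plain strings without any such properties.
$plain = [System.IO.File]::ReadAllLines($path)
$null -eq $plain[1].PSObject.Properties['ReadCount']

Remove-Item -LiteralPath $path
```

Each decorated line carries its own copy of this metadata, which is where the extra memory goes when a large file is read line by line.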

"So it looks like I cannot use StreamReader and StreamWriter on same file simultaneously. Or can I?"

No, you cannot. You cannot simultaneously read from a file and overwrite it.

To update / replace an existing file you have two options (note that, for a fully robust solution, all attributes of the original file (except the last write time and size) should be retained, which requires extra work):

  • Read the old content into memory in full, perform the desired modification in memory, then write the modified content back to the original file. The top section shows this read-modify-write pattern, although it saves to a separate output file.

    • There is a slight risk of data loss, however, namely if the process of writing back to the file gets interrupted.
  • More safely, write the modified content to a temporary file and, upon successful completion, replace the original file with the temporary one.
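
The second, safer option could look roughly like this. It is a sketch under stated assumptions: the file name replace-demo.txt, the .tmp suffix, and the 'old' to 'new' replacement are placeholders, and it does not carry over file attributes, which, as noted above, a fully robust solution would need to do.

```powershell
# Hypothetical sample file standing in for the real target.
$target = Join-Path ([IO.Path]::GetTempPath()) 'replace-demo.txt'
Set-Content -LiteralPath $target -Value 'old content'

# Keep the temporary file next to the target so the final move stays on one volume.
$temp = "$target.tmp"

try {
    $content = Get-Content -LiteralPath $target -Raw
    $content = $content -replace 'old', 'new'   # placeholder modification

    # Write to the temporary file first; -NoNewline avoids appending an extra newline.
    Set-Content -LiteralPath $temp -Value $content -NoNewline -ErrorAction Stop

    # Reached only if writing succeeded: replace the original with the temporary file.
    Move-Item -LiteralPath $temp -Destination $target -Force -ErrorAction Stop
}
finally {
    # Clean up a leftover temporary file if something failed before the move.
    if (Test-Path -LiteralPath $temp) { Remove-Item -LiteralPath $temp }
}
```

If writing fails or is interrupted, the original file is still intact, because it is only touched by the final move. On Windows, [System.IO.File]::Replace() is another way to perform that last step and can additionally keep a backup copy of the original.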
