I have four text files in the following directory that have varying EOL characters:
C:\Sandbox: 1.txt, 2.txt, 3.txt, 4.txt
I would like to write a PowerShell script that loops through all files in the directory, finds the EOL characters used in each file, and prints them to a new file named EOL.txt.
Sample contents of EOL.txt:
1.txt UNIX(LF)
2.txt WINDOWS(CRLF)
3.txt WINDOWS(CRLF)
4.txt UNIX(LF)
I know to loop through files I will need something like the following, but I'm not sure how to read the file EOL:
Get-ChildItem "C:\Sandbox" -Filter *.txt |
Foreach-Object {
}
OR
Get-Content "C:\Sandbox\*" -EOL | Out-File -FilePath "C:\Sandbox\EOL.txt"
##note that EOL is not a valid Get-Content command
Answer:
Try the following:
Get-ChildItem C:\Sandbox\*.txt -Exclude EOL.txt |
  Get-Content -Raw |
  ForEach-Object {
    # Determine the distinct newline sequences (CRLF and/or LF) in the file.
    $newlines = [regex]::Matches($_, '\r?\n').Value | Select-Object -Unique
    $newLineDescr =
      switch ($newlines.Count) {
        0 { 'N/A' }
        2 { 'MIXED' }
        default { ('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] }
      }
    # Construct and output a custom object for the file at hand.
    [pscustomobject] @{
      FileName = $_.PSChildName
      NewlineFormat = $newLineDescr
    }
  } # | Out-File ... to save to a file - see comments below.
The above outputs something like:
FileName NewlineFormat
-------- -------------
1.txt    UNIX(LF)
2.txt    WINDOWS(CRLF)
3.txt    N/A
4.txt    MIXED
N/A means that no newlines are present; MIXED means that both CRLF and LF newlines are present.
You can save the output:
- directly in the for-display format shown above, by appending a > redirection or piping (|) to Out-File, as in your question.
- alternatively, using a structured text format better suited to programmatic processing, such as CSV; e.g., by appending:
  ... | Export-Csv -NoTypeInformation -Encoding utf8 C:\Sandbox\EOL.txt
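If you want EOL.txt to contain exactly the plain "name format" lines shown in the question rather than a table or CSV, the following is a minimal sketch of one way to do it. The function name Get-NewlineFormat is a hypothetical helper introduced here for illustration; it applies the same regex-based detection as the pipeline above, and the directory path is the one from the question.

```powershell
# Hypothetical helper (not part of the original answer): classify the
# newline format of a string using the same '\r?\n' regex as above.
function Get-NewlineFormat {
    param([string] $Text)
    # Collect the distinct newline sequences; @(...) guarantees an array.
    $newlines = @(
        [regex]::Matches($Text, '\r?\n') |
            ForEach-Object { $_.Value } |
            Select-Object -Unique
    )
    switch ($newlines.Count) {
        0 { 'N/A' }
        2 { 'MIXED' }
        default { if ($newlines[0] -eq "`r`n") { 'WINDOWS(CRLF)' } else { 'UNIX(LF)' } }
    }
}

$dir = 'C:\Sandbox'   # directory from the question
if (Test-Path -LiteralPath $dir) {
    # Emit one "<file name> <format>" line per file and save them to EOL.txt.
    Get-ChildItem -Path (Join-Path $dir '*.txt') -Exclude EOL.txt |
        ForEach-Object {
            '{0} {1}' -f $_.Name, (Get-NewlineFormat (Get-Content -Raw -LiteralPath $_.FullName))
        } |
        Set-Content -LiteralPath (Join-Path $dir 'EOL.txt')
}
```

Because each output line is a plain string, Set-Content writes it as-is, without the header and column padding that Out-File would add for objects.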
Note:
- Short of reading the raw bytes of a text file one by one or in batches, the only way to analyze the newline format is to read the file in full and search for newline sequences.
- Get-Content -Raw reads a given file in full.
- [regex]::Matches($_, '\r?\n').Value extracts all newline sequences - whether CRLF or LF - from the file's content, and Select-Object -Unique reduces them to the set of distinct sequences.
- ('UNIX(LF)', 'WINDOWS(CRLF)')[$newlines -eq "`r`n"] is a convenient, but somewhat obscure emulation of the following ternary conditional:
  $newlines -eq "`r`n" ? 'WINDOWS(CRLF)' : 'UNIX(LF)'
  which could be used in PowerShell (Core) 7 as-is, but unfortunately isn't supported in Windows PowerShell.
- The technique relies on a [bool] value getting coerced to an [int] value when used as an array index ($true -> 1, $false -> 0), thereby selecting the appropriate element from the input array.
- If you don't mind the verbosity, you can use a regular if statement as an expression (i.e., you can assign its output directly to a variable: $foo = if ...), which works in both PowerShell editions:
  if ($newlines -eq "`r`n") { 'WINDOWS(CRLF)' } else { 'UNIX(LF)' }
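For large files, the byte-level alternative mentioned at the start of the note may be preferable, because it avoids loading the whole file into memory and can stop as soon as both newline styles have been seen. The following is a rough sketch only: the function name Get-NewlineFormatFromBytes and its -BufferSize parameter are hypothetical, and it assumes an encoding in which LF and CR are the single bytes 0x0A and 0x0D (ASCII, UTF-8, ANSI code pages); it would misreport UTF-16 or UTF-32 files.

```powershell
# Hypothetical helper: detect the newline format by scanning raw bytes in batches.
# Assumes a single-byte-compatible encoding (ASCII, UTF-8, ANSI), not UTF-16/32.
function Get-NewlineFormatFromBytes {
    param(
        [Parameter(Mandatory)] [string] $LiteralPath,
        [int] $BufferSize = 64KB
    )
    $sawLf = $false
    $sawCrLf = $false
    $prev = 0   # previous byte; carried across batches so a split CR+LF is still detected
    $buffer = New-Object byte[] $BufferSize
    # Convert-Path resolves the path relative to PowerShell's current location.
    $stream = [System.IO.File]::OpenRead((Convert-Path -LiteralPath $LiteralPath))
    try {
        while (($count = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
            for ($i = 0; $i -lt $count; $i++) {
                $b = $buffer[$i]
                if ($b -eq 10) {   # LF: CRLF if preceded by CR (13), bare LF otherwise
                    if ($prev -eq 13) { $sawCrLf = $true } else { $sawLf = $true }
                }
                $prev = $b
            }
            if ($sawLf -and $sawCrLf) { break }   # already MIXED; no need to read further
        }
    }
    finally {
        $stream.Dispose()
    }
    if ($sawLf -and $sawCrLf) { 'MIXED' }
    elseif ($sawCrLf) { 'WINDOWS(CRLF)' }
    elseif ($sawLf) { 'UNIX(LF)' }
    else { 'N/A' }
}
```

Note that a byte-by-byte loop in PowerShell is slow, so for small files the Get-Content -Raw approach is likely the faster and simpler choice.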
Simpler alternative via WSL, if installed:
WSL comes with the file utility, which analyzes the content of files and reports summary information, including newline formats.
While you get no control over the output format, which invariably includes additional information, such as the file's character encoding, the command is much simpler:
Set-Location C:\Sandbox
wsl file *.txt
Caveats:
- This approach is fundamentally limited to files on local drives.
- If changing to the target dir. is not an option, relative paths would need their \ instances translated to /, and full paths would need drive specs. such as C: translated to /mnt/c (lowercase!).
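The path translation described in the second caveat can be sketched as a small helper. ConvertTo-WslPath is a hypothetical name introduced here for illustration; it performs pure string translation and assumes that WSL mounts Windows drives under /mnt, as stated above.

```powershell
# Hypothetical helper: translate a Windows path into the form WSL expects.
# Assumes drives are mounted under /mnt (e.g. C: -> /mnt/c).
function ConvertTo-WslPath {
    param([Parameter(Mandatory)] [string] $Path)
    # Translate a leading drive spec such as 'C:' to '/mnt/c' (lowercase drive letter).
    if ($Path -match '^(?<drive>[A-Za-z]):(?<rest>.*)$') {
        $Path = '/mnt/' + $Matches['drive'].ToLowerInvariant() + $Matches['rest']
    }
    # Translate all backslashes to forward slashes.
    $Path -replace '\\', '/'
}

# Example usage (requires WSL):
#   wsl file (ConvertTo-WslPath 'C:\Sandbox\1.txt')
```

Relative paths are passed through with only their backslashes replaced, so they are still resolved against the current directory.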
Interpreting the output:
- If the term "line terminators" (referring to newlines) is not mentioned in the output (for text files), Unix (LF) newlines only are implied.
- Windows (CRLF) newlines only are implied if you see "with CRLF line terminators".
- In case of a mix of LF and CRLF, you'll see "with CRLF, LF line terminators".
- In the absence of newlines you'll see "with no line terminators".
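If you want the same labels as in the PowerShell-only solution, the rules above can be applied to each line of the file utility's output. The following is a sketch under assumptions: ConvertFrom-FileOutput is a hypothetical helper name, the 'OTHER' label for any unlisted "line terminators" phrase is an addition not covered by the rules above, and every input line is assumed to describe a text file in the usual "name: description" form.

```powershell
# Hypothetical helper: map one line of `file` output to a newline-format label.
# Assumes each line has the form "<name>: <description>" and describes a text file.
function ConvertFrom-FileOutput {
    param([Parameter(ValueFromPipeline)] [string] $Line)
    process {
        # Split at the first ':' followed by whitespace into name and description.
        $name, $descr = $Line -split ':\s+', 2
        $format = switch -Regex ($descr) {
            'with CRLF, LF line terminators' { 'MIXED'; break }
            'with CRLF line terminators'     { 'WINDOWS(CRLF)'; break }
            'with no line terminators'       { 'N/A'; break }
            'line terminators'               { 'OTHER'; break }   # assumption: any phrase not listed above
            default                          { 'UNIX(LF)' }       # no mention implies LF only
        }
        [pscustomobject] @{
            FileName      = $name
            NewlineFormat = $format
        }
    }
}

# Example usage (requires WSL; run from the target directory):
#   wsl file *.txt | ConvertFrom-FileOutput
```

The order of the switch branches matters: the more specific phrases are tested first, and break ensures that only the first match determines the label.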