How to identify new line chars when file has both CR and CRLF characters

Time:10-13

The file contains both stray new-line characters and regular end-of-line sequences.

I need to identify the stray new-line characters, if any, using PowerShell or a batch file, and remove them if present.

CodePudding user response:

I am afraid I don't really understand what you want. You didn't post any input file, nor did you specify the output you want from such an input. Anyway, I hope this code can help:

@echo off
setlocal EnableDelayedExpansion

rem Create a test file
set LF=^
%don't remove%
%these lines%

(
echo Line One: CR LF
set /P "=Line Two: LF!LF!"
echo Line Three: CR LF
) > test.txt < NUL

rem Read the file
set "acum=0"
(for /F "tokens=1* delims=:" %%a in ('findstr /O "^" test.txt') do (
   if not defined line (
      set "line=%%b"
   ) else (
      set /A "len=%%a-acum-2, acum=%%a"
      for %%n in (!len!) do if "!line:~%%n!" equ "" (
         echo !line!
      ) else (
         set /P "=!line!"
      )
      set "line=%%b"
   )
)) < NUL
for %%a in (test.txt) do set /A "len=%%~Za-acum-2"
(for %%n in (!len!) do if "!line:~%%n!" equ "" (
   echo !line!
) else (
   set /P "=!line!"
)) < NUL

Output:

Line One: CR LF
Line Two: LFLine Three: CR LF

This example first creates a file with three lines, the second of which ends in LF instead of CR LF. The program then identifies how each line ends and removes the lone LFs.

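The method is based on the /O switch of findstr, which reports the byte offset of the start of each line, counted from the beginning of the file. The distance between two consecutive offsets, minus 2 for a CR LF terminator, is compared with the length of the line text: if they match, the line ended in CR LF; if the text is one character longer, the line ended in a lone LF.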
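Since the question also allows PowerShell, the same identification can be sketched there by reading the file as a single string and counting each kind of terminator with a regex. This is only a minimal sketch: the temp-file path and the variable names are made up for illustration, and the sample content mirrors the test file built by the batch code above.

```powershell
# Hypothetical sample file: line 2 ends in a lone LF, the others in CR LF.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'eol-test.txt'
[System.IO.File]::WriteAllText($path, "Line One: CR LF`r`nLine Two: LF`nLine Three: CR LF`r`n")

# Read the whole file as one string so the line terminators are preserved.
$text = [System.IO.File]::ReadAllText($path)

# Count each kind of terminator.
$crlf   = [regex]::Matches($text, '\r\n').Count       # CR LF pairs
$loneLf = [regex]::Matches($text, '(?<!\r)\n').Count  # LF not preceded by CR
$loneCr = [regex]::Matches($text, '\r(?!\n)').Count   # CR not followed by LF

"CRLF: $crlf, lone LF: $loneLf, lone CR: $loneCr"

# Remove only the lone LFs; CR LF pairs are left untouched.
$cleaned = $text -replace '(?<!\r)\n', ''
```

For the sample above this reports two CR LF pairs, one lone LF and no lone CR, and $cleaned joins lines two and three just like the batch output.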

CodePudding user response:

In a comment you state:

each record starts with DTL

It sounds like the way to fix your file is to remove any newlines that aren't followed by verbatim DTL|:

# Create sample file.
@'
DTL|foo1
DTL|foo2
result of an unwanted
newline or two
DTL|foo3
'@ > test.txt

# Replace all newlines not directly followed by verbatim 'DTL|' 
# with a space (remove `, ' '` if you simply want to remove the newlines).
# Pipe to Set-Content in order to save to a file as needed.
(Get-Content -Raw test.txt) -replace '\r?\n(?!DTL\|)', ' '

Output:

DTL|foo1
DTL|foo2 result of an unwanted newline or two
DTL|foo3 

For an explanation of the regex used with the -replace operator above and the ability to experiment with it, see this regex101.com page.
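In the output above, the newline at the very end of the file is also replaced, which is why the last line carries a trailing space. If that is unwanted, one possible variant (a sketch, not part of the original answer) also excludes the end of the input from the match with \z and writes the result back to the file. The temp-file path is hypothetical, and -NoNewline assumes PowerShell 5 or later.

```powershell
# Hypothetical sample file with mixed terminators: stray LFs inside the second record.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'dtl-test.txt'
[System.IO.File]::WriteAllText($path, "DTL|foo1`r`nDTL|foo2`nresult of an unwanted`nnewline or two`r`nDTL|foo3`r`n")

# \r?\n          a newline, with or without a preceding CR
# (?!DTL\||\z)   ...that is followed neither by verbatim 'DTL|' nor by the end of the input
$fixed = (Get-Content -Raw -LiteralPath $path) -replace '\r?\n(?!DTL\||\z)', ' '

# -NoNewline keeps Set-Content from appending an extra newline of its own.
Set-Content -LiteralPath $path -Value $fixed -NoNewline
```

Because Get-Content is wrapped in parentheses, the file is read completely before Set-Content opens it, so writing back to the same path is safe here. Add an -Encoding argument to both cmdlets if the file is not plain ASCII.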
