remove new line \n in large file (10GB)

Time:05-27

I have a large file, 1.txt, containing:

User: Test1
Password: P@sawFia1_f

User: Test2
Password: C99vijJiDB9fo@K!!1

I'm using sed -i '/\nPassword/ s///g' 1.txt to remove the newline before Password:, but it isn't removing it. Why? The final output needs to be:

User: Test1;P@sawFia1_f

User: Test2;C99vijJiDB9fo@K!!1

CodePudding user response:

Assumptions:

  • every User: line is followed by a Password: line
  • the actual password value does not contain white space
  • each User/password combo is followed by a blank line
  • all other lines in the file are ignored/discarded (otherwise OP should update the sample input to show how other lines of data are to be processed)

One awk approach:

$ awk '/^User:/ {printf "%s",$0} /^Password:/ {printf ";%s\n\n",$2}' 1.txt
User: Test1;P@sawFia1_f

User: Test2;C99vijJiDB9fo@K!!1

Once OP confirms the script works as needed, and assuming OP wants to overwrite the original file and is running GNU awk, OP can add the -i inplace flag to have 1.txt overwritten, e.g.:

awk -i inplace '/^User:/ { printf "%s", $0 } /^Password:/ { printf ";%s\n\n",$2}' 1.txt
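
If GNU awk is not available, a portable alternative is to write the result to a temporary file and then move it over the original. This is only a sketch: sample.txt and sample.txt.tmp are hypothetical names standing in for 1.txt, and for a 10GB file it assumes there is enough free disk space for a second copy.

```shell
# Hypothetical sample file standing in for the real 1.txt.
printf 'User: Test1\nPassword: P@sawFia1_f\n\nUser: Test2\nPassword: C99vijJiDB9fo@K!!1\n' > sample.txt

# Write the joined lines to a temporary file; replace the original
# only if awk exited successfully.
awk '/^User:/ { printf "%s", $0 } /^Password:/ { printf ";%s\n\n", $2 }' sample.txt > sample.txt.tmp &&
  mv sample.txt.tmp sample.txt

cat sample.txt
```

Because of the && the original file is left untouched if awk fails partway through.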

CodePudding user response:

Assuming the lines are paired like that, you can use the following:

perl -pe'
   s/^User:.*\K\n/;/;
   s/^Password:\s*//;
' file.in >file.out

(It can be used as-is or placed all on one line.)

CodePudding user response:

Using any awk, given your provided sample input/output all you'd need is:

$ awk -v RS= '{print $1, $2 ";" $4}' file1.txt
User: Test1;P@sawFia1_f
User: Test2;C99vijJiDB9fo@K!!1

or if you really do need a blank line between each output line:

$ awk -v RS= -v ORS='\n\n' '{print $1, $2 ";" $4}' file1.txt
User: Test1;P@sawFia1_f

User: Test2;C99vijJiDB9fo@K!!1

If that's not all you need then please edit your question to include more truly representative sample input/output including cases that the above doesn't work for.
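
To see why $1, $2 and $4 are the fields being printed: with RS set to the empty string, awk treats each blank-line-separated block as one record, and newlines act as field separators in addition to spaces. A small sketch that lists the fields of each record (para_sample.txt and para_fields.txt are hypothetical file names holding a copy of the sample input):

```shell
# Hypothetical copy of the sample input from the question.
printf 'User: Test1\nPassword: P@sawFia1_f\n\nUser: Test2\nPassword: C99vijJiDB9fo@K!!1\n' > para_sample.txt

# In paragraph mode (RS empty) each User/Password block is one record;
# print every field with its record number and field index.
awk -v RS= '{ for (i = 1; i <= NF; i++) printf "record %d, $%d = %s\n", NR, i, $i }' para_sample.txt > para_fields.txt

cat para_fields.txt
```

Each record should show four fields, with the user name in $2 and the password value in $4, which is also why this approach relies on the password containing no white space.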

CodePudding user response:

With your shown samples, please try the following awk code, written and tested in GNU awk.

awk -v RS='(^|\n)User:[^\n]*\nPassword:[^\n]*' '
RT{
  sub(/^\n/,"",RT)
  sub(/\n/,";",RT)
  print RT
}
' Input_file

Explanation: This uses GNU awk. RS (the record separator) is set to the regex (^|\n)User:[^\n]*\nPassword:[^\n]*, so each User:/Password: pair is matched as a separator and stored in RT. In the main block, if RT is not empty, the leading newline (if any) is removed from it, the remaining newline is replaced with ;, and the result is printed, giving the required output.

NOTE: The above prints the output to the terminal. Once you are happy with the results, you can use GNU awk's -i inplace option by changing awk to awk -i inplace in the code above.

One-liner form of the above code:

awk -v RS='(^|\n)User:[^\n]*\nPassword:[^\n]*' 'RT{sub(/^\n/,"",RT);sub(/\n/,";",RT);print RT}' Input_file

CodePudding user response:

perl -i -wnlE'/^Password:\s*(.*)/ ? say "$u;$1" : /^User/ ? $u=$_ : say' file

If the empty line in between isn't important it simplifies to

perl -i -wnlE'/^Password:\s*(.*)/ ? say "$u;$1" : ($u=$_)' file

(The parentheses around $u=$_ are needed in this second version! Don't remove them :)

The above assumes that the file has the shown structure: first a User line, then a Password line, then an empty (or perhaps other) line(s). If you want to keep a backup, use -i.bak instead of -i.
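
Coming back to the sed command in the question: a likely reason it has no effect is that sed reads one line at a time into its pattern space with the trailing newline stripped, so \nPassword never matches unless lines are joined first. A minimal sed sketch that uses N to append the next line before substituting, assuming every User: line is immediately followed by its Password: line (sed_sample.txt and sed_out.txt are hypothetical file names):

```shell
# Hypothetical copy of the sample input from the question.
printf 'User: Test1\nPassword: P@sawFia1_f\n\nUser: Test2\nPassword: C99vijJiDB9fo@K!!1\n' > sed_sample.txt

# On each User: line, N appends the following line to the pattern space,
# so the embedded newline plus "Password: " can be replaced with ";".
sed '/^User:/{N;s/\nPassword: */;/;}' sed_sample.txt > sed_out.txt

cat sed_out.txt
```

This writes to a separate file; the -i flag from the question could be added once the output looks right.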
