Remove newline followed by multiple tabs with a regex

I tried to replace a newline followed by tabs with a space, using the following regex in sed:

sed "s|\n[\t|\s]*| |" input.log  > output.log

It does not work, but if I use perl then it replaces all newlines as well. I want to replace only a newline followed by a tab or space repeated multiple times (more than once) with a space.

perl -pe 's/\n[\t|\s]*/ /' input.log  > output.log

Sample data in below link:

CodePudding user response：

1st solution: In sed you can try the following code. It uses GNU sed's -E option to enable ERE (extended regular expressions) along with its -z option to read the whole Input_file at once. The s command then performs the substitution, replacing each newline followed by tabs/spaces with just a newline, as per the required output.

sed -E -z 's~\n[[:blank:]]+~\n~g'  Input_file
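
A quick way to check what the -z approach does is to run it on a tiny made-up sample. Without -z, sed reads one line at a time with the trailing newline stripped, so \n never appears in the pattern space; that is why the command in the question matched nothing. The sketch below is only an illustration: the sample text and the function name join_with_sed are assumptions, not the asker's data, and the replacement is changed to a single space, which is what the question asks for. It needs GNU sed.

```shell
# Hypothetical helper: join continuation lines (newline + blanks) with one space.
# Requires GNU sed: -z treats the whole input as one record, -E enables "+".
join_with_sed() {
  sed -E -z 's~\n[[:blank:]]+~ ~g' "$@"
}

# Made-up sample: two continuation lines, one indented with a tab, one with spaces.
printf 'line one\n\tcontinued\n   more\nline two\n' | join_with_sed
```

On that sample the result should be two lines, "line one continued more" and "line two". The newline before "line two" survives because it is not followed by a tab or space.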

2nd solution: In GNU awk, setting RS to the null string (paragraph mode) also helps to do the substitution, as follows:

awk -v RS= '{gsub(/\n[[:blank:]]+/,"\n")} {ORS=RT;print}' Input_file
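
RT only exists in GNU awk, so the command above will not behave the same under mawk or BSD awk. If gawk is not available, one possible alternative is to glue continuation lines onto the previous line while reading line by line. This is a different technique from the one above, sketched under the assumption that a continuation line is any line after the first that starts with a tab or space, and that it should be joined with a single space. The function name join_with_awk and the sample text are made up for illustration.

```shell
# Hypothetical helper using only POSIX awk features (no RT, no paragraph mode).
join_with_awk() {
  awk '
    # Continuation line: collapse its leading blanks to one space and append
    # it to the line already printed (no newline has been emitted yet).
    NR > 1 && /^[ \t]+/ { sub(/^[ \t]+/, " "); printf "%s", $0; next }
    # Ordinary line after the first: terminate the previous output line.
    NR > 1 { printf "\n" }
    { printf "%s", $0 }
    # Terminate the last line, if there was any input at all.
    END { if (NR > 0) printf "\n" }
  ' "$@"
}

# Made-up sample: two continuation lines, one indented with a tab, one with spaces.
printf 'line one\n\tcontinued\n   more\nline two\n' | join_with_awk
```

Because each newline is printed only once the next line is known not to be a continuation, nothing has to be held in memory beyond the current line.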

CodePudding user response：

If you want to match more than one tab/space, you can use a quantifier for 2 or more whitespace characters and replace that match with a space.

sed 's/^[[:space:]]\{2,\}/ /' file
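
Note that this command works line by line: it shrinks a leading run of whitespace to one space but leaves the preceding newline in place. A small check on made-up data shows the effect (the sample text and the function name squeeze_indent are assumptions for illustration):

```shell
# Hypothetical helper wrapping the command above.
squeeze_indent() {
  sed 's/^[[:space:]]\{2,\}/ /' "$@"
}

# Made-up sample: two tabs, four spaces, and a single space as indentation.
printf 'first\n\t\tsecond\n    third\n fourth\n' | squeeze_indent
```

The output should still have four lines: "first", " second", " third" and " fourth". The last line is untouched because a single leading space does not satisfy the {2,} quantifier.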

If you want to skip the first line, you can specify a range from the second line till the end:

sed '2,$s/^[[:space:]]\{2,\}/ /' file

Or, put the other way around, skip the replacement when it is the first line:

sed '1!s/^[[:space:]]\{2,\}/ /' file