Trim space ONLY from start & end of document without touching internal space using perl in shell scr

Time:09-22

I'm trying to trim whitespace at the start and end of a document, without touching the whitespace in between, using perl inside a bash script.

The file has the following format:

<newline>
<space><newline>
<tab><newline>
<space><tab><newline>
START<newline><newline>
<space>INDENTED<newline><newline>
END<newline>
<space><tab><newline>
<tab><newline>
<space><newline>
<newline>

NOTE: <newline> is \n, <space> is a single space character & <tab> is \t

So the original file looks like


  
    
    
START


 INDENTED


END

    
    
 


I need the content of the file to be

START<newline><newline>
<space>INDENTED<newline><newline>
END

i.e final file like this

START


 INDENTED


END

I tried both of the following commands, but they trim intermediate space as well. Both of them trim space & newlines from the whole document rather than just from the start of the document.

perl -pi -e 's/^\s*//gs' sample.txt
perl -pi -e 's/\A\s*//gs' sample.txt

Both collapsed all internal space

START<newline>
INDENTED<newline>
END<newline>

I then tried these to trim the end of the document:

perl -pi -e 's/\s*$//gs' sample.txt
perl -pi -e 's/\s*\Z//gs' sample.txt

Both collapsed newlines

START<space>INDENTEDEND<newline>

Here are my assumptions

  1. \A matches just the start of the document & \Z matches the end of the document (as opposed to ^ & $)
  2. The s in the gs flags ensures the whole document is treated as a single line, with newlines treated as the character \n

I am new to perl. I'd appreciate it if someone could help me understand where I went wrong.

CodePudding user response:

You may use this perl in slurp mode:

perl -0777 -pe 's/^\s+|(\R)\s+$/$1/g' file

Output:

START

 INDENTED

END

Details:

  • -0777: Enables slurp mode, which makes perl read the full file as a single string
  • ^\s+: Match 1+ whitespaces at the start of the file
  • (\R)\s+$: Match a line break followed by 1+ whitespaces at the end of the file
  • We use $1 in the replacement to put the line break back; otherwise you will get the file content without an ending line break

CodePudding user response:

Not perl, but ed is useful for editing files:

$ printf '%s\n' '1,/START/-1d' '/END/+1,$d' w | ed -s sample.txt
$ cat sample.txt
START

 INDENTED

END

This deletes everything in the ranges of lines from the first to the line before the one matching START, and from the line after END to the end of the file, and then writes the changed file back to disk.


Or a similar perl approach, which only prints lines in the range you want to keep:

perl -i -ne 'print if /START/../END/' sample.txt

CodePudding user response:

Here is a short sed version:

sed -n '/START/,/END/p'

or with the negated logic:

sed '1,/START/{/START/!d}; /END/,${/END/!d}'
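
Both sed forms read standard input unless a file name is given. The sketch below runs them on the question's sample file; the scratch path is illustrative, and the one-line brace syntax of the second command assumes GNU sed.

```shell
# Rebuild the question's sample file in a scratch directory (illustrative path).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '\n \n\t\n \t\nSTART\n\n INDENTED\n\nEND\n \t\n\t\n \n\n' > "$sample"

# Print only the lines from the START line through the END line.
kept=$(sed -n '/START/,/END/p' "$sample")

# Negated logic: delete everything before START and everything after END.
negated=$(sed '1,/START/{/START/!d}; /END/,${/END/!d}' "$sample")

printf '%s\n' "$kept"
```

On this input the two commands produce the same text, so either can be redirected to a new file (or used with sed -i on GNU sed).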