Strip blanks at the beginning & trailing blanks in AWK

Time:02-11

I am asking for your assistance in stripping the blanks/spaces at the beginning and end of each field, i.e. remove the trailing space from $1, the leading and trailing spaces from $2, and the leading spaces from $3, using AWK on the AIX 7.2 platform. Below is some data from the file Employee.txt:

001 |  George John Aden Brown   | gbrown
002 |   Barry Street White      | bwhite
003 |    Kelly Jones            | kjones
004 |   Jolene Davidson Smith   | jsmith 

My objective is to achieve the following set of data (without the leading/trailing spaces)

001|George John Aden Brown|gbrown
002|Barry Street White|bwhite
003|Kelly Jones|kjones
004|Jolene Davidson Smith|jsmith

I have tried the following commands without success; the output they produce is shown below them.

awk -F"|" '{ print $1 "|" gsub(" ", "", $2) "|" $3 }' Employee.txt
awk -F"|" '{ print $1 "|" gsub(/[ \t]/,"",$2) "|" $3 }'  Employee.txt
awk -F"|" '{ print $1 "|" gsub(/[[:blank:]]/, "", $2) "|" $3 }' Employee.txt

001 |8| gbrown
002 |11| bwhite
003 |17| kjones
004 |8| jsmith 

Many thanks, George

CodePudding user response:

With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk. In short: the field separator is set to [[:space:]]+\\|[[:space:]]+ (spaces, then a pipe, then spaces) and OFS is set to |. The main block assigns $1 to itself, which makes awk rebuild the line using the new OFS, and the trailing 1 prints the rebuilt line.

awk -v FS='[[:space:]]+\\|[[:space:]]+' -v OFS='|' '{$1=$1} 1'  Input_file
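One caveat with a separator-based approach: FS only matches between fields, so a blank at the very start or end of a line (such as the one after jsmith in the last sample row) is left in place. A minimal sketch of one way to handle that, assuming it is acceptable to trim the record before rebuilding it. The name trim_record is a hypothetical helper, and the FS here is a simplified stand-in rather than the regex above:

```shell
# trim_record: hypothetical helper name, not part of any answer in this thread.
trim_record() {
    # 1. gsub() on $0 removes blanks at both ends of the record (awk then re-splits it).
    # 2. $1 = $1 rebuilds the record, joining the fields with OFS.
    # FS is a simplified stand-in: optional spaces, a pipe, optional spaces.
    awk -v FS=' *[|] *' -v OFS='|' '{ gsub(/^[ \t]+|[ \t]+$/, ""); $1 = $1 } 1' "$@"
}

# Sample row 4 from the question, including its trailing blank.
printf '%s\n' '004 |   Jolene Davidson Smith   | jsmith ' | trim_record
```

The order matters in this sketch: trimming $0 first means the last field no longer carries the trailing blank when the record is rebuilt.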

CodePudding user response:

I usually do it like this, and I use it a LOT:

$ awk '
BEGIN {
    FS=OFS="|"                 # set both separators to pipe
}
{
    for(i=1;i<=NF;i++)         # loop all fields
        gsub(/^ +| +$/,"",$i)  # strip leading and trailing space
}1' file                       # output

Output:

001|George John Aden Brown|gbrown
002|Barry Street White|bwhite
003|Kelly Jones|kjones
004|Jolene Davidson Smith|jsmith
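Note that the loop calls gsub() as a statement and lets it change $i in place. The attempts in the question printed gsub()'s return value instead, and that value is the number of substitutions made, which is where the 8, 11 and 17 came from. A small sketch of the difference, using the first sample row; show_gsub is a hypothetical helper name:

```shell
# show_gsub: hypothetical helper name, used only for this illustration.
show_gsub() {
    # gsub() edits $2 in place and returns how many matches it replaced.
    awk -F'|' '{ n = gsub(/ /, "", $2); print n; print $2 }' "$@"
}

printf '%s\n' '001 |  George John Aden Brown   | gbrown' | show_gsub
# prints:
# 8
# GeorgeJohnAdenBrown
```

This also shows why matching every blank is the wrong pattern for this task: it removes the spaces inside the name as well, so the regex has to be anchored to the start and end of the field.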

If you have other junk in there, feel free to tune the regex:

gsub(/^"?[ \t]*(N\/A)?|[ \t]*"?$/,"",$i)  # etc

CodePudding user response:

You've got good awk answers. However if you want to consider sed this is pretty simple with:

sed -E 's/ *(\|) *|^ +| +$/\1/g' file

001|George John Aden Brown|gbrown
002|Barry Street White|bwhite
003|Kelly Jones|kjones
004|Jolene Davidson Smith|jsmith

Or else with gnu-awk:

awk '{print gensub(/ *(\|) *|^ +| +$/, "\\1", "g")}' file

PS: This sed command requires GNU or BSD versions.

CodePudding user response:

If, as you said in your question, you don't want leading or trailing spaces on the lines removed then using any sed:

$ sed 's/ *| */|/g' file
001|George John Aden Brown|gbrown
002|Barry Street White|bwhite
003|Kelly Jones|kjones
004|Jolene Davidson Smith|jsmith

otherwise if you actually did want the leading/trailing blanks removed too then with GNU or BSD sed for -E:

$ sed -E 's/^ +| +$//g; s/ *\| */|/g' file
001|George John Aden Brown|gbrown
002|Barry Street White|bwhite
003|Kelly Jones|kjones
004|Jolene Davidson Smith|jsmith

CodePudding user response:

Also with awk

awk '{$2=$2;gsub(/ \| /,"|")} 1' file
001|George John Aden Brown|gbrown
002|Barry Street White|bwhite
003|Kelly Jones|kjones
004|Jolene Davidson Smith|jsmith
  • The stripping of leading and trailing whitespace also comes into play whenever $0 is recomputed. (see: http://gnu.ist.utl.pt/software/gawk/manual/html_node/Regexp-Field-Splitting.html )
  • $2=$2: assigning $2 to itself makes awk rebuild $0 from its fields. With the default field separator, the new $0 has no leading or trailing whitespace.
  • gsub() is then applied to $0 with the regexp / \| / (a space, then a | character, then a space), and each match is replaced by a single | character.
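The two steps described above can be seen separately in the following sketch, which assumes the default field separator and a sample row with extra blanks added at both ends. The names rebuild_only and rebuild_and_join are hypothetical helpers:

```shell
# Hypothetical helper names, used only to show each step on its own.
rebuild_only() {
    # Step 1: assigning to a field rebuilds $0, joining the fields with single spaces.
    awk '{ $2 = $2; print "[" $0 "]" }' "$@"
}
rebuild_and_join() {
    # Step 2: after the rebuild, drop the single blanks left around each pipe.
    awk '{ $2 = $2; gsub(/ \| /, "|") } 1' "$@"
}

line='  003 |    Kelly Jones            | kjones  '
printf '%s\n' "$line" | rebuild_only       # prints [003 | Kelly Jones | kjones]
printf '%s\n' "$line" | rebuild_and_join   # prints 003|Kelly Jones|kjones
```

Because the rebuild joins every whitespace-separated word with one space, runs of blanks inside a field are collapsed too. That is harmless for the sample data but worth knowing if interior spacing matters.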