use bash or awk to replace part of a string

Time:03-20

I have the following example lines in a file:

sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0

I want to process the file and have the following output:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

Notice that only lines 2 and 3 change: where text is followed by an underscore and more text (letters, not digits), the underscore and the text after it are removed.

I have not succeeded with the following:

sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' file.txt >out.txt

Any bash or awk advice will be welcome. Thanks.

CodePudding user response:

If you want to replace the whole word after the underscore, you have to repeat the character class one or more times using [a-zA-Z]+, and use \1 (not $1) in the replacement.

sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' file.txt >out.txt

If the words should be the same before and after the underscore, you can use a repeating capture group with a backreference.

If you only want to do this for the start of the string you can prepend ^ to the pattern and omit the /g at the end of the sed command.

sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' file.txt >out.txt

The pattern matches:

  • ([a-zA-Z]+) Capture group 1, match 1 or more occurrences of a char a-zA-Z
  • (_\1)+ Capture group 2, repeated 1 or more times, match _ followed by the same text captured by group 1

The file out.txt will contain:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
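
To see how the two sed commands above differ, here is a small sketch that runs both on the question's sample lines plus one extra line, foo_bar 1 1 1. That extra line is made up for illustration and is not part of the original data. The sketch assumes a sed that accepts backreferences together with -E, as GNU sed does.

```shell
# The four sample lines from the question, plus a hypothetical fifth line
# (foo_bar) where the words around the underscore differ.
input='sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0
foo_bar 1 1 1'

# Variant 1: after a letter, drop "_" plus any run of letters that follows.
any_word=$(printf '%s\n' "$input" | sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g')

# Variant 2: drop "_word" only when it repeats the word before it.
same_word=$(printf '%s\n' "$input" | sed -E 's/([a-zA-Z]+)(_\1)+/\1/g')

printf 'any word:\n%s\n\nsame word only:\n%s\n' "$any_word" "$same_word"
```

Both variants turn guy_guy into guy and ging_ging into ging. Only the first one also shortens foo_bar to foo; the second leaves it untouched because bar does not repeat foo.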

CodePudding user response:

You can do it more simply, like this:

sed -E 's/_[a-zA-Z]+//' file.txt >out.txt

This just replaces the first underscore on each line that is followed by one or more alphabetical characters, together with those characters, with nothing.
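
A quick way to check what this shorter command does, and where it is looser than the others, is to run it on the sample lines plus one hypothetical line (moat_2 x_y 1 0, not in the original data) that has an underscore in a later field. This is only an illustrative sketch.

```shell
# The question's sample lines plus one made-up line with "_" in field 2.
input='sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0
moat_2 x_y 1 0'

# No capture group and no /g flag: the first "_letters" found anywhere
# on the line is removed, whichever field it is in.
result=$(printf '%s\n' "$input" | sed -E 's/_[a-zA-Z]+//')
printf '%s\n' "$result"
```

The four original lines come out as requested. In the extra line the match lands in the second field, giving moat_2 x 1 0, so whether this shortcut is safe depends on what the real data looks like.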

CodePudding user response:

With your shown samples, please try the following awk code.

awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1' Input_file

Explanation: awk's split function splits the first field into an array named arr, using _ as the delimiter. If the first element of arr is equal to the second, only the first element is assigned back to the first field ($1). The trailing 1 is an always-true condition, so every line is printed, edited or not.
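
The same logic can be spelled out step by step. The sketch below is a slightly stricter variation, not the answer's exact code: it additionally requires that the first field splits into exactly two pieces. The array name parts and the extra foo_bar line are made up for illustration.

```shell
# The question's sample lines plus one hypothetical line (foo_bar).
input='sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0
foo_bar 1 1 1'

result=$(printf '%s\n' "$input" | awk '
{
    # Split field 1 on "_"; n is the number of pieces produced.
    n = split($1, parts, "_")
    # Rewrite the field only when both halves are the same word.
    if (n == 2 && parts[1] == parts[2])
        $1 = parts[1]   # assigning to $1 rebuilds the line using OFS (a space)
}
{ print }               # print every line, changed or not
')
printf '%s\n' "$result"
```

Here guy_guy and ging_ging are shortened, while sweet_25, moat_2 and foo_bar pass through unchanged because their two halves differ.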

CodePudding user response:

$ awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' file
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

CodePudding user response:

I would do:

awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' file

Prints:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0