Use bash or awk to replace part of a string

I have the following example lines in a file:

sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0

I want to process the file and have the following output:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0

Notice that the change happens only on lines 2 and 3: where a word is followed by an underscore and more letters, the underscore and the text after it are removed.

I have not succeeded with the following:

sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' file.txt >out.txt

Any bash or awk advice would be welcome. Thanks!

Answer:

If you want to replace the whole word after the underscore, you have to repeat the character class one or more times using [a-zA-Z]+, and use \1 in the replacement (sed does not understand $1; it inserts it as literal text).

sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' file.txt >out.txt
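A quick way to see why the original attempt fails is to run it next to the corrected version on the sample data from the question. The sketch below writes the sample lines to a temporary file (created with mktemp, purely for this demo) and captures both results.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'sweet_25 2 0 4' 'guy_guy 2 4 6' 'ging_ging 0 0 3' 'moat_2 0 1 0' > "$tmp"

# Original attempt: sed copies "$1" into the output literally, and
# [a-zA-Z] without + consumes only one letter after the underscore.
broken=$(sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' "$tmp")

# Corrected: [a-zA-Z]+ consumes the whole word after the underscore,
# and \1 puts back the single letter captured before it.
fixed=$(sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' "$tmp")

printf '%s\n' "$broken"
printf '%s\n' "$fixed"
rm -f "$tmp"
```

The broken version turns guy_guy into gu$1uy, which makes both problems visible at once.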

If the words should be the same before and after the underscore, you can use a repeating capture group with a backreference.

If you only want to do this at the start of the string, you can prepend ^ to the pattern and omit the /g at the end of the sed command.

sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' file.txt >out.txt

The pattern matches:

([a-zA-Z]+) Capture group 1, match 1 or more occurrences of a char a-zA-Z
(_\1)+ Capture group 2, repeated 1 or more times, matching _ followed by the same text captured by group 1
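To see what the backreference adds, the sketch below runs sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' on a few made-up lines that are not in the question: a word repeated three times, and two different words joined by an underscore (guy_gal). It assumes GNU sed, which accepts back-references inside an -E pattern.

```shell
# Made-up test lines (not from the question) to exercise the pattern.
input='ging_ging_ging 0 0 3
guy_gal 2 4 6
sweet_25 2 0 4'

# ([a-zA-Z]+) captures a word; (_\1)+ then requires one or more copies
# of "_" followed by that same word, so guy_gal is left alone.
out=$(printf '%s\n' "$input" | sed -E 's/([a-zA-Z]+)(_\1)+/\1/g')
printf '%s\n' "$out"
```

The + after (_\1) is what collapses all three copies of ging into one.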

The file out.txt will contain:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
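The start-of-line variant (prepend ^ and drop /g) could look like the sketch below. The second input line, with a doubled word in a later field, is made up to show that the anchored version only touches the beginning of each line, while the unanchored /g version rewrites every match.

```shell
# Made-up input: a doubled word at the start of one line and mid-line in another.
input='guy_guy 2 4 6
moat_2 ging_ging 0'

# ^ anchors the match to the start of the line; without /g at most
# one substitution is made per line.
anchored=$(printf '%s\n' "$input" | sed -E 's/^([a-zA-Z]+)(_\1)+/\1/')

# Unanchored with /g: every doubled word on the line is collapsed.
unanchored=$(printf '%s\n' "$input" | sed -E 's/([a-zA-Z]+)(_\1)+/\1/g')

printf '%s\n' "$anchored"
printf '%s\n' "$unanchored"
```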

Answer:

You can do it more simply, like this:

sed -E 's/_[a-zA-Z]+//' file.txt >out.txt

This deletes the first underscore on each line that is followed by one or more letters, together with those letters.
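Unlike the backreference version, this pattern does not check that the text after the underscore repeats the text before it, and without /g it only acts on the first match per line. The two made-up lines below show both effects.

```shell
# Made-up lines: different words around "_", and two matches on one line.
input='guy_gal 2 4 6
a_b c_d'

# No /g, so only the first "_" + letters on each line is deleted.
out=$(printf '%s\n' "$input" | sed -E 's/_[a-zA-Z]+//')
printf '%s\n' "$out"
```

For the sample data in the question this makes no difference, since every doubled word there is identical on both sides.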

Answer:

With your shown samples, please try the following awk code.

awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1' Input_file

Explanation: awk's split function splits the 1st field into an array named arr, using _ as the delimiter. If the 1st element of arr is equal to the 2nd element, only the 1st element is saved back into the first field ($1). The trailing 1 is an always-true condition, so every line, edited or not, is printed.
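The sketch below runs the same awk program on three made-up lines. Because only arr[1] and arr[2] are compared, a word repeated three times is also reduced to one copy, and guy_gal is left unchanged. Assigning to $1 also makes awk rebuild the line with the output field separator (a single space by default), so extra spaces between fields on an edited line are collapsed.

```shell
# Made-up lines: a triple repeat, different words, and a line with double spaces.
input='ging_ging_ging 0 0 3
guy_gal 2 4 6
guy_guy  2  4'

# Only lines whose first two "_"-separated parts match are edited;
# the edited lines are rebuilt with single spaces between fields.
out=$(printf '%s\n' "$input" | awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1')
printf '%s\n' "$out"
```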

Answer:

$ awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' file
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
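This version selects lines by their number (NR), not by their content. Feeding it the question's lines in a different, made-up order shows that it always edits lines 2 and 3, whatever they contain.

```shell
# The question's lines, reordered for this demo.
input='moat_2 0 1 0
guy_guy 2 4 6
sweet_25 2 0 4
ging_ging 0 0 3'

# Lines 2 and 3 lose "_" and everything up to the next space; others are untouched.
out=$(printf '%s\n' "$input" | awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1')
printf '%s\n' "$out"
```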

Answer:

I would do:

awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' file

Prints:

sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
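In this version the condition only checks that a letter sits on each side of an underscore in the first field, and sub(/_.*/,"",$1) then removes everything from the first underscore to the end of that field. The made-up lines below show the consequences; the sketch assumes an awk that supports POSIX character classes such as [[:alpha:]] (GNU awk does).

```shell
# Made-up lines: a triple repeat, different words, and a digit after "_".
input='ging_ging_ging 0 0 3
guy_gal 2 4 6
moat_2 0 1 0'

# guy_gal is edited too, because the words are not compared;
# moat_2 is skipped because "2" is not a letter.
out=$(printf '%s\n' "$input" | awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1')
printf '%s\n' "$out"
```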