I have the following example lines in a file:
sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0
I want to process the file and have the following output:
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
Notice that the required effect happened on lines 2 and 3: an underscore and the text following it are removed on lines where this pattern occurs.
I have not succeeded with the following:
sed -E 's/([a-zA-Z])_[a-zA-Z]/$1/g' file.txt >out.txt
Any bash or awk advice will be welcome. Thanks
CodePudding user response:
If you want to replace the whole word after the underscore, you have to repeat the character class one or more times using [a-zA-Z]+ and use \1 (not $1) in the replacement.
sed -E 's/([a-zA-Z])_[a-zA-Z]+/\1/g' file.txt >out.txt
If the words should be the same before and after the underscore, you can use a repeating capture group with a backreference.
If you only want to do this for the start of the string, you can prepend ^ to the pattern and omit the /g at the end of the sed command.
sed -E 's/([a-zA-Z]+)(_\1)+/\1/g' file.txt >out.txt
The pattern matches:
([a-zA-Z]+) Capture group 1, match 1 or more occurrences of a char a-zA-Z
(_\1)+ Capture group 2, repeat 1 or more times matching _ and the same text captured by group 1
The file out.txt will contain:
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
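The start-of-line variant mentioned above can be sketched as follows. This is only a sketch, assuming GNU sed (backreferences inside an -E pattern are an extension, not guaranteed by POSIX); the variable names input and anchored are just for illustration, and the input is the sample from the question.

```shell
# Sample input copied from the question.
input='sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0'

# ^ anchors the match to the start of each line; without /g at most
# one replacement per line is made.
anchored=$(printf '%s\n' "$input" | sed -E 's/^([a-zA-Z]+)(_\1)+/\1/')
printf '%s\n' "$anchored"
```

Anchoring keeps the substitution from touching a repeated word that appears later in the line.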
CodePudding user response:
You can do it more simply, like this:
sed -E 's/_[a-zA-Z]+//' file.txt >out.txt
This just replaces the first underscore followed by one or more alphabetical characters on each line with nothing.
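Note that this does not check that the words on both sides of the underscore are the same. A small comparison, using a hypothetical line foo_bar that is not in the question's data (the variable names are also just for illustration, and the backreference version assumes GNU sed):

```shell
# Hypothetical input: the words around the underscore differ.
line='foo_bar 1 2 3'

# Removes "_bar" regardless of what precedes the underscore.
simple=$(printf '%s\n' "$line" | sed -E 's/_[a-zA-Z]+//')

# Only removes the suffix when it repeats the captured word, so this line is left alone.
strict=$(printf '%s\n' "$line" | sed -E 's/([a-zA-Z]+)(_\1)+/\1/g')

printf 'simple: %s\nstrict: %s\n' "$simple" "$strict"
```

For the sample file both give the same result; which one you want depends on whether lines like foo_bar should be changed.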
CodePudding user response:
With your shown samples, please try the following awk code.
awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1' Input_file
Explanation: awk's split function splits the 1st field into an array named arr using the delimiter _. If the 1st element of arr is equal to the 2nd element of arr, only the 1st element is saved back to the first field ($1). The trailing 1 prints every line, edited or not.
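As a quick check, the same command can be run on the question's sample without a file. This is only a sketch; the variable names input and result are made up for the example.

```shell
# Sample input copied from the question.
input='sweet_25 2 0 4
guy_guy 2 4 6
ging_ging 0 0 3
moat_2 0 1 0'

# split() returns the number of pieces (non-zero here), so the condition
# reduces to whether the first two pieces are equal.
result=$(printf '%s\n' "$input" |
  awk 'split($1,arr,"_") && arr[1] == arr[2]{$1=arr[1]} 1')
printf '%s\n' "$result"
```

Assigning to $1 makes awk rebuild the line with single spaces between fields, which makes no difference for this input.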
CodePudding user response:
$ awk 'NR~/^[23]$/{sub(/_[^ ]+/,"")} 1' file
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0
CodePudding user response:
I would do:
awk '$1~/[[:alpha:]]_[[:alpha:]]/{sub(/_.*/,"",$1)} 1' file
Prints:
sweet_25 2 0 4
guy 2 4 6
ging 0 0 3
moat_2 0 1 0