I know similar questions have been asked before, for example here, but I've been unable to reproduce the desired results for my own needs with those examples, and I don't understand why.
I want to replace every word in a file that starts with the character '@' with <@MENTION>. For example, this:
I have 6 @emailaddresses and 10% of the people don't eat sandwiches! I have six @emailaddresses and 10%... @123_Username @BAPP, you shouldn't say that! I recently called@User but he didn't answer. @Username is not a nice person! This @username guy is really cool!
Should become this:
I have 6 <@MENTION> and 10% of the people don't eat sandwiches! I have six <@MENTION> and 10%... <@MENTION> <@MENTION>, you shouldn't say that! I recently called@User but he didn't answer. <@MENTION> is not a nice person! This <@MENTION> guy is really cool!
I have tried this:
sed 's/@[a-zA-Z0-9_]*/<@MENTION>/'
But the string '@BAPP' is not replaced, although I'd like it to be, and 'called@User' is replaced, which I'd prefer to avoid.
I also tried this:
sed -E -e 's/\b@[a-zA-Z0-9_]*\b/<@MENTION>/'
But, for some reason I don't understand, the word boundaries are not respected...
Any help understanding how to approach this would be much appreciated, as I'm (obviously) learning and have limited experience with Bash. Thanks a lot in advance.
Answer:
One sed idea:
$ sed -E 's/(^|[^[:alnum:]])@[a-zA-Z0-9_]*(\>)/\1<@MENTION>\2/g' file
NOTE: the initial ^ was added to handle the case where the target string is at the beginning of a line.
This generates:
# assuming embedded linefeeds
I have 6 <@MENTION> and 10% of the people don't eat sandwiches! I have six
<@MENTION> and 10%... <@MENTION> <@MENTION>, you shouldn't say that! I recently
called@User but he didn't answer. <@MENTION> is not a nice person! This <@MENTION> guy
is really cool!
# assuming no embedded linefeeds
I have 6 <@MENTION> and 10% of the people don't eat sandwiches! I have six <@MENTION> and 10%... <@MENTION> <@MENTION>, you shouldn't say that! I recently called@User but he didn't answer. <@MENTION> is not a nice person! This <@MENTION> guy is really cool!
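\> (end of word) is a GNU sed extension, so the command above may not behave the same with other sed implementations. A possibly more portable sketch, assuming ASCII input: requiring at least one word character with + should have the same effect here, because the greedy [a-zA-Z0-9_]+ always stops right before a non-word character or the end of the line. The function name replace_mentions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replaces @word mentions with <@MENTION>.
# Uses only POSIX ERE features (no \>), assuming ASCII input.
replace_mentions() {
  # (^|[^[:alnum:]]) : start of line or a non-alphanumeric character, kept via \1
  # @[a-zA-Z0-9_]+   : an @ followed by at least one word character
  sed -E 's/(^|[^[:alnum:]])@[a-zA-Z0-9_]+/\1<@MENTION>/g'
}

# prints: <@MENTION> and called@User, then <@MENTION>!
printf '%s\n' '@start and called@User, then @BAPP!' | replace_mentions
```

To process a file, redirect it into the function, e.g. replace_mentions < file.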
Answer:
\b is non-standard. It is a zero-width assertion that a "word" character is on one side and a non-"word" character is on the other. However, the definition of "word" means that it doesn't help you: @ is not a "word" character, so \b directly before @ only matches when the preceding character IS a word character (as in called@User), which is the opposite of what you want.
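A minimal demonstration of that effect on the question's second attempt, assuming GNU sed (where \b is available). The g flag is added so every @ on the line is tried, and the function name boundary_demo is hypothetical:

```shell
#!/bin/sh
# Hypothetical demo (GNU sed assumed): \b before @ matches only when the
# character before the @ is a "word" character.
boundary_demo() {
  sed -E 's/\b@[a-zA-Z0-9_]*\b/<@MENTION>/g'
}

# " @BAPP": space and @ are both non-word, so there is no boundary -> untouched.
# "called@User": d is a word character and @ is not -> boundary -> replaced.
printf '%s\n' 'hi @BAPP and called@User' | boundary_demo   # hi @BAPP and called<@MENTION>
```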
Unless you give the g flag to s///, it only changes the first match on each line.
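A small sketch of the difference the g flag makes; the function names first_only and all_matches are hypothetical, and -E is used only for consistency with the other commands here:

```shell
#!/bin/sh
# Without g, s/// stops after the first match on each line;
# with g, every non-overlapping match on the line is replaced.
first_only()  { sed -E 's/@[a-zA-Z0-9_]+/<@MENTION>/'; }
all_matches() { sed -E 's/@[a-zA-Z0-9_]+/<@MENTION>/g'; }

printf '%s\n' '@a @b' | first_only    # <@MENTION> @b
printf '%s\n' '@a @b' | all_matches   # <@MENTION> <@MENTION>
```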
You probably don't want to match an @ that is not followed by any "word" characters, so using * (zero or more) is incorrect; use + (one or more) instead.
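To illustrate the * versus + point: with *, an @ followed by zero word characters still matches. The function names with_star and with_plus are hypothetical:

```shell
#!/bin/sh
# * allows zero word characters after @, so a lone @ is replaced;
# + requires at least one word character after the @.
with_star() { sed -E 's/@[a-zA-Z0-9_]*/<@MENTION>/g'; }
with_plus() { sed -E 's/@[a-zA-Z0-9_]+/<@MENTION>/g'; }

printf '%s\n' 'meet me @ noon' | with_star   # meet me <@MENTION> noon
printf '%s\n' 'meet me @ noon' | with_plus   # meet me @ noon
```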
Putting that together:
sed -E 's/(^|[^a-zA-Z_<])@[a-zA-Z0-9_]+/\1<@MENTION>/g'
^|[^a-zA-Z_<] matches the start of the line or any character not listed in []. Edit the list to be whatever you want to exclude. Adding < means you don't change existing <@MENTION>s.
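A quick check of this approach, with the quantifier written as + (one or more). The function name mention_sub is hypothetical; running it twice shows that already-replaced <@MENTION>s are left alone because of the < in the exclusion list:

```shell
#!/bin/sh
# Hypothetical wrapper around the command above.
# [^a-zA-Z_<] keeps called@User (preceded by a letter) and existing <@MENTION>s.
mention_sub() {
  sed -E 's/(^|[^a-zA-Z_<])@[a-zA-Z0-9_]+/\1<@MENTION>/g'
}

# Second pass changes nothing: the @ in <@MENTION> is preceded by <.
printf '%s\n' 'hi @BAPP, called@User' | mention_sub | mention_sub   # hi <@MENTION>, called@User
```

On the question's one-line sample, this should produce exactly the desired output shown in the question.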
Answer:
Using sed:
$ sed -E 's/(\s+)@[^ ]*/\1<@MENTION>/g' input_file
I have 6 <@MENTION> and 10% of the people don't eat sandwiches!
I have six <@MENTION> and 10%... <@MENTION> <@MENTION>
you shouldn't say that! I recently called@User but he didn't answer.
@Username is not a nice person! This <@MENTION> guy is really cool!
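Two side effects of this pattern are visible in the output above: [^ ]* also swallows trailing punctuation (the comma after @BAPP is gone), and the leading whitespace group requires whitespace before the @, so @Username at the start of a line is left untouched. A possible tweak, not part of the original answer, anchors on the start of the line or whitespace and stops at non-word characters. GNU sed is assumed for \s, and the function names are hypothetical:

```shell
#!/bin/sh
# The answer's pattern: whitespace, @, then everything up to the next space.
answer_version()  { sed -E 's/(\s+)@[^ ]*/\1<@MENTION>/g'; }
# Hypothetical tweak: start of line or one whitespace char, @, then word characters only.
tweaked_version() { sed -E 's/(^|\s)@[a-zA-Z0-9_]+/\1<@MENTION>/g'; }

printf '%s\n' '@Username hi @BAPP, ok' | answer_version    # @Username hi <@MENTION> ok
printf '%s\n' '@Username hi @BAPP, ok' | tweaked_version   # <@MENTION> hi <@MENTION>, ok
```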
Answer:
Based on your shown samples only: in GNU sed, with the -E option enabled, you can try the following. In short: enable ERE (extended regular expressions), then substitute @ followed by all non-space characters with <@MENTION>, using the g flag to make the substitution happen globally.
sed -E 's/@\S+/<@MENTION>/g' Input_file
Or, to be more specific, try the following sed with a small tweak to the command above:
sed -E 's/(\s+)@\S+/\1<@MENTION>/g' Input_file
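Note that the first command also rewrites called@User, which the question wants to keep, and in both commands \S+ consumes trailing punctuation such as the comma in "@BAPP,". A short comparison, assuming GNU sed (\s and \S are GNU extensions); the function names are hypothetical:

```shell
#!/bin/sh
any_at()   { sed -E 's/@\S+/<@MENTION>/g'; }             # first command: any @
after_ws() { sed -E 's/(\s+)@\S+/\1<@MENTION>/g'; }      # second command: @ after whitespace

printf '%s\n' 'hi @BAPP, called@User' | any_at     # hi <@MENTION> called<@MENTION>
printf '%s\n' 'hi @BAPP, called@User' | after_ws   # hi <@MENTION> called@User
```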
Answer:
This sed worked for me:
sed 's/@\w*/<@MENTION>/g' file
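\w is a GNU extension that matches a word character. Because nothing constrains the character before the @, this command also rewrites called@User, and because of the * it replaces a lone @ as well. A quick check, assuming GNU sed, with a hypothetical function name:

```shell
#!/bin/sh
# \w* may match zero characters, and the character before @ is not checked.
w_star() { sed 's/@\w*/<@MENTION>/g'; }

printf '%s\n' 'called@User and @ alone' | w_star   # called<@MENTION> and <@MENTION> alone
```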