How can I delete a string inside parentheses using sed, inside a sed script?

I would like to remove text inside parentheses (including the parentheses themselves) using sed in a sed script. For example, I would like to delete the phrase (Chris Pratt) but keep (Chris_Pratt); both appear on the same line. I want to do this for the entire file. A line looks like this:

Star Lord (Chris Pratt), age 42, actor, (Chris_Pratt)

This is what I want it to look like after running the sed command in a sed script:

Star Lord, age 42, actor, (Chris_Pratt)

That's what I want to do with every single line (there are multiple lines with other names).

I have already tried:

s/[(][^)]*[)]//g

This one works, but it also deletes the parenthesized text that contains the underscore. I also tried:

s/\([[:alpha:]]{1,} [[:alpha:] ]{1,}\)\ //g

This one works when I run sed directly on the command line, but for some reason it doesn't work when I run it from a script.

CodePudding user response：

You can use

sed 's/ *([^()_]*)//g' file > outputfile

Details:

 * - zero or more spaces (a space followed by *)
( - a literal ( char (since it is a POSIX BRE pattern)
[^()_]* - zero or more chars other than (, ) and _
) - a literal ) char (since it is a POSIX BRE pattern)

See the online demo:

#!/bin/bash
s='Star Lord (Chris Pratt), age 42, actor, (Chris_Pratt)'
sed 's/ *([^()_]*)//g' <<< "$s"
# => Star Lord, age 42, actor, (Chris_Pratt)
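
Since the question asks for a sed script rather than a one-liner, the same substitution can be placed in a script file and run with sed -f. The following is a minimal sketch, not a definitive setup: the file names strip_parens.sed and input.txt are hypothetical, and a temporary directory is used so nothing in the current directory is touched.

```shell
#!/bin/sh
# Work in a throwaway directory (the file names below are hypothetical).
workdir=$(mktemp -d)

# The sed script file: one command per line, no surrounding shell quotes needed.
printf '%s\n' 's/ *([^()_]*)//g' > "$workdir/strip_parens.sed"

# Sample input taken from the question.
printf '%s\n' 'Star Lord (Chris Pratt), age 42, actor, (Chris_Pratt)' > "$workdir/input.txt"

# Run the script file against the input with -f.
result=$(sed -f "$workdir/strip_parens.sed" "$workdir/input.txt")
printf '%s\n' "$result"
# => Star Lord, age 42, actor, (Chris_Pratt)

rm -r "$workdir"
```

The pattern here is a BRE, so it behaves the same with plain sed -f. A script that relies on ERE syntax, such as {1,} intervals or \( for a literal parenthesis, would need sed -E -f instead; a missing -E is one possible reason a command that works on the command line fails when run from a script file.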

CodePudding user response：

With your shown samples, please try the following sed program, which uses sed's backreference capability.

sed -E 's/(^[^(]*) \([^)]*\)(.*)/\1\2/' Input_file

Explanation: The -E option enables ERE (extended regular expressions). The s command performs the substitution. The pattern (^[^(]*) \([^)]*\)(.*) creates two capturing groups (matched text that sed stores so it can be referenced later in the command). The replacement \1\2 joins the first and second backreferences and drops the parenthesized text between them, which gives the output the OP expects.

Explanation of regex:

(^[^(]*)    ##1st capturing group: everything from the start of the line up to the space before the first (.
 \([^)]*\)  ##A space, then a literal (, everything up to the next ), and that ) itself.
(.*)        ##2nd capturing group: everything after the previous match.
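
Note that this command has no g flag and is anchored to the start of the line, so it removes only the first parenthesized group on each line, whether or not that group contains an underscore. The sketch below contrasts it with the underscore-excluding pattern from the other answer; the Groot line is a made-up sample (not from the question) in which the underscore form comes first.

```shell
#!/bin/sh
# Hypothetical input: the underscore form appears before the spaced form.
line='Groot (Vin_Diesel), voice, (Vin Diesel)'

# Backreference approach: drops the first " (...)" group on the line.
first_only=$(printf '%s\n' "$line" | sed -E 's/(^[^(]*) \([^)]*\)(.*)/\1\2/')

# Underscore-excluding approach: drops only groups that contain no underscore.
no_underscore=$(printf '%s\n' "$line" | sed 's/ *([^()_]*)//g')

printf '%s\n' "$first_only"     # => Groot, voice, (Vin Diesel)
printf '%s\n' "$no_underscore"  # => Groot (Vin_Diesel), voice,
```

For input shaped like the sample in the question, where the group to delete always comes first, both commands give the same result.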