Remove every substring after a given character in a list

Time:03-02

I have a list that has the following structure:

[1] "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"

I want to extract the third element (splitting on the | symbol) and then remove everything from the ( symbol onward in that string.

So I would get this string:

[1] "Concentration"
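For reference, both steps can also be done in a single call to sub() with a capture group. This is a minimal sketch, assuming every input has at least three |-separated fields; the pattern is illustrative and not taken from the question or answers.

```r
y <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"

# Skip the first two |-delimited fields, capture the third field up to
# (but not including) the first "(" or "|", then discard the rest.
# Inside a bracket expression, "(" and "|" are literal characters.
third <- sub("^[^|]*\\|[^|]*\\|([^(|]*).*$", "\\1", y)
print(third)
```

If an input has fewer than three fields, the pattern does not match and sub() returns the input unchanged, so the split-then-clean approach below is easier to validate.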

What I do is first split on the | symbol, then take the third element of the resulting list. To be able to use gsub I convert it to character, and then I apply gsub as follows:

y <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
y <- strsplit(y,  "\\|")
y <- y[[1]][3]
y <- as.character(y)
gsub("(.*","",y)

However, this error is prompted:

invalid regular expression '(.*', reason 'Missing ')''

Answer:

You may use strsplit with unlist here:

x <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
output <- unlist(strsplit(x, "\\|"))[3]
output

[1] "Concentration(ng/mL)"

If some inputs might have fewer than two | separators, you may want to check the length of the vector returned above before accessing the third element.
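A minimal sketch of that check, vectorised over several strings; nth_field is a hypothetical helper name, not a base R function. Note that in R, indexing past the end of a vector returns NA rather than raising an error, so the explicit check mainly makes the intent visible.

```r
# Hypothetical helper: return the n-th |-separated field of each string,
# or NA when a string has fewer than n fields.
nth_field <- function(x, n = 3) {
  parts <- strsplit(x, "|", fixed = TRUE)  # fixed = TRUE: split on a literal "|"
  vapply(parts,
         function(p) if (length(p) < n) NA_character_ else p[n],
         character(1))
}

res <- nth_field(c("Atp|Barcelona|Concentration(ng/mL)|8|FALSE", "Atp|Barcelona"))
print(res)
# [1] "Concentration(ng/mL)" NA
```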

Answer:

First of all, you don't need y <- as.character(y), since the result is already of class "character".
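A quick check of that first point, assuming the same input string as in the question:

```r
y <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
third <- strsplit(y, "\\|")[[1]][3]

# strsplit() returns a list of character vectors, so indexing into it
# already yields a character value; as.character() changes nothing.
print(class(third))                     # [1] "character"
print(identical(third, as.character(third)))  # [1] TRUE
```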

Second, the problem lies in the pattern passed to gsub(): ( is a regular-expression metacharacter that opens a group, so it must be escaped as \\( to match a literal opening parenthesis. Therefore your full code should be:

y <- "Atp|Barcelona|Concentration(ng/mL)|8|FALSE"
y <- strsplit(y,  "\\|")
y <- y[[1]][3]
gsub("\\(.*","",y)

[1] "Concentration"
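As a sketch of the same fix, here are three equivalent ways to drop everything from the first ( onward; the variable names are illustrative. Because .* already consumes the rest of the string, there is only ever one match, so sub() does the same job as gsub() here.

```r
y <- "Concentration(ng/mL)"

# 1. Escape the metacharacter with a backslash (doubled inside an R string).
via_escape <- sub("\\(.*", "", y)

# 2. Put "(" in a bracket expression, where it is treated literally.
via_bracket <- sub("[(].*", "", y)

# 3. Avoid regular expressions entirely: split on a literal "(" and keep
#    the part before it.
via_fixed <- strsplit(y, "(", fixed = TRUE)[[1]][1]

print(c(via_escape, via_bracket, via_fixed))
# [1] "Concentration" "Concentration" "Concentration"
```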