R: Identifying whether a string has a bracket and whether the number in the bracket is greater than

Time:10-31

I am using R and have a column in a dataframe where I would like to check for each row whether there is a bracket and if so whether the number in the bracket is greater than 0. This is so I can subset these rows and apply the appropriate information in another new column.

I am new to slack so please let me know if I need to clarify any details. Thanks in advance.

Edit (apologies, the editor won't let me submit this as a table and insists on formatting it as code): for example, if I had a column like:

|Column 1|
|--------|
|Q9H7C4 1xPhospho [S325(100)]|
|P11169 1xPhospho [S485(88.2)]|
|Q9UK59 1xPhospho [S/T]|
|Q8WW12 1xPhospho [S119(100)]|

I want to pick out the rows that have a bracket containing a number greater than 0 and paste their contents into a new column. For the example column above, the condition would evaluate to TRUE, TRUE, FALSE, TRUE, so the new column would be:

|New Column|
|----------|
|Q9H7C4 1xPhospho [S325(100)]|
|P11169 1xPhospho [S485(88.2)]|
|NA|
|Q8WW12 1xPhospho [S119(100)]|

However, downstream of this I would like to fill in the NAs but think I can go from there once I work out this first step.

CodePudding user response:

This can be done by first extracting the number in parentheses, if there is one, with str_extract, and then using ifelse to check whether that number is greater than 0:

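library(stringr)
library(dplyr)
df %>%
  mutate(
    # Number between "(" and ")", e.g. "100" or "88.2"; NA if there is none
    Num = str_extract(Col, "(?<=\\()\\d+(\\.\\d+)?(?=\\))"),
    # Keep the original string only when that number is greater than 0
    New_col = ifelse(as.numeric(Num) > 0, Col, NA)) %>%
  select(-Num)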
                              Col                         New_col
1  |Q9H7C4 1xPhospho [S325(100)]|  |Q9H7C4 1xPhospho [S325(100)]|
2 |P11169 1xPhospho [S485(88.2)]| |P11169 1xPhospho [S485(88.2)]|
3        |Q9UK59 1xPhospho [S/T]|                            <NA>
4   |Q8WW12 1xPhospho [S119(100)]   |Q8WW12 1xPhospho [S119(100)]

Data:

df <- data.frame(
  Col = c("|Q9H7C4 1xPhospho [S325(100)]|",
            "|P11169 1xPhospho [S485(88.2)]|",
            "|Q9UK59 1xPhospho [S/T]|",
            "|Q8WW12 1xPhospho [S119(100)]")
)
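
For anyone who would rather avoid the stringr and dplyr dependencies, the same check can probably be sketched in base R. This is only an illustration, not the method used in the answer above: the names `pattern`, `num` and `has_positive` are made up for the example, the sample strings are written without the table pipes, and the pattern assumes the number in parentheses is a plain non-negative decimal such as 100 or 88.2.

```r
# Sample data, mirroring the question's column (table pipes omitted)
df <- data.frame(
  Col = c("Q9H7C4 1xPhospho [S325(100)]",
          "P11169 1xPhospho [S485(88.2)]",
          "Q9UK59 1xPhospho [S/T]",
          "Q8WW12 1xPhospho [S119(100)]"),
  stringsAsFactors = FALSE
)

# Capture group 1 is the number between "(" and ")"
pattern <- "\\(([0-9]+(\\.[0-9]+)?)\\)"
m <- regmatches(df$Col, regexec(pattern, df$Col))

# Rows with no match yield character(0), which is mapped to NA here
num <- vapply(
  m,
  function(x) if (length(x) >= 2) as.numeric(x[2]) else NA_real_,
  numeric(1)
)

# TRUE only when a number was found and it is greater than 0
has_positive <- !is.na(num) & num > 0

# Copy the original string where the condition holds, NA otherwise
df$New_col <- ifelse(has_positive, df$Col, NA_character_)
```

Keeping `has_positive` as a separate logical vector may be handy for the later step of filling in the NAs, since it records which rows failed the check.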