I am using R and have a column in a dataframe where I would like to check for each row whether there is a bracket and if so whether the number in the bracket is greater than 0. This is so I can subset these rows and apply the appropriate information in another new column.
I am new to slack so please let me know if I need to clarify any details. Thanks in advance.
Edit (apologies it doesn't seem to let me submit in table and insists on it being formatted as code when I want it as a table): So if for example I had a column like:
|Column 1|
|--------|
|Q9H7C4 1xPhospho [S325(100)]|
|P11169 1xPhospho [S485(88.2)]|
|Q9UK59 1xPhospho [S/T]|
|Q8WW12 1xPhospho [S119(100)]|
I want to flag the rows that contain a number in brackets where that number is greater than 0, and paste the contents of those rows into a new column. For the example column above, the condition would evaluate to TRUE, TRUE, FALSE, TRUE. The pasted information in the new column would then be:
|New Column|
|----------|
|Q9H7C4 1xPhospho [S325(100)]|
|P11169 1xPhospho [S485(88.2)]|
|NA|
|Q8WW12 1xPhospho [S119(100)]|
However, downstream of this I would like to fill in the NAs but think I can go from there once I work out this first step.
Answer:
This can be done by first using `str_extract` to pull out the number in parentheses, if there is one, and then using `ifelse` to check whether that number is greater than 0:
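library(stringr)
library(dplyr)
df %>%
  mutate(
    # Extract the number inside the parentheses, e.g. "100" or "88.2" (NA if none)
    Num = str_extract(Col, "(?<=\\()\\d+(\\.\\d+)?(?=\\))"),
    # Keep the original value only where that number is greater than 0
    New_col = ifelse(as.numeric(Num) > 0, Col, NA)) %>%
  # Drop the helper column
  select(-Num)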
                              Col                         New_col
1  |Q9H7C4 1xPhospho [S325(100)]|  |Q9H7C4 1xPhospho [S325(100)]|
2 |P11169 1xPhospho [S485(88.2)]| |P11169 1xPhospho [S485(88.2)]|
3        |Q9UK59 1xPhospho [S/T]|                            <NA>
4   |Q8WW12 1xPhospho [S119(100)]   |Q8WW12 1xPhospho [S119(100)]
Data:
df <- data.frame(
  Col = c("|Q9H7C4 1xPhospho [S325(100)]|",
          "|P11169 1xPhospho [S485(88.2)]|",
          "|Q9UK59 1xPhospho [S/T]|",
          "|Q8WW12 1xPhospho [S119(100)]")
)
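
If you also want the TRUE/FALSE vector itself (for subsetting), the same idea can be sketched in base R without any packages. This is only a rough sketch: it assumes the same `df` and `Col` as in the data above, and the names `pattern`, `has_num`, `num` and `keep` are made up for illustration.

```r
# Same example data as above, repeated so the snippet runs on its own
df <- data.frame(
  Col = c("|Q9H7C4 1xPhospho [S325(100)]|",
          "|P11169 1xPhospho [S485(88.2)]|",
          "|Q9UK59 1xPhospho [S/T]|",
          "|Q8WW12 1xPhospho [S119(100)]"),
  stringsAsFactors = FALSE
)

# A number (optionally with a decimal part) enclosed in parentheses
pattern <- ".*\\((\\d+(?:\\.\\d+)?)\\).*"

# Which rows contain such a number at all?
has_num <- grepl(pattern, df$Col, perl = TRUE)

# Pull the number out where present, NA otherwise, then convert to numeric
num <- as.numeric(ifelse(has_num, sub(pattern, "\\1", df$Col, perl = TRUE), NA))

# Logical condition: a number is present and it is greater than 0
keep <- !is.na(num) & num > 0

# Copy the original value into the new column only where the condition holds
df$New_col <- ifelse(keep, df$Col, NA)
```

For the example data, `keep` comes out as TRUE, TRUE, FALSE, TRUE, so `df[keep, ]` gives just the rows with a positive number in parentheses.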