In R
I have two questions. Is there a way to create a new column that shows which overall treatment is used based on certain wording of rows in a different column? See example data and output 1.
example data 1
a
SPAWN1
SPAWN1
SPAWN1
NOSPAWN1
NOSPAWN1
NOSPAWN1
SPAWN2
SPAWN2
SPAWN2
NOSPAWN2
NOSPAWN2
NOSPAWN2
example output 1
a b
SPAWN1 SPAWN
SPAWN1 SPAWN
SPAWN1 SPAWN
NOSPAWN1 NOSPAWN
NOSPAWN1 NOSPAWN
NOSPAWN1 NOSPAWN
SPAWN2 SPAWN
SPAWN2 SPAWN
SPAWN2 SPAWN
NOSPAWN2 NOSPAWN
NOSPAWN2 NOSPAWN
NOSPAWN2 NOSPAWN
My second question is about keeping only the columns that have similar wording in their names. See example data and output 2. This example keeps only the columns with "max" in the name.
example data 2
min_Ca max_Ca min_Zn max_Zn
example output 2
max_Ca max_Zn
CodePudding user response:
For the first operation, we can use sub() here to strip the trailing digits:
df$b <- sub("\\d+$", "", df$a)
For the second part, use grepl() on the column names:
df <- df[, grepl("max", names(df)), drop = FALSE]
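To try both base R steps end to end, here is a minimal sketch. The first data frame rebuilds example data 1 from the question. For example data 2 only the column names come from the question, so the numeric values and the names df2 and df2_max are placeholders made up for illustration. The pattern "\\d+$" matches one or more digits at the end of the string.

```r
# Example data 1 from the question
df <- data.frame(a = rep(c("SPAWN1", "NOSPAWN1", "SPAWN2", "NOSPAWN2"), each = 3))

# Remove one or more trailing digits to get the overall treatment
df$b <- sub("\\d+$", "", df$a)
unique(df$b)
# "SPAWN" "NOSPAWN"

# Example data 2: column names from the question, values are made-up placeholders
df2 <- data.frame(min_Ca = 1:3, max_Ca = 4:6, min_Zn = 7:9, max_Zn = 10:12)

# Keep columns whose name contains "max"; drop = FALSE keeps the result
# a data frame even if only one column matches
df2_max <- df2[, grepl("max", names(df2)), drop = FALSE]
names(df2_max)
# "max_Ca" "max_Zn"
```

If the treatment names could themselves end in a digit that should be kept, the pattern would need adjusting; the sketch assumes every trailing digit run is a replicate number.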
CodePudding user response:
Here is the tidyverse equivalent:
library(dplyr)
library(stringr)
# question 1: extract the leading letters as the treatment
df %>%
  mutate(new_col = str_extract(a, '[A-Za-z]*'))
# question 2: keep only columns whose name contains "max"
df1 %>%
  select(contains('max'))
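As a self-contained sketch of the tidyverse route on the question's example data: the second question is about keeping columns rather than rows, so this version uses select() with the contains() helper instead of filter(). The values in df1 and the names df_out and df1_max are assumptions for illustration; only the column names come from the question.

```r
library(dplyr)
library(stringr)

# Example data 1 from the question
df <- data.frame(a = rep(c("SPAWN1", "NOSPAWN1", "SPAWN2", "NOSPAWN2"), each = 3))

# Question 1: take the leading run of letters as the treatment label
df_out <- df %>%
  mutate(b = str_extract(a, "^[A-Za-z]+"))
unique(df_out$b)
# "SPAWN" "NOSPAWN"

# Example data 2: column names from the question, values are made-up placeholders
df1 <- data.frame(min_Ca = 1:3, max_Ca = 4:6, min_Zn = 7:9, max_Zn = 10:12)

# Question 2: keep only columns whose name contains "max"
df1_max <- df1 %>%
  select(contains("max"))
names(df1_max)
# "max_Ca" "max_Zn"
```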