How to keep columns based on certain wording

In R

I have two questions. Is there a way to create a new column that shows which overall treatment is used based on certain wording of rows in a different column? See example data and output 1.

example data 1

a
SPAWN1
SPAWN1
SPAWN1
NOSPAWN1
NOSPAWN1
NOSPAWN1
SPAWN2
SPAWN2
SPAWN2
NOSPAWN2
NOSPAWN2
NOSPAWN2

example output 1

a             b
SPAWN1      SPAWN
SPAWN1      SPAWN
SPAWN1      SPAWN
NOSPAWN1    NOSPAWN
NOSPAWN1    NOSPAWN
NOSPAWN1    NOSPAWN
SPAWN2      SPAWN
SPAWN2      SPAWN
SPAWN2      SPAWN
NOSPAWN2    NOSPAWN
NOSPAWN2    NOSPAWN
NOSPAWN2    NOSPAWN

My second question is about keeping only the columns that have similar wording in their names. See example data and output 2, which keeps only the columns with "max" in the name.

example data 2

min_Ca max_Ca min_Zn max_Zn

example output 2

max_Ca max_Zn

CodePudding user response:

For the first operation, we can use sub() here:

df$b <- sub("\\d+$", "", df$a)
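
As a quick check, here is a minimal sketch that rebuilds the example data from the question and applies sub() with the pattern \\d+$ (one or more digits anchored at the end of the string). The data frame name df and the construction via rep() are assumptions made for illustration.

```r
# Rebuild the example data from the question (construction is illustrative)
df <- data.frame(
  a = rep(c("SPAWN1", "NOSPAWN1", "SPAWN2", "NOSPAWN2"), each = 3),
  stringsAsFactors = FALSE
)

# Remove one or more digits anchored at the end of each string
df$b <- sub("\\d+$", "", df$a)

# Show each distinct pairing of original label and derived treatment
unique(df[, c("a", "b")])
```

Because the pattern is anchored with $, only trailing digits are removed; a digit in the middle of a label would be left alone.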

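For the second part, use grepl() on the column names (drop = FALSE keeps the result a data frame even if only one column matches):

df <- df[, grepl("max", names(df)), drop = FALSE]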
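
The column selection can be sketched the same way. Only the column names below come from the question; the numeric values and the names keep and df_max are placeholders invented for this example.

```r
# Placeholder values; only the column names come from the question
df <- data.frame(min_Ca = c(1, 2), max_Ca = c(3, 4),
                 min_Zn = c(5, 6), max_Zn = c(7, 8))

# Logical vector marking the columns whose name contains "max"
keep <- grepl("max", names(df))

# drop = FALSE keeps a data frame even if only one column matches
df_max <- df[, keep, drop = FALSE]
names(df_max)
```

Note that grepl("max", ...) matches "max" anywhere in the name. If the columns of interest always start with "max_", startsWith(names(df), "max_") would be a stricter test.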

CodePudding user response:

Here is the tidyverse equivalent:

library(dplyr)
library(stringr)

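# question 1: extract the leading letters as the treatment
df %>% 
  mutate(new_col = str_extract(a, '[A-Za-z]*'))

# question 2: keep only the columns whose name contains "max"
# (select() picks columns; filter() would subset rows instead)
df1 %>% 
  select(contains('max'))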