Home > Net >  How to select multiple rows containing a search strings in multiple columns and create a new column
How to select multiple rows containing a search strings in multiple columns and create a new column

Time:07-28

I am working on a dataframe containing text across multiple columns. I would like to identify cells containing a search string and copy the whole cell onto a new column. I know for sure that the search string would occur only once per row across the columns. The following table could serve as an example:

        a                b              c
1   "I eat"         "I am singing"    "Run"
2   "I am running"  "I have eaten"    "Sleep"
3   "I sleep"       "I see"           "I am eating"
4   "Eat"           "I Run"           "I am Seeing"
5   "I sing"        "was eaten"       "I am Sleeping"

I would like to search for cells containing substring "eat" and create a new column with the cells in the followwing manner:

        a                b              c               New column
1   "I eat"         "I am singing"    "Run"            "I eat"
2   "I am running"  "I have eaten"    "Sleep"          "I have eaten"
3   "I sleep"       "I see"           "I am eating"    "I am eating"
4   "Eat"           "I Run"           "I am Seeing"    "Eat"
5   "I sing"        "was eaten"       "I am Sleeping"  "was eaten"


What would be the best approach ? Thank you in advance ...

CodePudding user response:

base R

dat$New <- apply(dat, 1, function(z) grep("eat", z, value = TRUE, ignore.case = TRUE)[1])
dat
#              a            b             c          New
# 1        I eat I am singing           Run        I eat
# 2 I am running I have eaten         Sleep I have eaten
# 3      I sleep        I see   I am eating  I am eating
# 4          Eat        I Run   I am Seeing          Eat
# 5       I sing    was eaten I am Sleeping    was eaten

dplyr

library(dplyr)
dat %>%
  rowwise() %>%
  mutate(New = grep("eat", c_across(a:c), value = TRUE, ignore.case = TRUE)[1]) %>%
  ungroup()
# # A tibble: 5 x 4
#   a            b            c             New         
#   <chr>        <chr>        <chr>         <chr>       
# 1 I eat        I am singing Run           I eat       
# 2 I am running I have eaten Sleep         I have eaten
# 3 I sleep      I see        I am eating   I am eating 
# 4 Eat          I Run        I am Seeing   Eat         
# 5 I sing       was eaten    I am Sleeping was eaten   

Data

dat <- structure(list(a = c("I eat", "I am running", "I sleep", "Eat", "I sing"), b = c("I am singing", "I have eaten", "I see", "I Run", "was eaten"), c = c("Run", "Sleep", "I am eating", "I am Seeing", "I am Sleeping"), New = c("I eat", "I have eaten", "I am eating", "Eat", "was eaten")), row.names = c(NA, -5L), class = "data.frame")

CodePudding user response:

Or a dplyr/stringr option:

library(dplyr)
library(stringr)

df |>
  rowwise() |>
  mutate(eat = str_subset(c_across(a:c), regex("eat", ignore_case = TRUE))) |>
  ungroup()

Output:

# A tibble: 5 × 4
  a            b            c             eat         
  <chr>        <chr>        <chr>         <chr>       
1 I eat        I am singing Run           I eat       
2 I am running I have eaten Sleep         I have eaten
3 I sleep      I see        I am eating   I am eating 
4 Eat          I Run        I am Seeing   Eat         
5 I sing       was eaten    I am Sleeping was eaten   
  • Related