I am working on a dataframe containing text across multiple columns. I would like to identify cells containing a search string and copy the whole cell onto a new column. I know for sure that the search string would occur only once per row across the columns. The following table could serve as an example:
a b c
1 "I eat" "I am singing" "Run"
2 "I am running" "I have eaten" "Sleep"
3 "I sleep" "I see" "I am eating"
4 "Eat" "I Run" "I am Seeing"
5 "I sing" "was eaten" "I am Sleeping"
I would like to search for cells containing substring "eat" and create a new column with the cells in the followwing manner:
a b c New column
1 "I eat" "I am singing" "Run" "I eat"
2 "I am running" "I have eaten" "Sleep" "I have eaten"
3 "I sleep" "I see" "I am eating" "I am eating"
4 "Eat" "I Run" "I am Seeing" "Eat"
5 "I sing" "was eaten" "I am Sleeping" "was eaten"
What would be the best approach ? Thank you in advance ...
CodePudding user response:
base R
dat$New <- apply(dat, 1, function(z) grep("eat", z, value = TRUE, ignore.case = TRUE)[1])
dat
# a b c New
# 1 I eat I am singing Run I eat
# 2 I am running I have eaten Sleep I have eaten
# 3 I sleep I see I am eating I am eating
# 4 Eat I Run I am Seeing Eat
# 5 I sing was eaten I am Sleeping was eaten
dplyr
library(dplyr)
dat %>%
rowwise() %>%
mutate(New = grep("eat", c_across(a:c), value = TRUE, ignore.case = TRUE)[1]) %>%
ungroup()
# # A tibble: 5 x 4
# a b c New
# <chr> <chr> <chr> <chr>
# 1 I eat I am singing Run I eat
# 2 I am running I have eaten Sleep I have eaten
# 3 I sleep I see I am eating I am eating
# 4 Eat I Run I am Seeing Eat
# 5 I sing was eaten I am Sleeping was eaten
Data
dat <- structure(list(a = c("I eat", "I am running", "I sleep", "Eat", "I sing"), b = c("I am singing", "I have eaten", "I see", "I Run", "was eaten"), c = c("Run", "Sleep", "I am eating", "I am Seeing", "I am Sleeping"), New = c("I eat", "I have eaten", "I am eating", "Eat", "was eaten")), row.names = c(NA, -5L), class = "data.frame")
CodePudding user response:
Or a dplyr
/stringr
option:
library(dplyr)
library(stringr)
df |>
rowwise() |>
mutate(eat = str_subset(c_across(a:c), regex("eat", ignore_case = TRUE))) |>
ungroup()
Output:
# A tibble: 5 × 4
a b c eat
<chr> <chr> <chr> <chr>
1 I eat I am singing Run I eat
2 I am running I have eaten Sleep I have eaten
3 I sleep I see I am eating I am eating
4 Eat I Run I am Seeing Eat
5 I sing was eaten I am Sleeping was eaten