I asked a question like this yesterday, but today I need help doing it in R. You can see the original question here: Create new indicator columns based on values in another column
I have some data that looks like this:
df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'))
I want it to look like this:
goal_df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'),
                      apple = c(1, 0, 1, 0),
                      pear = c(0, 1, 0, 0),
                      peach = c(0, 0, 1, 0))
head(goal_df)
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0
I tried this:
library(stringr)
fruits <- list('apple', 'pear', 'peach')
for (i in fruits){
  df$i <- ifelse(str_detect(df$col, i), 1, 0)
}
                              col x
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0
Can someone help me with what I'm doing wrong here? I'm not sure why this is only creating one column.
CodePudding user response:
The $ should be changed to [[, because $ takes the column name literally while [[ evaluates i:
for (i in fruits){
  df[[i]] <- ifelse(str_detect(df$col, i), 1, 0)
}
-output
> df
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0
The output the OP got would have i as the column name instead of x (maybe there is a typo), because $ creates a column literally named i rather than one named after the value stored in i. That column is overwritten on each iteration, so it ends up holding the result for the last element of fruits, i.e. 'peach':
> df
                              col i
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0
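To see the difference in isolation, here is a minimal sketch on a toy data frame (toy and nm are made-up names, used only for this illustration). It shows that $ uses the name exactly as typed, while [[ looks up the value of the variable:

```r
# Toy data frame; `toy` and `nm` are hypothetical names for illustration only
toy <- data.frame(a = 1:2)
nm <- "b"

toy$nm <- 0      # $ does not evaluate nm: this creates a column literally named "nm"
toy[[nm]] <- 1   # [[ evaluates nm: this creates a column named "b"

names(toy)       # the column names are now "a", "nm" and "b"
```

The same applies inside the loop: df$i always writes to a column called i, whereas df[[i]] writes to a column called apple, pear or peach, depending on the current value of i.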
CodePudding user response:
Define the patterns and combine them with col in outer, using a vectorized grepl.
pa <- c('apple', 'pear', 'peach')
# the unary + converts the TRUE/FALSE matrix to 1/0
data.frame(df, `colnames<-`(+t(outer(pa, df$col, Vectorize(grepl))), pa))
#                               col apple pear peach
# 1                 I want an apple     1    0     0
# 2                    i hate pears     0    1     0
# 3 please buy a peach and an apple     1    0     1
# 4                   I want squash     0    0     0
Data:
df <- structure(list(col = c("I want an apple", "i hate pears", "please buy a peach and an apple",
                             "I want squash")), class = "data.frame", row.names = c(NA, -4L))
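If the outer/Vectorize combination is hard to read, the same matrix can probably be built with sapply instead. This is a sketch of a swapped-in base R alternative, not the approach above; hits and res are names made up here, and fixed = TRUE is my assumption that the fruit names should be matched as literal strings rather than regular expressions:

```r
df <- data.frame(col = c('I want an apple', 'i hate pears',
                         'please buy a peach and an apple', 'I want squash'))
pa <- c('apple', 'pear', 'peach')

# One grepl() call per pattern; sapply() binds the logical vectors into a
# 4 x 3 matrix and takes the column names from pa.
# fixed = TRUE (an assumption) matches each pattern as a literal string.
hits <- sapply(pa, grepl, x = df$col, fixed = TRUE)

# The unary + converts the logical matrix to 0/1 before binding it to df
res <- cbind(df, +hits)
res
```

Each column of hits answers "does this fruit occur in each sentence?", so no transpose is needed.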
CodePudding user response:
You can use rowwise and map to make list-columns:
library(tidyverse)
names(fruits) <- fruits # makes new column names automatic
df %>%
  rowwise() %>%
  mutate(fruit_test = list(map_int(fruits, ~str_detect(col, .)))) %>%
  unnest_wider(fruit_test)
# A tibble: 4 × 4
  col                             apple  pear peach
  <fct>                           <int> <int> <int>
1 I want an apple                     1     0     0
2 i hate pears                        0     1     0
3 please buy a peach and an apple     1     0     1
4 I want squash                       0     0     0
CodePudding user response:
We can try the following base R option:
u <- with(
  df,
  regmatches(
    col,
    gregexpr(
      do.call(paste, c(fruits, sep = "|")),
      col
    )
  )
)
cbind(df, unclass(t(table(stack(setNames(u, seq_along(u)))))))
which gives
                              col apple peach pear
1                 I want an apple     1     0    0
2                    i hate pears     0     0    1
3 please buy a peach and an apple     1     1    0
4                   I want squash     0     0    0
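Note that the columns come out as apple, peach, pear, because table() sorts the matched values alphabetically. Here is a sketch of the same pipeline with the intermediate steps named and the columns put back in the order of fruits; tab and res are names I made up, and I use paste(..., collapse = "|") in place of do.call purely for readability:

```r
df <- data.frame(col = c('I want an apple', 'i hate pears',
                         'please buy a peach and an apple', 'I want squash'))
fruits <- list('apple', 'pear', 'peach')

# Extract every fruit name found in each sentence; a sentence with no
# match yields character(0)
u <- regmatches(df$col, gregexpr(paste(unlist(fruits), collapse = "|"), df$col))

# stack() turns the list into (values, ind) pairs and table() counts them:
# fruits in rows, sentence index in columns, so t() flips it
tab <- table(stack(setNames(u, seq_along(u))))
res <- cbind(df, unclass(t(tab)))

# table() sorted the fruit names alphabetically; index by name to
# restore the order given in `fruits`
res <- res[c("col", unlist(fruits))]
res
```

Because table() counts matches, a sentence that mentions the same fruit twice would get a 2 rather than a 1 here, so this is an indicator only as long as each fruit appears at most once per sentence.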