New column based on presence of string

Time:12-20

I asked a question like this yesterday, but today I need help doing it in R. You can see the original question here: Create new indicator columns based on values in another column

I have some data that looks like this:

df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'))


I want it to look like this:

goal_df <- data.frame(col = c('I want an apple', 'i hate pears', 'please buy a peach and an apple', 'I want squash'), 
                      apple = c(1, 0, 1, 0), 
                      pear = c(0, 1, 0, 0), 
                      peach = c(0, 0, 1, 0))

head(goal_df)
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0

I tried this:

library(stringr)

fruits <- list('apple', 'pear', 'peach')

for (i in fruits){
  df$i <- ifelse(str_detect(df$col, i), 1, 0)
}
                              col x
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0

Can someone help me with what I'm doing wrong here? I'm not sure why this is only creating one column.

Answer:

Change $ to [[. df$i always refers to a column literally named i, whereas df[[i]] uses the value stored in i as the column name.
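A tiny illustration of the difference between the two operators, using hypothetical toy data rather than the question's data frame:

```r
# Toy data (hypothetical) showing how $ and [[ treat a variable that holds a name
df <- data.frame(col = c('I want an apple', 'i hate pears'))
i <- 'apple'

df$i <- c(1, 0)     # $ does not evaluate i: this creates a column named "i"
df[[i]] <- c(1, 0)  # [[ evaluates i: this creates a column named "apple"

print(names(df))    # "col" "i" "apple"
```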

library(stringr)

for (i in fruits) {
  df[[i]] <- ifelse(str_detect(df$col, i), 1, 0)
}

Output:

> df
                              col apple pear peach
1                 I want an apple     1    0     0
2                    i hate pears     0    1     0
3 please buy a peach and an apple     1    0     1
4                   I want squash     0    0     0

The output the OP got should have i as the column name rather than x (perhaps a typo in the question). $ does not evaluate i, so it creates a column literally named i instead of one named after the value stored in i. That column is overwritten on every iteration, so it ends up holding the result for the last element of fruits, i.e. 'peach':

> df
                              col i
1                 I want an apple 0
2                    i hate pears 0
3 please buy a peach and an apple 1
4                   I want squash 0
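If you would rather avoid both the loop and the stringr dependency, here is a minimal base-R sketch (an alternative, not part of the answer above) that builds all the columns at once with lapply() and grepl(). It assumes fruits is a character vector made with c() rather than a list, and fixed = TRUE is an assumption that the fruit names should be matched as literal text, not as regular expressions:

```r
df <- data.frame(col = c('I want an apple', 'i hate pears',
                         'please buy a peach and an apple', 'I want squash'))
fruits <- c('apple', 'pear', 'peach')

# One 0/1 integer vector per fruit; assigning the list to df[fruits] adds
# a column for each fruit, named after it. fixed = TRUE (an assumption)
# matches each fruit as literal text instead of as a regular expression.
df[fruits] <- lapply(fruits, function(f) as.integer(grepl(f, df$col, fixed = TRUE)))
print(df)
```

The resulting columns are integers (1L/0L) rather than the doubles produced by ifelse(..., 1, 0); the printed values are the same.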

Answer:

Define the patterns, then combine them with col in outer() using a vectorized grepl():

pa <- c('apple', 'pear', 'peach')

# outer() returns a logical matrix; "* 1" converts TRUE/FALSE to the 1/0 shown below
data.frame(df, `colnames<-`(t(outer(pa, df$col, Vectorize(grepl))) * 1, pa))
#                               col apple pear peach
# 1                 I want an apple     1    0     0
# 2                    i hate pears     0    1     0
# 3 please buy a peach and an apple     1    0     1
# 4                   I want squash     0    0     0

Data:

df <- structure(list(col = c("I want an apple", "i hate pears",
  "please buy a peach and an apple", "I want squash")),
  class = "data.frame", row.names = c(NA, -4L))
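For comparison, a possibly simpler base-R sketch of the same matrix idea uses sapply() instead of outer() plus Vectorize(). This is an alternative, not the answer's code: because grepl() is already vectorized over x, one call per pattern is enough.

```r
df <- data.frame(col = c('I want an apple', 'i hate pears',
                         'please buy a peach and an apple', 'I want squash'))
pa <- c('apple', 'pear', 'peach')

# sapply() calls grepl(p, x = df$col) for each pattern p; each call returns one
# logical per row, so the result is a rows x patterns matrix with columns named by pa.
# Multiplying by 1 turns TRUE/FALSE into 1/0.
m <- sapply(pa, grepl, x = df$col) * 1
res <- data.frame(df, m)
print(res)
```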

Answer:

You can use rowwise() and map_int() to build a list-column, then spread it into separate columns with unnest_wider():

library(tidyverse)

names(fruits) <- fruits # makes new column names automatic

df %>% 
  rowwise() %>% 
  mutate(fruit_test = list(map_int(fruits, ~str_detect(col, .)))) %>% 
  unnest_wider(fruit_test)

# A tibble: 4 × 4
  col                             apple  pear peach
  <fct>                           <int> <int> <int>
1 I want an apple                     1     0     0
2 i hate pears                        0     1     0
3 please buy a peach and an apple     1     0     1
4 I want squash                       0     0     0
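The same row-by-row idea (one named 0/1 vector per row, then widen) can be sketched in base R without the tidyverse. This is a rough analogue written for illustration, not a translation of the pipeline above; fruits is assumed to be a named character vector here:

```r
df <- data.frame(col = c('I want an apple', 'i hate pears',
                         'please buy a peach and an apple', 'I want squash'))
fruits <- c(apple = 'apple', pear = 'pear', peach = 'peach')

# For each string, build a named 0/1 vector with one element per fruit
# (roughly what rowwise() + map_int() store in the list-column)
per_row <- lapply(df$col, function(s) {
  vapply(fruits, function(f) as.integer(grepl(f, s)), integer(1))
})

# "Widen" by stacking the named vectors as matrix rows; names become column names
wide <- do.call(rbind, per_row)
res <- cbind(df, wide)
print(res)
```

Since grepl() and str_detect() are both vectorized over the strings, working one row at a time is not strictly necessary; it mainly mirrors the list-column approach.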

Answer:

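We can try the following base R option, which extracts every match with regmatches() and gregexpr() and then tabulates the matches for each row: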

u <- with(
  df,
  regmatches(
    col,
    gregexpr(
      do.call(paste, c(fruits, sep = "|")),
      col
    )
  )
)

cbind(df, unclass(t(table(stack(setNames(u, seq_along(u)))))))

which gives

                              col apple peach pear
1                 I want an apple     1     0    0
2                    i hate pears     0     0    1
3 please buy a peach and an apple     1     1    0
4                   I want squash     0     0    0
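Note that table() counts matches, so a string that mentions a fruit twice gets 2 rather than 1; the columns also come out in alphabetical order, and a fruit that never matches gets no column at all. A minimal sketch of one way to handle this, using a hypothetical toy string (not from the question), turns the matched values into a factor with all fruits as levels and then caps the counts at 1:

```r
# Hypothetical toy input in which one string mentions apple twice
df <- data.frame(col = c('an apple and an apple', 'i hate pears'))
fruits <- c('apple', 'pear', 'peach')

u <- regmatches(df$col, gregexpr(paste(fruits, collapse = "|"), df$col))
s <- stack(setNames(u, seq_along(u)))

# Making the matched values a factor keeps every fruit (even one that never
# matches, like peach here) and fixes the column order to follow fruits
s$values <- factor(s$values, levels = fruits)
counts <- unclass(t(table(s)))  # counts[1, "apple"] is 2: table() counts each match

# Cap the counts to get 1/0 presence indicators
res <- cbind(df, (counts > 0) * 1)
print(res)
```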