Assign score based on presence of certain words in a column

I have a dataframe with one column reporting components of a meal, for example:

--------------------------------
| ID | Component               |
--------------------------------
| 1  | Vegetables              |
| 2  | Pasta                   |
| 3  | Pasta, Vegetables       |
| 4  | Pulses, Vegetables      |
| 5  | Meat, Pasta, Vegetables |
| 6  | Meat, Vegetables        |
| 7  | Pulses                  |
| 8  | Meat                    |
--------------------------------

I am looking to add an additional column, giving each person a score. I want them to receive a 1 if their meal contained Pasta, and 0 if it didn't. So participants 2, 3 and 5 get a 1, whilst the others get a 0.

Is there code that lets me apply this to just the term 'pasta'?

Any help would be appreciated! Thanks.

CodePudding user response：

We can use grepl to match the substring 'Pasta'. It returns a logical vector, which is converted to binary (0/1) with as.integer or the unary +:

df1$meal_score <- +(grepl('Pasta', df1$Component))
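
Since the question writes the term as 'pasta' in lower case, a case-insensitive variant may also be useful. The sketch below rebuilds the example data, checks that as.integer() and the unary + give the same 0/1 vector, and then matches with ignore.case = TRUE. The intermediate names score_int and score_plus are only for illustration.

```r
# Same data as in the question
df1 <- data.frame(
  ID = 1:8,
  Component = c("Vegetables", "Pasta", "Pasta, Vegetables",
                "Pulses, Vegetables", "Meat, Pasta, Vegetables",
                "Meat, Vegetables", "Pulses", "Meat"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE; as.integer() turns that into 1/0
score_int <- as.integer(grepl("Pasta", df1$Component))

# The unary plus coerces the logical vector in the same way
score_plus <- +grepl("Pasta", df1$Component)

# Both approaches give the same integer vector
identical(score_int, score_plus)

# ignore.case = TRUE also matches "pasta" or "PASTA"
df1$meal_score <- as.integer(grepl("pasta", df1$Component, ignore.case = TRUE))
df1
```

With this data, rows 2, 3 and 5 get a 1 and all other rows get a 0.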

CodePudding user response：

A tidyverse solution just for fun:

library(tidyverse)

df1 %>%
  mutate(score = +str_detect(Component, "Pasta"))

#>   ID               Component score
#> 1  1              Vegetables     0
#> 2  2                   Pasta     1
#> 3  3       Pasta, Vegetables     1
#> 4  4      Pulses, Vegetables     0
#> 5  5 Meat, Pasta, Vegetables     1
#> 6  6        Meat, Vegetables     0
#> 7  7                  Pulses     0
#> 8  8                    Meat     0

Data:

txt <- "ID|Component
1|Vegetables
2|Pasta
3|Pasta, Vegetables
4|Pulses, Vegetables
5|Meat, Pasta, Vegetables
6|Meat, Vegetables
7|Pulses
8|Meat"

df1 <- read.table(text = txt, sep = "|", stringsAsFactors = FALSE, header = TRUE)

CodePudding user response：

You can use grepl inside mutate:

library(dplyr)

df |> mutate(score = as.numeric(grepl("Pasta", Component, fixed = TRUE)))

output

  ID               Component score
1  1              Vegetables     0
2  2                   Pasta     1
3  3       Pasta, Vegetables     1
4  4      Pulses, Vegetables     0
5  5 Meat, Pasta, Vegetables     1
6  6        Meat, Vegetables     0
7  7                  Pulses     0
8  8                    Meat     0

data

df <- structure(list(ID = 1:8, Component = c("Vegetables", "Pasta", 
"Pasta, Vegetables", "Pulses, Vegetables", "Meat, Pasta, Vegetables", 
"Meat, Vegetables", "Pulses", "Meat")), class = "data.frame", row.names = c(NA, 
-8L))
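
A note on the grepl call in this answer: the fixed argument makes grepl treat "Pasta" as a literal string instead of a regular expression, so it still matches anywhere inside the text. If only whole components should count, one option is to split each entry on commas first. This is a minimal sketch and not part of the original answer; the helper has_component, the data frame meals and the "Pasta sauce" row are made up for illustration.

```r
# Hypothetical data: "Pasta sauce" contains the substring "Pasta"
meals <- data.frame(
  ID = 1:3,
  Component = c("Pasta, Vegetables", "Pasta sauce, Meat", "Meat"),
  stringsAsFactors = FALSE
)

# Substring match: flags rows 1 and 2
meals$substring_score <- as.integer(grepl("Pasta", meals$Component, fixed = TRUE))

# Hypothetical helper: TRUE only if `term` is one of the
# comma-separated components of each entry
has_component <- function(x, term) {
  parts <- strsplit(x, ",", fixed = TRUE)   # split each string on commas
  vapply(parts, function(p) term %in% trimws(p), logical(1))
}

# Whole-component match: flags row 1 only
meals$exact_score <- as.integer(has_component(meals$Component, "Pasta"))
meals
```

For the data in the question both approaches give the same scores, since "Pasta" only ever appears as a complete component there.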

CodePudding user response：

You can also use the str_detect function together with case_when:

library(stringr)
library(dplyr)

df <- data.frame(
  ID = 1:8,
  Component = c("Vegetables",
                "Pasta",
                "Pasta, Vegetables",
                "Pulses, Vegetables",
                "Meat, Pasta, Vegetables",
                "Meat, Vegetables",
                "Pulses",
                "Meat")) %>% 
  mutate(
    score = case_when(
      str_detect(Component, "Pasta") ~ 1,
      TRUE ~ 0
    )
  )

> df
  ID               Component score
1  1              Vegetables     0
2  2                   Pasta     1
3  3       Pasta, Vegetables     1
4  4      Pulses, Vegetables     0
5  5 Meat, Pasta, Vegetables     1
6  6        Meat, Vegetables     0
7  7                  Pulses     0
8  8                    Meat     0