I have a dataframe with one column reporting components of a meal, for example:
--------------------------------
| ID | Component               |
--------------------------------
| 1  | Vegetables              |
| 2  | Pasta                   |
| 3  | Pasta, Vegetables       |
| 4  | Pulses, Vegetables      |
| 5  | Meat, Pasta, Vegetables |
| 6  | Meat, Vegetables        |
| 7  | Pulses                  |
| 8  | Meat                    |
--------------------------------
I am looking to add an additional column, giving each person a score. I want them to receive a 1 if their meal contained Pasta, and 0 if it didn't. So participants 2, 3 and 5 get a 1, whilst the others get a 0.
Is there code that applies this to just the term 'pasta'?
Any help would be appreciated, thanks!
Answer:
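We can use grepl to match the substring 'Pasta'. It returns a logical vector, which is converted to binary (1/0) with as.integer:
df1$meal_score <- as.integer(grepl('Pasta', df1$Component))
or, equivalently, with the unary plus operator, which also coerces TRUE/FALSE to 1/0:
df1$meal_score <- +(grepl('Pasta', df1$Component))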
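The question spells the term in lowercase ('pasta'), while the data use 'Pasta'. grepl is case-sensitive by default; its ignore.case argument makes the match case-insensitive. A minimal sketch, assuming hypothetical data with inconsistent capitalisation (these rows are illustrative, not from the question):

```r
# Hypothetical data with inconsistent capitalisation (illustration only)
df1 <- data.frame(
  ID = 1:4,
  Component = c("Pasta", "pasta, Meat", "Meat", "PASTA, Vegetables"),
  stringsAsFactors = FALSE
)

# ignore.case = TRUE makes grepl() treat "Pasta", "pasta" and "PASTA" alike;
# as.integer() turns the logical result into 1/0
df1$meal_score <- as.integer(grepl("pasta", df1$Component, ignore.case = TRUE))
df1$meal_score
```

Without ignore.case = TRUE, only the first row would score 1 here.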
Answer:
A tidyverse solution just for fun:
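library(tidyverse)
# str_detect() returns TRUE/FALSE; as.integer() converts it to the 1/0 score shown below
df1 %>%
  mutate(score = as.integer(str_detect(Component, "Pasta")))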
#> ID Component score
#> 1 1 Vegetables 0
#> 2 2 Pasta 1
#> 3 3 Pasta, Vegetables 1
#> 4 4 Pulses, Vegetables 0
#> 5 5 Meat, Pasta, Vegetables 1
#> 6 6 Meat, Vegetables 0
#> 7 7 Pulses 0
#> 8 8 Meat 0
Data:
txt <- "ID|Component
1|Vegetables
2|Pasta
3|Pasta, Vegetables
4|Pulses, Vegetables
5|Meat, Pasta, Vegetables
6|Meat, Vegetables
7|Pulses
8|Meat"
df1 <- read.table(text = txt, sep = "|", stringsAsFactors = FALSE, header = TRUE)
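Both grepl and str_detect match substrings, so an entry such as a hypothetical "Pasta salad" would also score 1. If only the exact item "Pasta" in the comma-separated list should count, one option is to split each entry and test membership. A minimal base-R sketch, assuming items are always separated by commas (the data below are made up for illustration):

```r
# Hypothetical data: "Pasta salad" should NOT count as the item "Pasta" here
df1 <- data.frame(
  ID = 1:4,
  Component = c("Pasta, Vegetables", "Pasta salad", "Meat", "Meat, Pasta"),
  stringsAsFactors = FALSE
)

# Split each entry on commas and trim surrounding whitespace from every item
items <- lapply(strsplit(df1$Component, ",", fixed = TRUE), trimws)

# Score 1 only when one of the items is exactly "Pasta"
df1$score <- as.integer(vapply(items, function(x) "Pasta" %in% x, logical(1)))
df1$score
```

For the question's own data this gives the same result as the substring match; the two approaches only differ when "Pasta" appears inside a longer item name.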
Answer:
You can use
library(dplyr)
# The native pipe |> requires R >= 4.1.0
df |> mutate(score = as.numeric(grepl("Pasta", Component, fixed = TRUE)))
- output
ID Component score
1 1 Vegetables 0
2 2 Pasta 1
3 3 Pasta, Vegetables 1
4 4 Pulses, Vegetables 0
5 5 Meat, Pasta, Vegetables 1
6 6 Meat, Vegetables 0
7 7 Pulses 0
8 8 Meat 0
- data
df <- structure(list(ID = 1:8, Component = c("Vegetables", "Pasta",
"Pasta, Vegetables", "Pulses, Vegetables", "Meat, Pasta, Vegetables",
"Meat, Vegetables", "Pulses", "Meat")), class = "data.frame", row.names = c(NA,
-8L))
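The fixed = TRUE argument tells grepl to treat the pattern as a literal string rather than a regular expression. For a plain word like "Pasta" the result is the same either way; the difference only shows when the pattern contains regex metacharacters such as ".". A small sketch on made-up strings:

```r
x <- c("Pasta", "Pasta, Vegetables", "Meat")

# Literal match: same result as the regex version for a plain word
fixed_hits <- grepl("Pasta", x, fixed = TRUE)

# As a regex, "." matches any character, so "P.sta" matches both strings
dot_regex <- grepl("P.sta", c("Pasta", "P.sta"))

# With fixed = TRUE, "." is a literal dot, so only "P.sta" matches
dot_fixed <- grepl("P.sta", c("Pasta", "P.sta"), fixed = TRUE)
```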
Answer:
You can also use the str_detect function together with the case_when function.
library(stringr)
library(dplyr)
df <- data.frame(
  ID = 1:8,
  Component = c("Vegetables",
                "Pasta",
                "Pasta, Vegetables",
                "Pulses, Vegetables",
                "Meat, Pasta, Vegetables",
                "Meat, Vegetables",
                "Pulses",
                "Meat")) %>%
  mutate(
    score = case_when(
      str_detect(Component, "Pasta") ~ 1,
      TRUE ~ 0  # fallback for every row without "Pasta"
    )
  )
> df
ID Component score
1 1 Vegetables 0
2 2 Pasta 1
3 3 Pasta, Vegetables 1
4 4 Pulses, Vegetables 0
5 5 Meat, Pasta, Vegetables 1
6 6 Meat, Vegetables 0
7 7 Pulses 0
8 8 Meat 0
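In recent dplyr versions (1.1.0 and later), case_when also accepts a .default argument, which can replace the TRUE ~ 0 catch-all line. A minimal sketch, assuming dplyr >= 1.1.0 is installed and using a small hypothetical data frame in the same shape as the question's data:

```r
library(dplyr)
library(stringr)

# Small hypothetical data frame (illustration only)
df <- data.frame(
  ID = 1:3,
  Component = c("Pasta", "Meat", "Meat, Pasta"),
  stringsAsFactors = FALSE
)

# .default (dplyr >= 1.1.0) supplies the value for rows where no condition is TRUE
df <- df %>%
  mutate(score = case_when(
    str_detect(Component, "Pasta") ~ 1,
    .default = 0
  ))
df$score
```

On older dplyr versions, the TRUE ~ 0 form used above works instead.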