I have a dataframe
composed of 9 columns with more than 4000 observations. For this question I will present a simpler dataframe
(I use the tidyverse
library)
Let's say I have the following dataframe
:
library(tidyverse)
df <- tibble(Product = c("Bread","Oranges","Eggs","Bananas","Whole Bread" ),
Weight = c(NA, 1, NA, NA, NA),
Units = c(2,6,1,2,1),
Price = c(1,3.5,0.5,0.75,1.5))
df
I want to replace the NA
values of the Weight
column for a number multiplied by the results of Units
depending on the word showed by the column Product
. Basically, is a rule like:
Replace NA in Weight for 2.5*number of units if Product contains the word "Bread". Replace for 1 if Product contains the word "Eggs"
The thing is that I don't know how to code somehting like that in R
. I tried the following code that a kind user gave me for a similar question:
df <- df %>%
mutate(Weight = case_when(Product == "bread" & is.na(Weight) ~ 0.25*Units))
But it doesn't work and it doesn't take into account the fact that if there is "Whole Bread"
written in my dataframe
it also has to apply the rule.
Does anyone have an idea?
CodePudding user response:
Some of them are not exact matches, so use str_detect
library(dplyr)
library(stringr)
df %>%
mutate(Weight = case_when(is.na(Weight) &
str_detect(Product, regex("Bread", ignore_case = TRUE)) ~ 2.5 * Units,
is.na(Weight) & Product == "Eggs"~ Units, TRUE ~ Weight))
-output
# A tibble: 5 × 4
Product Weight Units Price
<chr> <dbl> <dbl> <dbl>
1 Bread 5 2 1
2 Oranges 1 6 3.5
3 Eggs 1 1 0.5
4 Bananas NA 2 0.75
5 Whole Bread 2.5 1 1.5