Home > Software engineering >  How to replace NA in a dataframe for a specific value using the results of another column and taking
How to replace NA in a dataframe for a specific value using the results of another column and taking

Time:03-26

I have a dataframe composed of 9 columns with more than 4000 observations. For this question I will present a simpler dataframe (I use the tidyverse library)

Let's say I have the following dataframe:

library(tidyverse)
df <- tibble(Product = c("Bread","Oranges","Eggs","Bananas","Whole Bread" ),
             Weight = c(NA, 1, NA, NA, NA),
             Units = c(2,6,1,2,1),
             Price = c(1,3.5,0.5,0.75,1.5))
df

I want to replace the NA values of the Weight column for a number multiplied by the results of Units depending on the word showed by the column Product. Basically, is a rule like:

Replace NA in Weight for 2.5*number of units if Product contains the word "Bread". Replace for 1 if Product contains the word "Eggs"

The thing is that I don't know how to code somehting like that in R. I tried the following code that a kind user gave me for a similar question:

df <- df %>%
mutate(Weight = case_when(Product == "bread" & is.na(Weight) ~ 0.25*Units))

But it doesn't work and it doesn't take into account the fact that if there is "Whole Bread" written in my dataframe it also has to apply the rule.

Does anyone have an idea?

CodePudding user response:

Some of them are not exact matches, so use str_detect

library(dplyr)
library(stringr)
df %>% 
   mutate(Weight = case_when(is.na(Weight) & 
     str_detect(Product, regex("Bread", ignore_case = TRUE)) ~ 2.5 * Units, 
     is.na(Weight) & Product == "Eggs"~ Units, TRUE ~ Weight))

-output

# A tibble: 5 × 4
  Product     Weight Units Price
  <chr>        <dbl> <dbl> <dbl>
1 Bread          5       2  1   
2 Oranges        1       6  3.5 
3 Eggs           1       1  0.5 
4 Bananas       NA       2  0.75
5 Whole Bread    2.5     1  1.5 
  • Related