Home > Software engineering >  R - how to replace strings in a column based on 2 or more values
R - how to replace strings in a column based on 2 or more values

Time:12-28

I am trying to replace a string for specific groups of customer ids/order dates. The example below may better illustrate my question. I have a dataframe:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, status)

I am trying to group by customerid and orderdate. Then in each group of customerid/orderddate, if there is a "positive" or "negative" in the status column, I want to replace "review" with "done". The replacement can be done in a new column or the same one. The result would be:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
new <- c("review", "review", "done", "negative", "positive", "done", "done", "review")
df <- data.frame(customerid, orderdate, status, new)

I tried using the dplyr package and group_by and filter functions but am not sure how to proceed from here.

df <- df %>% 
       group_by(customerid, orderdate) %>% 
       dplyr::filter(df$new == 'review' & df$new == 'positive') %>%   

Thank you!

CodePudding user response:

If we want to replace the review to done only when there is a positve or negative value in status for that group, then after grouping, create a logical vector that checks whether there are any 'positive' or 'negative' %in% the 'status' and (&) the value is 'review', then replace with 'done' or else return the column values

library(dplyr)
df <- df %>% 
      group_by(customerid, orderdate) %>%
      mutate(new = case_when(any(c('positive', 'negative') %in% 
             status) & status == 'review' ~ 'done', TRUE ~ status)) %>% 
     ungroup

-output

df
# A tibble: 8 × 4
  customerid orderdate  status   new     
  <chr>      <chr>      <chr>    <chr>   
1 A1         2018-09-14 review   review  
2 A1         2018-09-14 review   review  
3 A2         2018-09-15 review   done    
4 A2         2018-09-15 negative negative
5 A3         2020-08-21 positive positive
6 A3         2020-08-21 review   done    
7 A3         2020-08-21 review   done    
8 A4         2018-08-10 review   review  
  • Related