I am trying to replace a string for specific groups of customer ids/order dates. The example below may better illustrate my question. I have a dataframe:
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, status)
I am trying to group by customerid and orderdate. Then in each group of customerid/orderddate, if there is a "positive" or "negative" in the status column, I want to replace "review" with "done". The replacement can be done in a new column or the same one. The result would be:
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
new <- c("review", "review", "done", "negative", "positive", "done", "done", "review")
df <- data.frame(customerid, orderdate, status, new)
I tried using the dplyr package and group_by and filter functions but am not sure how to proceed from here.
df <- df %>%
group_by(customerid, orderdate) %>%
dplyr::filter(df$new == 'review' & df$new == 'positive') %>%
Thank you!
CodePudding user response:
If we want to replace the review
to done
only when there is a positve
or negative
value in status
for that group, then after grouping, create a logical vector that checks whether there are any
'positive' or 'negative' %in%
the 'status' and (&
) the value is 'review', then replace with 'done' or else return the column values
library(dplyr)
df <- df %>%
group_by(customerid, orderdate) %>%
mutate(new = case_when(any(c('positive', 'negative') %in%
status) & status == 'review' ~ 'done', TRUE ~ status)) %>%
ungroup
-output
df
# A tibble: 8 × 4
customerid orderdate status new
<chr> <chr> <chr> <chr>
1 A1 2018-09-14 review review
2 A1 2018-09-14 review review
3 A2 2018-09-15 review done
4 A2 2018-09-15 negative negative
5 A3 2020-08-21 positive positive
6 A3 2020-08-21 review done
7 A3 2020-08-21 review done
8 A4 2018-08-10 review review