I am trying to concatenate strings for specific groups of customer ids/order dates. I have a dataframe:
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, orderid, status)
I am trying to group by customer id and order date. Then for each group, I want all but 1 "review" changed to "duplicate" and all the orderids concatenated by customerid/orderdate. The result would be:
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1,2", "1,2", "3,4", "3,4", "5,6,7", "5,6,7", "5,6,7", "8")
status <- c("review", "duplicate", "review", "negative", "positive", "review", "duplicate", "review")
df <- data.frame(customerid, orderdate, orderid, status)
Ty!
CodePudding user response:
You may try
library(dplyr)
df %>%
group_by(customerid, orderdate) %>%
mutate(orderid = paste0(orderid, collapse = ","),
status = ifelse(row_number() == 1, status, "duplicate")
)
customerid orderdate orderid status
<chr> <chr> <chr> <chr>
1 A1 2018-09-14 1,2 review
2 A1 2018-09-14 1,2 duplicate
3 A2 2018-09-15 3,4 review
4 A2 2018-09-15 3,4 duplicate
5 A3 2020-08-21 5,6,7 review
6 A3 2020-08-21 5,6,7 duplicate
7 A3 2020-08-21 5,6,7 duplicate
8 A4 2018-08-10 8 review
I'm not sure about exact conditions for negative
/positive
but
df %>%
group_by(customerid, orderdate) %>%
mutate(orderid = paste0(orderid, collapse = ",")) %>%
group_by(customerid, orderdate, status) %>%
mutate(status = ifelse((row_number() != 1) & (status == "review"), "duplicate", status))
customerid orderdate orderid status
<chr> <chr> <chr> <chr>
1 A1 2018-09-14 1,2 review
2 A1 2018-09-14 1,2 duplicate
3 A2 2018-09-15 3,4 review
4 A2 2018-09-15 3,4 negative
5 A3 2020-08-21 5,6,7 positive
6 A3 2020-08-21 5,6,7 review
7 A3 2020-08-21 5,6,7 duplicate
8 A4 2018-08-10 8 review
CodePudding user response:
You can try
library(dplyr)
df %>%
group_by(customerid, orderdate) %>%
mutate(orderid = toString(orderid),
status = ifelse(status == "review" & duplicated(status), "duplicate", status)) %>%
ungroup()
# A tibble: 8 × 4
customerid orderdate orderid status
<chr> <chr> <chr> <chr>
1 A1 2018-09-14 1, 2 review
2 A1 2018-09-14 1, 2 duplicate
3 A2 2018-09-15 3, 4 review
4 A2 2018-09-15 3, 4 negative
5 A3 2020-08-21 5, 6, 7 positive
6 A3 2020-08-21 5, 6, 7 review
7 A3 2020-08-21 5, 6, 7 duplicate
8 A4 2018-08-10 8 review