Home > Software engineering >  Concatenate strings based on value in another column
Concatenate strings based on value in another column

Time:12-28

I am trying to concatenate strings for specific groups of customer ids/order dates. I have a dataframe:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, orderid, status)

I am trying to group by customer id and order date. Then for each group, I want all but 1 "review" changed to "duplicate" and all the orderids concatenated by customerid/orderdate. The result would be:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
orderid <- c("1,2", "1,2", "3,4", "3,4", "5,6,7", "5,6,7", "5,6,7", "8")
status <- c("review", "duplicate", "review", "negative", "positive", "review", "duplicate", "review")
df <- data.frame(customerid, orderdate, orderid, status)

Ty!

CodePudding user response:

You may try

library(dplyr)
df %>%
  group_by(customerid, orderdate) %>%
  mutate(orderid = paste0(orderid, collapse = ","),
         status = ifelse(row_number() == 1, status, "duplicate")
         ) 

  customerid orderdate  orderid status   
  <chr>      <chr>      <chr>   <chr>    
1 A1         2018-09-14 1,2     review   
2 A1         2018-09-14 1,2     duplicate
3 A2         2018-09-15 3,4     review   
4 A2         2018-09-15 3,4     duplicate
5 A3         2020-08-21 5,6,7   review   
6 A3         2020-08-21 5,6,7   duplicate
7 A3         2020-08-21 5,6,7   duplicate
8 A4         2018-08-10 8       review 

I'm not sure about exact conditions for negative/positive but

df %>%
  group_by(customerid, orderdate) %>%
  mutate(orderid = paste0(orderid, collapse = ",")) %>%
  group_by(customerid, orderdate, status) %>%
  mutate(status = ifelse((row_number() != 1) & (status == "review"), "duplicate", status))

  customerid orderdate  orderid status   
  <chr>      <chr>      <chr>   <chr>    
1 A1         2018-09-14 1,2     review   
2 A1         2018-09-14 1,2     duplicate
3 A2         2018-09-15 3,4     review   
4 A2         2018-09-15 3,4     negative 
5 A3         2020-08-21 5,6,7   positive 
6 A3         2020-08-21 5,6,7   review   
7 A3         2020-08-21 5,6,7   duplicate
8 A4         2018-08-10 8       review  

CodePudding user response:

You can try

library(dplyr)

df %>%
  group_by(customerid, orderdate) %>%
  mutate(orderid = toString(orderid),
         status = ifelse(status == "review" & duplicated(status), "duplicate", status)) %>%
  ungroup()

# A tibble: 8 × 4
  customerid orderdate  orderid status
  <chr>      <chr>      <chr>   <chr>    
1 A1         2018-09-14 1, 2    review
2 A1         2018-09-14 1, 2    duplicate
3 A2         2018-09-15 3, 4    review
4 A2         2018-09-15 3, 4    negative
5 A3         2020-08-21 5, 6, 7 positive
6 A3         2020-08-21 5, 6, 7 review
7 A3         2020-08-21 5, 6, 7 duplicate
8 A4         2018-08-10 8       review
  • Related