grouping by column variables and appending a new variable based on condition-CodePudding

Say I have a dataframe-

Column A	Column B
id1	blue
id1	red
id1	grey
id2	red
id3	red
id3	grey

I would like this output-

Column A	Column B
id1	all.mixed
id2	red
id3	red.grey

I tried this- table1 <- mydf %>% group_by(ColA, ColB) %>% count(ColB) and came to this-

ColA ColB                n
  <chr>   <chr>           <int>
1 id1    blue              1
2 id1    red        1
3 id1   grey             1
4 id2   red       1
5 id3   red     1
6 id3  grey1     1

But I am kind of lost after this. I thought of group_by and summing up the rows in the colB, but then if I have a situation such that-

Column A	Column B
id5	grey
id5	grey

Then what do i do?

CodePudding user response：

It's not clear how you are determining the values for column B in the output (e.g. "all.mixed"), but let's assume there is a calcMagicValue function. The general flow would be something like this:

calcMagicValue <- function(vals){
  
  # psuedo code
  if(all(possibleVals %in% vals)){
    "all.mixed"
  } else if( hasRedAndGrey){
    "red.grey"
  } else if(onlyRed) {
    "red"
  } else {
    #...other conditions...
  }
}

myDf |> 
  group_by(ColA) |> 
  summarize(newColB = calcMagicValue(ColB))

After grouping by, summarize will iterate through the unique values in the grouped columns, and pass the matching rows the functions called (in this case calcMagicValue). In this case, your result will then have one row for each unique combination/values in the grouped columns(s).

CodePudding user response：

A slightly different approach using group_by summarise which uses an indicator to replace the all.mixed cases in an additional mutate step:

library(dplyr)

dat %>%
  group_by(Column.A) |> 
  summarise(all.mixed = all(unique(dat$Column.B) %in% Column.B), 
            Column.B = paste(Column.B, collapse = "."), .groups = "drop") |> 
  mutate(Column.B = if_else(all.mixed, "all.mixed", Column.B)) |> 
  select(-all.mixed)
#> # A tibble: 3 × 2
#>   Column.A Column.B 
#>   <chr>    <chr>    
#> 1 id1      all.mixed
#> 2 id2      red      
#> 3 id3      red.grey

DATA

dat <- data.frame(
          Column.A = c("id1", "id1", "id1", "id2", "id3", "id3"),
          Column.B = c("blue", "red", "grey", "red", "red", "grey")
)