R: Fill/complete each group of a dataframe with all values of a vector

I have a dataframe with multiple variables; here is an example:

df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))

      ID group Var
1 Mickey     A   3
2  Goofy     A   2
3 Donald     A   1
4 Mickey     B   4
5 Donald     C   5
6  Pluto     C   2

I want a new dataframe in which every ID appears in every group, with Var = 0 wherever an ID is absent from a group, like this:

       ID group Var
1  Mickey     A   3
2   Goofy     A   2
3  Donald     A   1
4   Pluto     A   0
5  Mickey     B   4
6   Goofy     B   0
7  Donald     B   0
8   Pluto     B   0
9  Mickey     C   0
10  Goofy     C   0
11 Donald     C   5
12  Pluto     C   2

I tried using left_join and merge, for example:

a <- unique(df$ID)
df2 <- df %>%
  group_by(group) %>%
  left_join(a)

but neither of them works this way.

Answer:

Using complete() from tidyr:

library(tidyr)

df %>%
  complete(group, ID, fill = list(Var = 0))

# A tibble: 12 × 3
   group ID       Var
   <chr> <chr>  <dbl>
 1 A     Donald     1
 2 A     Goofy      2
 3 A     Mickey     3
 4 A     Pluto      0
 5 B     Donald     0
 6 B     Goofy      0
 7 B     Mickey     4
 8 B     Pluto      0
 9 C     Donald     5
10 C     Goofy      0
11 C     Mickey     0
12 C     Pluto      2
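What complete() does here can be sketched in base R: take every combination of the key columns, left-join the data onto that grid, then replace the NAs in the filled columns. complete_base() below is a hypothetical helper written only for illustration; it is not part of tidyr and covers just this simple case (no nesting() or explicit value lists).

```r
# Example data from the question
df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))

# complete_base() is a hypothetical helper that loosely mimics tidyr::complete():
# keep every combination of the `keys` columns and fill NAs in the `fill` columns.
complete_base <- function(data, keys, fill = list()) {
  # Every combination of the unique values of each key column
  grid <- expand.grid(lapply(data[keys], unique), stringsAsFactors = FALSE)
  # Left join: keep all combinations, attach matching rows from `data`
  out <- merge(grid, data, by = keys, all.x = TRUE, sort = TRUE)
  # Replace the NAs introduced by the join with the requested fill values
  for (col in names(fill)) {
    out[[col]][is.na(out[[col]])] <- fill[[col]]
  }
  out
}

res <- complete_base(df, keys = c("group", "ID"), fill = list(Var = 0))
print(res)
```

Because merge() sorts on the key columns by default, the rows should come out ordered by group and then ID, much like the complete() output; the result is a plain data.frame rather than a tibble.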

A base R solution:

# Build every group/ID combination, left-join df onto it, then turn the NAs into 0
transform(merge(expand.grid(lapply(df[2:1], unique)), df, all.x = TRUE, sort = TRUE),
          Var = replace(Var, is.na(Var), 0))

This gives the same rows as complete(), only in a different order.
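Another base R option, offered as an alternative sketch rather than part of either answer, works only because the fill value here is 0: cross-tabulate Var with xtabs(), where every empty ID/group cell comes out as 0, then convert the table back to long format. Note that xtabs() sums Var if the same ID appears more than once in a group, and it sorts ID and group alphabetically.

```r
# Example data from the question
df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))

# Sum Var over every ID x group cell; cells with no rows in df get 0
tab <- xtabs(Var ~ ID + group, data = df)

# Back to long format: one row per ID/group combination, value column named Var
res <- as.data.frame(tab, responseName = "Var", stringsAsFactors = FALSE)
print(res)
```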

Answer:

Here is my solution using mostly base R (dplyr::bind_rows() is the only package function):

DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))

groups <- unique(DF_raw$group)
IDs <- unique(DF_raw$ID)

# Create every ID/group combination
DF_clean <- dplyr::bind_rows(lapply(IDs, function(ID) data.frame(ID = ID, group = groups)))

# For each combination, look up Var in DF_raw; use 0 if the combination is absent
DF_clean$Var <- sapply(seq_len(nrow(DF_clean)), function(ROW) {
  OUT <- DF_raw$Var[which(
    DF_raw$ID == DF_clean$ID[ROW] &
      DF_raw$group == DF_clean$group[ROW]
  )]
  if (length(OUT) == 0) {
    OUT <- 0  # this ID has no row in this group
  }
  OUT
})

print(DF_clean)
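The sapply() loop above scans DF_raw once for every row of DF_clean. A vectorised variant, offered as a sketch rather than a drop-in replacement, builds one lookup key per row and uses match(); it assumes the separator "|" never appears in an ID or group name, and if a combination occurs more than once in DF_raw, match() silently takes the first one.

```r
# Example data from the question
DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))

# All ID/group combinations; expand.grid() varies ID fastest, so rows are grouped by group
DF_clean <- expand.grid(ID = unique(DF_raw$ID), group = unique(DF_raw$group),
                        stringsAsFactors = FALSE)

# One lookup key per row (assumes "|" never occurs in ID or group)
key_raw <- paste(DF_raw$ID, DF_raw$group, sep = "|")
key_clean <- paste(DF_clean$ID, DF_clean$group, sep = "|")

# Position of each combination in DF_raw, or NA if it is absent
idx <- match(key_clean, key_raw)
DF_clean$Var <- ifelse(is.na(idx), 0, DF_raw$Var[idx])

print(DF_clean)
```

With this example data the rows come out in the same order as the desired output in the question.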