I have a data frame with multiple variables; here is an example:
df1 <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                  group = c("A", "A", "A", "B", "C", "C"),
                  Var = c(3, 2, 1, 4, 5, 2))
      ID group Var
1 Mickey     A   3
2  Goofy     A   2
3 Donald     A   1
4 Mickey     B   4
5 Donald     C   5
6  Pluto     C   2
I want a new data frame in which every ID appears in every group, with Var = 0 for an ID that is absent from a group, like this:
       ID group Var
1  Mickey     A   3
2   Goofy     A   2
3  Donald     A   1
4   Pluto     A   0
5  Mickey     B   4
6   Goofy     B   0
7  Donald     B   0
8   Pluto     B   0
9  Mickey     C   0
10  Goofy     C   0
11 Donald     C   5
12  Pluto     C   2
I tried using left_join() and merge(), for example:
a <- unique(df1$ID)
df2 <- df1 %>%
  group_by(group) %>%
  left_join(a)
but neither of them works this way.
Answer:
Using complete() from tidyr:
library(tidyr)
df %>%
  complete(group, ID, fill = list(Var = 0))
# A tibble: 12 × 3
   group ID       Var
   <chr> <chr>  <dbl>
 1 A     Donald     1
 2 A     Goofy      2
 3 A     Mickey     3
 4 A     Pluto      0
 5 B     Donald     0
 6 B     Goofy      0
 7 B     Mickey     4
 8 B     Pluto      0
 9 C     Donald     5
10 C     Goofy      0
11 C     Mickey     0
12 C     Pluto      2
A base R solution:
transform(merge(expand.grid(lapply(df[2:1], unique)), df, all.x = TRUE, sort = TRUE),
          Var = replace(Var, is.na(Var), 0))
This gives the same rows as complete(), except for their order (and it returns a plain data frame rather than a tibble).
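Another base R route, offered here as a sketch rather than as part of the answer above: cross-tabulate with xtabs(), which creates one cell for every ID/group pair and puts 0 in pairs that never occur, then turn the table back into long format with as.data.frame(). This assumes Var contains no NA values. Note that xtabs() sums Var when an ID/group pair appears more than once, and the IDs come out in alphabetical order.

```r
df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))

# One cell per ID/group pair holding the sum of Var for that pair;
# pairs absent from df get 0
tab <- xtabs(Var ~ ID + group, data = df)

# Back to long format; name the value column Var instead of the default Freq
out <- as.data.frame(tab, responseName = "Var", stringsAsFactors = FALSE)
print(out)
```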
Answer:
Here is my solution, using mostly base R:
DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))
groups <- unique(DF_raw$group)
IDs <- unique(DF_raw$ID)
# create every ID/group combination
DF_clean <- dplyr::bind_rows(lapply(IDs, function(ID) {
  data.frame(ID = ID, group = groups)
}))
# look up Var for each combination, using 0 when it is absent from DF_raw
DF_clean$Var <- sapply(seq_len(nrow(DF_clean)), function(ROW) {
  OUT <- DF_raw$Var[which(
    DF_raw$ID == DF_clean$ID[ROW] &
      DF_raw$group == DF_clean$group[ROW]
  )]  # finds the Var if it exists
  if (length(OUT) == 0) {
    OUT <- 0  # combination not present in DF_raw
  }
  OUT
})
print(DF_clean)
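The row-by-row sapply() lookup can also be vectorized. Here is a minimal sketch that reuses the DF_raw data: build the grid with expand.grid(), then look up every combination at once with match() on a combined ID/group key. The tab separator in the key is an assumption and must not occur in ID or group. Also, match() takes only the first row when a pair is duplicated. DF_full is just an illustrative name. Because ID is listed first in expand.grid(), ID varies fastest, so the rows come out in the order shown in the question.

```r
DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))

# All ID/group combinations; ID varies fastest, so rows are grouped by group
# with IDs in order of first appearance
DF_full <- expand.grid(ID = unique(DF_raw$ID), group = unique(DF_raw$group),
                       stringsAsFactors = FALSE)

# Combined keys such as "Mickey\tA" (assumes no tab inside ID or group)
key_full <- paste(DF_full$ID, DF_full$group, sep = "\t")
key_raw <- paste(DF_raw$ID, DF_raw$group, sep = "\t")

# Row of DF_raw holding each combination, NA when the combination is absent
idx <- match(key_full, key_raw)
DF_full$Var <- ifelse(is.na(idx), 0, DF_raw$Var[idx])
print(DF_full)
```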