I have grouped data like follows:
group id var
1 1 1 1
2 1 2 0
3 1 3 1
4 2 1 0
5 2 2 1
6 3 1 0
I'd like to create N new var
columns (var1
, var2
,...) where N is the max number of individuals in a group (in this case N = 3 as largest group is group 1
which has three id
s.
var1
should contain the var
value for id = 1
in each group; likewise var2
should contain the var
value for each id = 2
in each group. Where the number of columns exceeds the number in a group it should return NA
.
The sought output for the above example would be:
group id var var1 var2 var3
1 1 1 1 1 0 1
2 1 2 0 1 0 1
3 1 3 1 1 0 1
4 2 1 0 0 1 NA
5 2 2 1 0 1 NA
6 3 1 0 0 NA NA
Example Code
a <- data.frame(group = c(1,1,1,2,2,3),
id = c(1,2,3,1,2,1),
var = c(1,0,1,0,1,0))
CodePudding user response:
Looks like you want to pivot the dataframe wider: you can use pivot_wider()
. If you want to keeps the original rows, you can also use right_join()
.
library(tidyverse)
a %>% right_join(
pivot_wider(a,values_from = var, names_from = id, names_prefix = "var"),
by='group'
)
Output:
group id var var1 var2 var3
1 1 1 1 1 0 1
2 1 2 0 1 0 1
3 1 3 1 1 0 1
4 2 1 0 0 1 NA
5 2 2 1 0 1 NA
6 3 1 0 0 NA NA
However, maybe is this a bit redundant? Just using pivot_wider()
keeps the same info because varx
are the original id
column:
library(tidyverse)
a %>%
pivot_wider(values_from = var, names_from = id, names_prefix = "var")
# A tibble: 3 x 4
group var1 var2 var3
<dbl> <dbl> <dbl> <dbl>
1 1 1 0 1
2 2 0 1 NA
3 3 0 NA NA