I have a question about pivoting a dataframe using the combinations (categories) of two variables.
I have the following dataframe:
df <- data.frame(id = (c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5)),
genes = c(4,4,4,5,5,5,4,4,4,5,4,4,5,5,5,4,4,4,5,4,4,4,5,5,5),
proteins = c(1,2,3,1,2,3,1,2,3,1,1,2,1,2,3,1,2,3,2,1,2,3,1,2,3),
values =c(1,4,5,6,4,10,5,4,6,13,14,54,34,67,45,1,3,5,7,5,12,5,6,44,3))
The genes and proteins variables represent different combinations (categories) within repeated measures of the same person. For example, the first measurement of id 1 gave the combination of gene "4" and protein "1", the second measurement of the same id gave the combination of gene "4" and protein "2", and so on. There are 6 combinations in total across genes and proteins (i.e. 4 & 1, 4 & 2, 4 & 3, 5 & 1, 5 & 2 and 5 & 3), and some ids do not have all of them, as you can see.
What I want is to pivot_wider() that dataframe so that these 6 combinations become columns, grouped by id. That means each person will have only one row of data and 6 columns for these categories (4 & 1, 4 & 2, 4 & 3, 5 & 1, 5 & 2 and 5 & 3), with the "values" variable placed under the corresponding combination / category.
The output I would like to get is the following, where gen_pro4_1, gen_pro4_2 and so on are the combined categories of the columns genes and proteins:
  ID gen_pro4_1 gen_pro4_2 gen_pro4_3 gen_pro5_1 gen_pro5_2 gen_pro5_3
1  1          1          4          5          6          4         10
2  2          5          4          6         13         NA         NA
3  3         14         54         NA         34         67         45
4  4          1          3          5         NA          7         NA
5  5          5         12          5          6         44          3
Thank you very much for any help.
CodePudding user response:
Here is a way -
tidyr::pivot_wider(df,
names_from = c(genes, proteins),
values_from = values,
names_prefix = 'gen_pro')
# id gen_pro4_1 gen_pro4_2 gen_pro4_3 gen_pro5_1 gen_pro5_2 gen_pro5_3
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 4 5 6 4 10
#2 2 5 4 6 13 NA NA
#3 3 14 54 NA 34 67 45
#4 4 1 3 5 NA 7 NA
#5 5 5 12 5 6 44 3
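To make explicit what passing two columns to names_from does, here is a minimal sketch of an equivalent two-step version, assuming the same df as in the question. The intermediate column name combo is made up for illustration: tidyr::unite() first pastes genes and proteins together with "_", and pivot_wider() then spreads that single key column.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  id       = c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5),
  genes    = c(4,4,4,5,5,5,4,4,4,5,4,4,5,5,5,4,4,4,5,4,4,4,5,5,5),
  proteins = c(1,2,3,1,2,3,1,2,3,1,1,2,1,2,3,1,2,3,2,1,2,3,1,2,3),
  values   = c(1,4,5,6,4,10,5,4,6,13,14,54,34,67,45,1,3,5,7,5,12,5,6,44,3)
)

# Step 1: combine genes and proteins into one key such as "4_1".
# "combo" is a hypothetical column name used only for this sketch.
long_key <- unite(df, "combo", genes, proteins, sep = "_")

# Step 2: spread the single key column, giving one row per id
wide_two_step <- pivot_wider(long_key,
                             names_from = combo,
                             values_from = values,
                             names_prefix = "gen_pro")

# One-step version from the answer above, for comparison
wide_one_step <- pivot_wider(df,
                             names_from = c(genes, proteins),
                             values_from = values,
                             names_prefix = "gen_pro")

# Compare the two results
all.equal(as.data.frame(wide_two_step), as.data.frame(wide_one_step))
```

The one-step call is shorter, but the two-step form can be handy if the combined key is needed for something else before pivoting.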
CodePudding user response:
This answer, although very similar to the other one (posted just a few seconds earlier), shows how names_glue can compose flexible column names using string interpolation.
library(tidyr)

# The native pipe |> requires R >= 4.1
df |>
  pivot_wider(id_cols = id,
              names_from = c(genes, proteins),
              names_glue = "gen_pro{genes}_{proteins}",
              values_from = values)
# A tibble: 5 × 7
#      id gen_pro4_1 gen_pro4_2 gen_pro4_3 gen_pro5_1 gen_pro5_2 gen_pro5_3
#   <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
# 1     1          1          4          5          6          4         10
# 2     2          5          4          6         13         NA         NA
# 3     3         14         54         NA         34         67         45
# 4     4          1          3          5         NA          7         NA
# 5     5          5         12          5          6         44          3
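As a small illustration of that flexibility, the sketch below reuses the question's df with a different naming pattern. The template "gene{genes}_protein{proteins}" is made up for this example; any text can surround the {genes} and {proteins} placeholders, as long as every column listed in names_from appears in the template so the resulting names stay unique.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  id       = c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5),
  genes    = c(4,4,4,5,5,5,4,4,4,5,4,4,5,5,5,4,4,4,5,4,4,4,5,5,5),
  proteins = c(1,2,3,1,2,3,1,2,3,1,1,2,1,2,3,1,2,3,2,1,2,3,1,2,3),
  values   = c(1,4,5,6,4,10,5,4,6,13,14,54,34,67,45,1,3,5,7,5,12,5,6,44,3)
)

# Hypothetical naming pattern: each placeholder in braces is replaced
# by the value of the matching names_from column
wide <- pivot_wider(
  df,
  id_cols     = id,
  names_from  = c(genes, proteins),
  names_glue  = "gene{genes}_protein{proteins}",
  values_from = values
)

# Inspect the generated column names
names(wide)
```

Compared with names_prefix, which only prepends a fixed string, names_glue lets the text go before, between, or after the individual values.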