I am struggling in transforming a long dataframe to wide version, but with some complications.
I have a column, ID
, with duplicated entry - each refers to a timepoint for an individual. I have also other columns (namely, visit
, var1
and var2
), in which data regarding each timepoint are reported. Here is a reproducible example:
df <- data.frame(ID=c(1,1,1,1,2,2,2,3,3,3),
visit=c(1,4,5,7,1,3,4,2,5,6),
var1=c("AF","no","no","no","AG","AG","no","BA","BA","BA"),
var2=c("good","good","good","bad","good","good","bad","good","good","good"))
And the output:
ID visit var1 var2
1 1 1 AF good
2 1 4 no good
3 1 5 no good
4 1 7 no bad
5 2 1 AG good
6 2 3 AG good
7 2 4 no bad
8 3 2 BA good
9 3 5 BA good
10 3 6 BA good
I really need to end up with a dataframe which contains only one row per ID
, and multiple columns for each of the other variables, with a numeric suffix for example (such as visit_1
, visit_2
, visit_3
etc.).
The output I have in mind is like this:
ID visit_1 visit_2 visit_3 visit_4 var1_1 var1_2 var1_3 var1_4 var2_1 var2_2 var2_3 var2_4
1 1 1 4 5 7 AF no no no good good good bad
2 2 1 3 4 NA AG AG no <NA> good good bad <NA>
3 3 2 5 6 NA BA BA BA <NA> good good good <NA>.
in which essentially each entry of the columns visit
, var1
and var2
has been placed in a separate, sequential column based on the ID
column.
I have tried data.frame::dcast
and tidyr::spread
, and also pivot_wider()
but it seems like these formula will end up with multiple column based on the actual values rather than producing fixed columns instead. For example, with pivot_wider():
df %>% pivot_wider(names_from = ID, values_from = c("visit","var1","var2"))
It returns me an error saying
Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
3: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Anyone can help?
CodePudding user response:
You have to add a sequential ID for each group:
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(count = row_number()) %>%
pivot_wider(ID, names_from = count, values_from = c(visit, var1, var2))
# A tibble: 3 x 13
# Groups: ID [3]
# ID visit_1 visit_2 visit_3 visit_4 var1_1 var1_2 var1_3 var1_4 var2_1 var2_2 var2_3 var2_4
# <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 1 4 5 7 AF no no no good good good bad
#2 2 1 3 4 NA AG AG no NA good good bad NA
#3 3 2 5 6 NA BA BA BA NA good good good NA