I am new to R. I know how to combine different time series data frames into one wide form data frame, but need help with combing them one long form data frame.
Each data frame has hundreds of variables, some of them are the same variables over time, but some do not change over time (e.g., gender), and i want to keep them all in the combined data frame, and hopefully i don't need to specify all the variables in the command--is it possible?
data 1 (data collected at time 1)
ID var1 var2 var3 job gender
1 1 1 1 2 0
2 2 2 2 3 1
3 1 3 3 4 0
data 2 (data collected at time 2)
ID var1 var2 var3
1 3 2 3
2 3 1 2
3 1 3 3
data 3 (data collected at time 3)
ID var1 var2 var3 happy income
1 1 2 1 1 5
2 2 2 2 3 1
3 1 3 3 4 9
Hope the combine_long data frame looks like this:
Need a new variable called "time" here.
ID time var1 var2 var3 job gender happy income
1 1 1 1 1 2 0 1 5
1 2 2 1 2 2 0 1 5
1 3 1 1 1 2 0 1 5
2 1 2 2 3 3 1 3 1
2 2 1 2 4 3 1 3 1
2 3 2 4 5 3 1 3 1
3 1 3 3 4 4 0 4 9
3 2 2 1 3 4 0 4 9
3 3 1 3 1 4 0 4 9
the var1, var2 and var3 values in the last data frame may not perfectly match the values in the date 1, data 2 and 3 because I am too lazy to match them, but they should be matched.
I already combined them to the wide form data frame which theoretically should be able to transform to be long form with pivot_longer, but I couldn't figure out how to do it either :( so I figured I could combine them to the long form from the very beginning. Please help.
CodePudding user response:
We may use bind_rows
on those datasets, with .id
specified as 'time' to create the index column, then we fill
the rest of the columns with the previous non-NA value per 'ID' (if there are any NA
values)
library(dplyr)
library(tidyr)
bind_rows(data1, data2, data3, .id = 'time') %>%
arrange(ID) %>%
group_by(ID) %>%
fill(everything(), .direction = 'downup') %>%
ungroup
-output
# A tibble: 9 × 9
time ID var1 var2 var3 job gender happy income
<chr> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 1 2 0 1 5
2 2 1 3 2 3 2 0 1 5
3 3 1 1 2 1 2 0 1 5
4 1 2 2 2 2 3 1 3 1
5 2 2 3 1 2 3 1 3 1
6 3 2 2 2 2 3 1 3 1
7 1 3 1 3 3 4 0 4 9
8 2 3 1 3 3 4 0 4 9
9 3 3 1 3 3 4 0 4 9
CodePudding user response:
Instead of bind_rows
we could use rbindList
from data.table
.
The second part is from @akrun's code. In this case in might make no sense to introduce another package, but it could help to think of rbindList
:
library(data.table)
library(dplyr)
library(tidyr)
rbindlist(list(data1, data2, data3), fill=TRUE) %>%
arrange(ID) %>%
group_by(ID) %>%
fill(job:income, .direction = 'downup') %>%
ungroup ()
ID var1 var2 var3 job gender happy income
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 2 0 1 5
2 1 3 2 3 2 0 1 5
3 1 1 2 1 2 0 1 5
4 2 2 2 2 3 1 3 1
5 2 3 1 2 3 1 3 1
6 2 2 2 2 3 1 3 1
7 3 1 3 3 4 0 4 9
8 3 1 3 3 4 0 4 9
9 3 1 3 3 4 0 4 9