Combine three data frames into a long form data frame-CodePudding

I am new to R. I know how to combine different time series data frames into one wide form data frame, but need help with combing them one long form data frame.

Each data frame has hundreds of variables, some of them are the same variables over time, but some do not change over time (e.g., gender), and i want to keep them all in the combined data frame, and hopefully i don't need to specify all the variables in the command--is it possible?

data 1 (data collected at time 1)

ID   var1  var2  var3  job gender 
 1    1     1     1      2    0
 2    2     2     2      3    1
 3    1     3     3      4    0

data 2 (data collected at time 2)

ID   var1  var2  var3  
 1    3     2     3      
 2    3     1     2      
 3    1     3     3      
 
data 3 (data collected at time 3)

ID   var1  var2  var3  happy income 
 1    1     2     1      1      5
 2    2     2     2      3      1
 3    1     3     3      4      9 

Hope the combine_long data frame looks like this:
Need a new variable called "time" here.

ID   time  var1  var2  var3  job gender happy  income 
 1    1     1     1     1      2    0      1       5
 1    2     2     1     2      2    0      1       5
 1    3     1     1     1      2    0      1       5
 2    1     2     2     3      3    1      3       1
 2    2     1     2     4      3    1      3       1
 2    3     2     4     5      3    1      3       1
 3    1     3     3     4      4    0      4       9
 3    2     2     1     3      4    0      4       9
 3    3     1     3     1      4    0      4       9

the var1, var2 and var3 values in the last data frame may not perfectly match the values in the date 1, data 2 and 3 because I am too lazy to match them, but they should be matched.

I already combined them to the wide form data frame which theoretically should be able to transform to be long form with pivot_longer, but I couldn't figure out how to do it either :( so I figured I could combine them to the long form from the very beginning. Please help.

CodePudding user response：

We may use bind_rows on those datasets, with .id specified as 'time' to create the index column, then we fill the rest of the columns with the previous non-NA value per 'ID' (if there are any NA values)

library(dplyr)
library(tidyr)
bind_rows(data1, data2, data3, .id = 'time') %>% 
  arrange(ID) %>%
  group_by(ID)  %>%
  fill(everything(), .direction = 'downup') %>%
  ungroup

-output

# A tibble: 9 × 9
  time     ID  var1  var2  var3   job gender happy income
  <chr> <int> <int> <int> <int> <int>  <int> <int>  <int>
1 1         1     1     1     1     2      0     1      5
2 2         1     3     2     3     2      0     1      5
3 3         1     1     2     1     2      0     1      5
4 1         2     2     2     2     3      1     3      1
5 2         2     3     1     2     3      1     3      1
6 3         2     2     2     2     3      1     3      1
7 1         3     1     3     3     4      0     4      9
8 2         3     1     3     3     4      0     4      9
9 3         3     1     3     3     4      0     4      9

CodePudding user response：

Instead of bind_rows we could use rbindList from data.table. The second part is from @akrun's code. In this case in might make no sense to introduce another package, but it could help to think of rbindList:

library(data.table) 
library(dplyr)
library(tidyr)

rbindlist(list(data1, data2, data3), fill=TRUE) %>% 
  arrange(ID) %>% 
  group_by(ID) %>% 
  fill(job:income, .direction = 'downup') %>% 
  ungroup ()

     ID  var1  var2  var3   job gender happy income
  <int> <int> <int> <int> <int>  <int> <int>  <int>
1     1     1     1     1     2      0     1      5
2     1     3     2     3     2      0     1      5
3     1     1     2     1     2      0     1      5
4     2     2     2     2     3      1     3      1
5     2     3     1     2     3      1     3      1
6     2     2     2     2     3      1     3      1
7     3     1     3     3     4      0     4      9
8     3     1     3     3     4      0     4      9
9     3     1     3     3     4      0     4      9