Home > Back-end >  R: How to recode a wide data frame with repeating variables (time) into to long data?
R: How to recode a wide data frame with repeating variables (time) into to long data?

Time:05-27

My data frame consists of several hundred persons (rows) answering 4 * 90 items each. It’s the same 90 questions across four points in time per person. Each person belongs to one of two groups. I checked for measurement invariance across these groups, this is easy with multigroup SEM and data in the wide format. Here is a shortened, reproducible example of my initial data frame:

df <- data.frame (ID = c(1,2,3,4,5),
                  Item1_time1 = c(4,4,5,4,3),
                  Item2_time1 = c(3,4,3,5,3),
                  Item1_time2 = c(5,4,4,5,4),
                  Item2_time2 = c(3,3,4,3,5),
                  group = c(0,1,0,1,0)
)

print(df)
  ID Item1_time1 Item2_time1 Item1_time2 Item2_time2 group
1  1           4           3           5           3     0
2  2           4           4           4           3     1
3  3           5           3           4           4     0
4  4           4           5           5           3     1
5  5           3           3           4           5     0

However, now I want to check for measurement invariance across points in time. To do this, this is what the desired output should look like:

ID  time   Item1   Item2    group
1    1       4       3        0
1    2       5       3        0
2    1       4       4        1
2    2       4       3        1
3    1       5       3        0
3    2       4       4        0
4    1       4       5        1
4    2       5       3        1
5    1       3       3        0
5    2       4       5        0

In order to get this, I tried to convert my initial wide to long data, but I don’t really get the desired output:

library(data.table)
long <- melt(setDT(df), id.vars = c("ID"), variable.name = "time")
print(long)

    ID        time value
 1:  1 Item1_time1     4
 2:  2 Item1_time1     4
 3:  3 Item1_time1     5
 4:  4 Item1_time1     4
 5:  5 Item1_time1     3
 6:  1 Item2_time1     3
 7:  2 Item2_time1     4
 8:  3 Item2_time1     3
 9:  4 Item2_time1     5
10:  5 Item2_time1     3
11:  1 Item1_time2     5
12:  2 Item1_time2     4
13:  3 Item1_time2     4
...

How can I code “collapsed” items so that they are not listed separately, but each in its corresponding time category (see desired output above)?

CodePudding user response:

We can use pivot_longer with names_pattern to capture the substring ((...)) from the column names to create the .value column and the 'time' column

library(tidyr)
pivot_longer(df, cols = -c(ID, group), names_to = c(".value", "time") , 
      names_pattern = "(\\w )_\\D (\\d )")

-output

# A tibble: 10 × 5
      ID group time  Item1 Item2
   <dbl> <dbl> <chr> <dbl> <dbl>
 1     1     0 1         4     3
 2     1     0 2         5     3
 3     2     1 1         4     4
 4     2     1 2         4     3
 5     3     0 1         5     3
 6     3     0 2         4     4
 7     4     1 1         4     5
 8     4     1 2         5     3
 9     5     0 1         3     3
10     5     0 2         4     5

The (\\w ) - captures one or more characters followed by the _ in the column name, and then any non-digits (\\D ) followed by the second capture group of one or more digits ((\\d )), which corresponds to the .value (column values) and 'time' gets the digits suffix from the column name

CodePudding user response:

Like this?

setDT(df)
result <- melt(df, measure.vars=patterns(Item1='Item1', Item2='Item2')   
                 , variable.name = 'time')
setorder(result, ID, time, group)
result     
##     ID group time Item1 Item2
##  1:  1     0    1     4     3
##  2:  1     0    2     5     3
##  3:  2     1    1     4     4
##  4:  2     1    2     4     3
##  5:  3     0    1     5     3
##  6:  3     0    2     4     4
##  7:  4     1    1     4     5
##  8:  4     1    2     5     3
##  9:  5     0    1     3     3
## 10:  5     0    2     4     5
  • Related