My data frame consists of several hundred persons (rows) answering 4 * 90 items each. It’s the same 90 questions across four points in time per person. Each person belongs to one of two groups. I checked for measurement invariance across these groups, this is easy with multigroup SEM and data in the wide format. Here is a shortened, reproducible example of my initial data frame:
df <- data.frame (ID = c(1,2,3,4,5),
Item1_time1 = c(4,4,5,4,3),
Item2_time1 = c(3,4,3,5,3),
Item1_time2 = c(5,4,4,5,4),
Item2_time2 = c(3,3,4,3,5),
group = c(0,1,0,1,0)
)
print(df)
ID Item1_time1 Item2_time1 Item1_time2 Item2_time2 group
1 1 4 3 5 3 0
2 2 4 4 4 3 1
3 3 5 3 4 4 0
4 4 4 5 5 3 1
5 5 3 3 4 5 0
However, now I want to check for measurement invariance across points in time. To do this, this is what the desired output should look like:
ID time Item1 Item2 group
1 1 4 3 0
1 2 5 3 0
2 1 4 4 1
2 2 4 3 1
3 1 5 3 0
3 2 4 4 0
4 1 4 5 1
4 2 5 3 1
5 1 3 3 0
5 2 4 5 0
In order to get this, I tried to convert my initial wide to long data, but I don’t really get the desired output:
library(data.table)
long <- melt(setDT(df), id.vars = c("ID"), variable.name = "time")
print(long)
ID time value
1: 1 Item1_time1 4
2: 2 Item1_time1 4
3: 3 Item1_time1 5
4: 4 Item1_time1 4
5: 5 Item1_time1 3
6: 1 Item2_time1 3
7: 2 Item2_time1 4
8: 3 Item2_time1 3
9: 4 Item2_time1 5
10: 5 Item2_time1 3
11: 1 Item1_time2 5
12: 2 Item1_time2 4
13: 3 Item1_time2 4
...
How can I code “collapsed” items so that they are not listed separately, but each in its corresponding time category (see desired output above)?
CodePudding user response:
We can use pivot_longer
with names_pattern
to capture the substring ((...)
) from the column names to create the .value
column and the 'time' column
library(tidyr)
pivot_longer(df, cols = -c(ID, group), names_to = c(".value", "time") ,
names_pattern = "(\\w )_\\D (\\d )")
-output
# A tibble: 10 × 5
ID group time Item1 Item2
<dbl> <dbl> <chr> <dbl> <dbl>
1 1 0 1 4 3
2 1 0 2 5 3
3 2 1 1 4 4
4 2 1 2 4 3
5 3 0 1 5 3
6 3 0 2 4 4
7 4 1 1 4 5
8 4 1 2 5 3
9 5 0 1 3 3
10 5 0 2 4 5
The (\\w )
- captures one or more characters followed by the _
in the column name, and then any non-digits (\\D
) followed by the second capture group of one or more digits ((\\d )
), which corresponds to the .value
(column values) and 'time' gets the digits suffix from the column name
CodePudding user response:
Like this?
setDT(df)
result <- melt(df, measure.vars=patterns(Item1='Item1', Item2='Item2')
, variable.name = 'time')
setorder(result, ID, time, group)
result
## ID group time Item1 Item2
## 1: 1 0 1 4 3
## 2: 1 0 2 5 3
## 3: 2 1 1 4 4
## 4: 2 1 2 4 3
## 5: 3 0 1 5 3
## 6: 3 0 2 4 4
## 7: 4 1 1 4 5
## 8: 4 1 2 5 3
## 9: 5 0 1 3 3
## 10: 5 0 2 4 5