Q: Converting the Format of a Data Frame in R

Time:06-03

How can I convert the format of this data frame? It holds one participant who took three tests (Measure: A, B, C) at two time points (Time: 0, 2) on two words (Word_ID: 201, 202), with each score coded as 0 or 1.

I would like to convert my data frame to look like this, with "Time" occurring as "0, 0, 0, 2, 2, 2".

Participant  Time  Measure  Word_ID  Score
100          0     A        201      0
100          0     B        201      1
100          0     C        201      0
100          2     A        201      1
100          2     B        201      1
100          2     C        201      1
100          0     A        202      0
100          0     B        202      0
100          0     C        202      0
100          2     A        202      1
100          2     B        202      1
100          2     C        202      1

But my current data frame looks like this. May I have your suggestions? Thank you very much.

    Participant Time    Measure 201 202
    100 0   A   0   0
    100 0   B   1   0
    100 0   C   0   0
    100 2   A   1   1
    100 2   B   1   1
    100 2   C   1   1

CodePudding user response:

First, read your data in as df:

df <- read.table(text = "    Participant Time    Measure 201 202
    100 0   A   0   0
    100 0   B   1   0
    100 0   C   0   0
    100 2   A   1   1
    100 2   B   1   1
    100 2   C   1   1", header = TRUE)

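Because read.table() checks column names by default (check.names = TRUE), the columns 201 and 202 are renamed X201 and X202. The code below reshapes the data to long format and then strips that X prefix.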

library(dplyr)
library(stringr)
library(reshape2)

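df %>%
  # Keep the three identifier columns; stack X201 and X202 into
  # one Word_ID column (names) and one Score column (values)
  reshape2::melt(id.vars = c('Participant', 'Time', 'Measure'), 
                 variable.name = "Word_ID",
                 value.name = "Score") %>%
  # Drop the leading "X" that read.table added to the column names
  mutate(Word_ID = str_remove(Word_ID, "^X"))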

   Participant Time Measure Word_ID Score
1          100    0       A     201     0
2          100    0       B     201     1
3          100    0       C     201     0
4          100    2       A     201     1
5          100    2       B     201     1
6          100    2       C     201     1
7          100    0       A     202     0
8          100    0       B     202     0
9          100    0       C     202     0
10         100    2       A     202     1
11         100    2       B     202     1
12         100    2       C     202     1
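
The X prefix mentioned above comes from read.table's default name checking. If you would rather keep the column names exactly as written, one option is to pass check.names = FALSE. This is only a minimal sketch on the same text input; the name df_raw is illustrative and not part of the original answer.

```r
# Read the same text, but keep the column names as they appear in the header.
# check.names = FALSE stops read.table from turning 201 into X201.
# df_raw is a hypothetical name used only for this sketch.
df_raw <- read.table(text = "Participant Time Measure 201 202
100 0 A 0 0
100 0 B 1 0
100 0 C 0 0
100 2 A 1 1
100 2 B 1 1
100 2 C 1 1", header = TRUE, check.names = FALSE)

# Inspect the resulting column names
names(df_raw)
```

With the names kept this way, the str_remove step should no longer be needed, but the two columns are non-syntactic names and have to be wrapped in backticks when referenced, for example df_raw$`201`.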

CodePudding user response:

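You can use pivot_longer from tidyr. pivot_longer interleaves the two words row by row, so arrange is used afterwards to group the rows for each word together: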

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(`201`:`202`, names_to = "Word_ID", values_to = "Score") %>% 
  arrange(Participant, Word_ID)

Output

   Participant  Time Measure Word_ID Score
         <int> <int> <chr>   <chr>   <int>
 1         100     0 A       201         0
 2         100     0 B       201         1
 3         100     0 C       201         0
 4         100     2 A       201         1
 5         100     2 B       201         1
 6         100     2 C       201         1
 7         100     0 A       202         0
 8         100     0 B       202         0
 9         100     0 C       202         0
10         100     2 A       202         1
11         100     2 B       202         1
12         100     2 C       202         1
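
In the output above, Word_ID comes out as a character column (<chr>). If a numeric ID is preferred, one possible variation is the names_transform argument of pivot_longer. The sketch below rebuilds the same six rows so it can run on its own; the name long and the extra sort keys (Time, Measure) are assumptions added here to make the row order explicit, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same six rows as in the answer; check.names = FALSE keeps 201 and 202 as names
df <- data.frame(
  Participant = rep(100L, 6),
  Time = c(0L, 0L, 0L, 2L, 2L, 2L),
  Measure = c("A", "B", "C", "A", "B", "C"),
  `201` = c(0L, 1L, 0L, 1L, 1L, 1L),
  `202` = c(0L, 0L, 0L, 1L, 1L, 1L),
  check.names = FALSE
)

# long is a hypothetical name for the reshaped result
long <- df %>%
  pivot_longer(`201`:`202`,
               names_to = "Word_ID",
               values_to = "Score",
               # convert the former column names from character to integer
               names_transform = list(Word_ID = as.integer)) %>%
  # sort by word first, then by time and measure within each word
  arrange(Participant, Word_ID, Time, Measure)

long
```

Sorting on Time and Measure as well is optional here; it simply avoids relying on how ties are ordered.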

Data

df <- structure(list(Participant = c(100L, 100L, 100L, 100L, 100L, 
100L), Time = c(0L, 0L, 0L, 2L, 2L, 2L), Measure = c("A", "B", 
"C", "A", "B", "C"), `201` = c(0L, 1L, 0L, 1L, 1L, 1L), `202` = c(0L, 
0L, 0L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-6L))