I have a data frame with exam questions organized as you see below in input
.
I'm trying to organize it in a tidy way as displayed in output
.
In input
you can see student's ID
, their answer to the specific item suffix = ".text"
, their score for that specific item suffix = ".score"
, and their total
score.
Input
library(tibble)
input <- tribble(
~ID, ~i1.text, ~i1.score, ~i2.text, ~i2.score, ~total,
"Mark", "A", 0L, "B", 1L, 1L,
"Mary", "C", 1L, "D", 0L, 1L,
"John", "A", 0L, "B", 1L, 1L,
"Abby", "C", 1L, "B", 1L, 2L
)
I need to pivot the data to look like the one below.
I'm quite certain that I can do that with pivot_longer()
but I'm quite lost.
Expected Output
output <- tribble(
~ID, ~item, ~text, ~score, ~total,
"Mark", "i1", "A", 0L, 1L,
"Mark", "i2", "B", 1L, 1L,
"Mary", "i1", "C", 1L, 1L,
"Mary", "i2", "D", 0L, 1L,
"John", "i1", "A", 0L, 1L,
"John", "i2", "B", 1L, 1L,
"Abby", "i1", "C", 1L, 2L,
"Abby", "i2", "B", 2L, 2L
)
output
# A tibble: 8 × 5
ID item text score total
<chr> <chr> <chr> <int> <int>
1 Mark i1 A 0 1
2 Mark i2 B 1 1
3 Mary i1 C 1 1
4 Mary i2 D 0 1
5 John i1 A 0 1
6 John i2 B 1 1
7 Abby i1 C 1 2
8 Abby i2 B 2 2
CodePudding user response:
We can use pivot_longer
with names_sep
as .
- the column 'item' return the prefix part of the column names before the .
and the .value
will return the values of the column with the suffix part of the column name after the .
library(tidyr)
pivot_longer(input, cols = contains("."),
names_to = c("item", ".value"), names_sep = "\\.")
-output
# A tibble: 8 × 5
ID total item text score
<chr> <int> <chr> <chr> <int>
1 Mark 1 i1 A 0
2 Mark 1 i2 B 1
3 Mary 1 i1 C 1
4 Mary 1 i2 D 0
5 John 1 i1 A 0
6 John 1 i2 B 1
7 Abby 2 i1 C 1
8 Abby 2 i2 B 1
CodePudding user response:
Here is an similar but alternative approach using names_pattern
:
Explanation: (.*)\\.(.*)
is regex:
()
captures a group
.
captures any character
*
is a quantifier means 0 or more
\\
Escape character
The regular expression means: any amount of character followed by a dot followed by any amount of character. This regex matches your column names:
library(dplyr)
library(tidyr)
input %>%
pivot_longer(
cols = -c(ID, total),
names_to = c('item', '.value'),
names_pattern = '(.*)\\.(.*)'
)
ID total item text score
<chr> <int> <chr> <chr> <int>
1 Mark 1 i1 A 0
2 Mark 1 i2 B 1
3 Mary 1 i1 C 1
4 Mary 1 i2 D 0
5 John 1 i1 A 0
6 John 1 i2 B 1
7 Abby 2 i1 C 1
8 Abby 2 i2 B 1