There are quite a few questions already regarding the use of pivot longer and pivot wider to reshape data in the way that I need, but none quite get the answer right. I've successfully used a combination of pivot longer and pivot wider in the past to get the output that I need with differently shaped data, but the approach isn't quite working here.
library(tidyverse)
sampleData <- tribble(
~ID, ~ExamCode_01, ~ExamGrade_01, ~AdminYear_01, ~ExamCode_02, ~ExamGrade_02, ~AdminYear_02,
123, 4, 4, 22, 26, 5, 22,
456, 26, 3, 22, 83, 3, 21,
789, 26, 5, 22, NA, NA, NA
) # In the actual data, these go up to 30 potential Exam Code/Grade/Year combinations
## Attempt to distribute column names/values using names_sep as informed by other StackOverflow posts. I've also wrestled with names_pattern regex but can't get anything more useful than this.
sampleData_long_try1 <- sampleData %>%
pivot_longer(cols = -1,
names_to = c("ExamCode", "ExamGrade", "AdminYear"),
names_sep = "_")
## This approach uses pivot longer/wider in combination to get closer, but the output won't preserve duplicate IDs, instead returning list-values in columns, which is not helpful
sampleData_long_try2 <- sampleData %>%
pivot_longer(cols = -ID,
names_to = "type",
values_to = "value",
values_drop_na = T)
sampleData_final <- sampleData_long_try2 %>%
mutate(type = str_replace_all(type, "[:digit:]", "")) %>% # name repair so that it doesn't just pivot wider back into the original format
pivot_wider(names_from = "type",
values_from = "value") # can't figure out how to preserve unique ID/Exam Code combinations rather than consolidating the data into one row per ID with list-cols
## The desired final output needs to have each subject's exam code, score, and administration year on one row, as shown below.
desired_output <- tribble(
~ID, ~ExamCode, ~ExamGrade, ~AdminYear,
123, 4, 4, 22,
123, 26, 5, 22,
456, 26, 3, 22,
456, 83, 3, 21,
789, 26, 5, 22,
)
If this question is indeed a duplicate, please point me in the right direction. The closest answers I've found:
- Pivot_longer for multiple columns of repeated measurements data
- pivot_wider in R with multiple columns, but keep the name of former variables in separate columns
Thank you for any help you can provide!
CodePudding user response:
using pivot_longer
:
sampleData %>%
pivot_longer(-ID, names_to='.value', names_pattern='([A-Za-z] )', values_drop_na=TRUE)
# A tibble: 5 x 4
ID ExamCode ExamGrade AdminYear
<dbl> <dbl> <dbl> <dbl>
1 123 4 4 22
2 123 26 5 22
3 456 26 3 22
4 456 83 3 21
5 789 26 5 22
CodePudding user response:
We could do it this way:
tidyr::pivot_longer(sampleData,
cols = -ID,
names_to = c('.value', 'temp'),
names_sep = '_',
values_drop_na = TRUE) %>%
select(-temp)
ID ExamCode ExamGrade AdminYear
<dbl> <dbl> <dbl> <dbl>
1 123 4 4 22
2 123 26 5 22
3 456 26 3 22
4 456 83 3 21
5 789 26 5 22
CodePudding user response:
Your data has non-unique values for some ID
s, but unique values for ID-num(of exam)
, so I'd suggest putting that into its own column to make your rows distinct for pivoting. The exam sequencing presumably has some semantic meaning that could have relevance for your analysis.
sampleData %>%
pivot_longer(-1) %>%
separate(name, c("col", "num"), convert = TRUE) %>%
pivot_wider(names_from = col, values_from = value) %>%
drop_na()
# A tibble: 5 × 5
ID num ExamCode ExamGrade AdminYear
<dbl> <int> <dbl> <dbl> <dbl>
1 123 1 4 4 22
2 123 2 26 5 22
3 456 1 26 3 22
4 456 2 83 3 21
5 789 1 26 5 22