Home > front end >  r - Pivot Longer distributing values into multiple columns
r - Pivot Longer distributing values into multiple columns

Time:08-06

There are quite a few questions already regarding the use of pivot longer and pivot wider to reshape data in the way that I need, but none quite get the answer right. I've successfully used a combination of pivot longer and pivot wider in the past to get the output that I need with differently shaped data, but the approach isn't quite working here.

library(tidyverse)
sampleData <- tribble(
  ~ID, ~ExamCode_01, ~ExamGrade_01, ~AdminYear_01, ~ExamCode_02, ~ExamGrade_02, ~AdminYear_02,
  123, 4, 4, 22, 26, 5, 22,
  456, 26, 3, 22, 83, 3, 21,
  789, 26, 5, 22, NA, NA, NA
) # In the actual data, these go up to 30 potential Exam Code/Grade/Year combinations

## Attempt to distribute column names/values using names_sep as informed by other StackOverflow posts. I've also wrestled with names_pattern regex but can't get anything more useful than this.
sampleData_long_try1 <- sampleData %>%
  pivot_longer(cols = -1,
               names_to = c("ExamCode", "ExamGrade", "AdminYear"),
               names_sep = "_")

## This approach uses pivot longer/wider in combination to get closer, but the output won't preserve duplicate IDs, instead returning list-values in columns, which is not helpful
sampleData_long_try2 <- sampleData %>%
  pivot_longer(cols = -ID,
               names_to = "type",
               values_to = "value",
               values_drop_na = T)

sampleData_final <- sampleData_long_try2 %>%
  mutate(type = str_replace_all(type, "[:digit:]", "")) %>% # name repair so that it doesn't just  pivot wider back into the original format
  pivot_wider(names_from = "type",
              values_from = "value") # can't figure out how to preserve unique ID/Exam Code combinations rather than consolidating the data into one row per ID with list-cols

## The desired final output needs to have each subject's exam code, score, and administration year on one row, as shown below.
desired_output <- tribble(
  ~ID, ~ExamCode, ~ExamGrade, ~AdminYear,
  123, 4, 4, 22,
  123, 26, 5, 22,
  456, 26, 3, 22,
  456, 83, 3, 21,
  789, 26, 5, 22,
)

If this question is indeed a duplicate, please point me in the right direction. The closest answers I've found:

Thank you for any help you can provide!

CodePudding user response:

using pivot_longer:

sampleData %>%
  pivot_longer(-ID, names_to='.value', names_pattern='([A-Za-z] )', values_drop_na=TRUE)

# A tibble: 5 x 4
     ID ExamCode ExamGrade AdminYear
  <dbl>    <dbl>     <dbl>     <dbl>
1   123        4         4        22
2   123       26         5        22
3   456       26         3        22
4   456       83         3        21
5   789       26         5        22

CodePudding user response:

We could do it this way:

tidyr::pivot_longer(sampleData, 
                    cols = -ID, 
                    names_to = c('.value', 'temp'), 
                    names_sep = '_', 
                    values_drop_na = TRUE) %>% 
  select(-temp)

     ID ExamCode ExamGrade AdminYear
  <dbl>    <dbl>     <dbl>     <dbl>
1   123        4         4        22
2   123       26         5        22
3   456       26         3        22
4   456       83         3        21
5   789       26         5        22

CodePudding user response:

Your data has non-unique values for some IDs, but unique values for ID-num(of exam), so I'd suggest putting that into its own column to make your rows distinct for pivoting. The exam sequencing presumably has some semantic meaning that could have relevance for your analysis.

sampleData %>%
  pivot_longer(-1) %>%
  separate(name, c("col", "num"), convert = TRUE) %>%
  pivot_wider(names_from = col, values_from = value) %>%
  drop_na()

# A tibble: 5 × 5
     ID   num ExamCode ExamGrade AdminYear
  <dbl> <int>    <dbl>     <dbl>     <dbl>
1   123     1        4         4        22
2   123     2       26         5        22
3   456     1       26         3        22
4   456     2       83         3        21
5   789     1       26         5        22
  • Related