Home > Enterprise >  Pivot Longer in R Issues
Pivot Longer in R Issues

Time:05-14

I've looked quite a few different questions on here around Pivot Longer but I can't seem to figure out how to get my scenario to work. For example R Pivot multiple columns from wide to long

Here is a sample of my data

structure(list(co_number = c("C953543", "C988782", "C999839", 
"C1000378", "C1004886", "C939949"), co_rejection_date_1 = structure(c(NA, 
NA, NA, NA, 1645401600, 1637020800), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), co_rejection_category_1 = c(NA, NA, NA, NA, "CO Error", 
"CO Error"), co_rejector_1 = c(NA, NA, NA, NA, "Quality Check", 
"Quality Check"), co_rejection_rationale_1 = c(NA, NA, NA, NA, 
"Add Tech Pubs approver.", "Updated approvers."), co_rejection_date_2 = structure(c(NA, 
NA, NA, NA, NA, 1637280000), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), co_rejection_category_2 = c(NA, NA, NA, NA, NA, "CO Error"
), co_rejector_2 = c(NA, NA, NA, NA, NA, "Labeling"), co_rejection_rationale_2 = c(NA, 
NA, NA, NA, NA, "Need to correct CO number on drawing redlines."
), co_rejection_date_3 = structure(c(NA, NA, NA, NA, NA, 1638835200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), co_rejection_category_3 = c(NA, 
NA, NA, NA, NA, "CO Error"), co_rejector_3 = c(NA, NA, NA, NA, 
NA, "Labeling"), co_rejection_rationale_3 = c(NA, NA, NA, NA, 
NA, "ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix attachment."
), co_rejection_date_4 = structure(c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), co_rejection_category_4 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    co_rejector_4 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), co_rejection_rationale_4 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), co_rejection_date_5 = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), co_rejection_category_5 = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), co_rejector_5 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), co_rejection_rationale_5 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), release_year = c("2022", "2022", "2022", 
    "2022", "2022", "2022")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I want it to look like:

co_number co_rejection_number co_rejection_date co_rejection_category co_rejection_rejector co_rejection_rationale
C953543 NA NA NA NA NA
C988782 NA NA NA NA NA
C999839 NA NA NA NA NA
C1000378 NA NA NA NA NA
C1004886 1 2022-02-21 CO Error Quality Check Add Tech Pubs approver.
C939949 1 2021-11-16 CO Error Quality Check Updated approvers.
C939949 2 2021-11-19 CO Error Labeling Need to correct CO number on drawing redlines.
C939949 3 2021-12-07 CO Error Labeling ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix attachment.

CodePudding user response:

You need to use the "name_pattern" option to separate the column name from the rejection number

library(tidyr)

answer<-pivot_longer(df, -c(co_number, release_year), 
                     names_pattern = "(. )_(\\d)", 
                     names_to = c(".value", "co_rejection_number"), 
                     values_drop_na = TRUE)

tail(answer, 10)
# A tibble: 4 × 7
  co_number release_year co_rejection_number co_rejection_date   co_rejection_category co_rejector   co_rejection_rationale                                                            
  <chr>     <chr>        <chr>               <dttm>              <chr>                 <chr>         <chr>                                                                             
1 C1004886  2022         1                   2022-02-21 00:00:00 CO Error              Quality Check Add Tech Pubs approver.                                                           
2 C939949   2022         1                   2021-11-16 00:00:00 CO Error              Quality Check Updated approvers.                                                                
3 C939949   2022         2                   2021-11-19 00:00:00 CO Error              Labeling      Need to correct CO number on drawing redlines.                                    
4 C939949   2022         3                   2021-12-07 00:00:00 CO Error              Labeling      ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix att…

Now to add back in the co_numbers which are missing:

final <- full_join(answer, distinct(df[,c("co_number", "release_year")]))
final

 co_number release_year co_rejection_number co_rejection_date   co_rejection_category co_rejector   co_rejection_rationale                                                            
  <chr>     <chr>        <chr>               <dttm>              <chr>                 <chr>         <chr>                                                                             
1 C1004886  2022         1                   2022-02-21 00:00:00 CO Error              Quality Check Add Tech Pubs approver.                                                           
2 C939949   2022         1                   2021-11-16 00:00:00 CO Error              Quality Check Updated approvers.                                                                
3 C939949   2022         2                   2021-11-19 00:00:00 CO Error              Labeling      Need to correct CO number on drawing redlines.                                    
4 C939949   2022         3                   2021-12-07 00:00:00 CO Error              Labeling      ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix att…
5 C953543   2022         NA                  NA                  NA                    NA            NA                                                                                
6 C988782   2022         NA                  NA                  NA                    NA            NA                                                                                
7 C999839   2022         NA                  NA                  NA                    NA            NA                                                                                
8 C1000378  2022         NA                  NA                  NA                    NA            NA                                                                                

     

CodePudding user response:

Another option using data.table, you can add 'na.rm=T' as an argument to get rid of NA.

library(data.table)
DT <- as.data.table(Test)
DT <- melt(DT, id.vars="co_number", variable.name="co_rejection_number", 
          measure = patterns("^co_rejection_date.*", "^co_rejection_category.*", "^co_rejector.*", "^co_rejection_rationale.*"), 
          value.name=c("co_rejection_date", "co_rejection_rejector", "co_rejection_category", "co_rejection_rationale"))
  • Related