I've looked quite a few different questions on here around Pivot Longer but I can't seem to figure out how to get my scenario to work. For example R Pivot multiple columns from wide to long
Here is a sample of my data
structure(list(co_number = c("C953543", "C988782", "C999839",
"C1000378", "C1004886", "C939949"), co_rejection_date_1 = structure(c(NA,
NA, NA, NA, 1645401600, 1637020800), tzone = "UTC", class = c("POSIXct",
"POSIXt")), co_rejection_category_1 = c(NA, NA, NA, NA, "CO Error",
"CO Error"), co_rejector_1 = c(NA, NA, NA, NA, "Quality Check",
"Quality Check"), co_rejection_rationale_1 = c(NA, NA, NA, NA,
"Add Tech Pubs approver.", "Updated approvers."), co_rejection_date_2 = structure(c(NA,
NA, NA, NA, NA, 1637280000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), co_rejection_category_2 = c(NA, NA, NA, NA, NA, "CO Error"
), co_rejector_2 = c(NA, NA, NA, NA, NA, "Labeling"), co_rejection_rationale_2 = c(NA,
NA, NA, NA, NA, "Need to correct CO number on drawing redlines."
), co_rejection_date_3 = structure(c(NA, NA, NA, NA, NA, 1638835200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), co_rejection_category_3 = c(NA,
NA, NA, NA, NA, "CO Error"), co_rejector_3 = c(NA, NA, NA, NA,
NA, "Labeling"), co_rejection_rationale_3 = c(NA, NA, NA, NA,
NA, "ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix attachment."
), co_rejection_date_4 = structure(c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), tzone = "UTC", class = c("POSIXct",
"POSIXt")), co_rejection_category_4 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
co_rejector_4 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), co_rejection_rationale_4 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), co_rejection_date_5 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), tzone = "UTC", class = c("POSIXct",
"POSIXt")), co_rejection_category_5 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), co_rejector_5 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), co_rejection_rationale_5 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), release_year = c("2022", "2022", "2022",
"2022", "2022", "2022")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I want it to look like:
co_number | co_rejection_number | co_rejection_date | co_rejection_category | co_rejection_rejector | co_rejection_rationale |
---|---|---|---|---|---|
C953543 | NA | NA | NA | NA | NA |
C988782 | NA | NA | NA | NA | NA |
C999839 | NA | NA | NA | NA | NA |
C1000378 | NA | NA | NA | NA | NA |
C1004886 | 1 | 2022-02-21 | CO Error | Quality Check | Add Tech Pubs approver. |
C939949 | 1 | 2021-11-16 | CO Error | Quality Check | Updated approvers. |
C939949 | 2 | 2021-11-19 | CO Error | Labeling | Need to correct CO number on drawing redlines. |
C939949 | 3 | 2021-12-07 | CO Error | Labeling | ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix attachment. |
CodePudding user response:
You need to use the "name_pattern" option to separate the column name from the rejection number
library(tidyr)
answer<-pivot_longer(df, -c(co_number, release_year),
names_pattern = "(. )_(\\d)",
names_to = c(".value", "co_rejection_number"),
values_drop_na = TRUE)
tail(answer, 10)
# A tibble: 4 × 7
co_number release_year co_rejection_number co_rejection_date co_rejection_category co_rejector co_rejection_rationale
<chr> <chr> <chr> <dttm> <chr> <chr> <chr>
1 C1004886 2022 1 2022-02-21 00:00:00 CO Error Quality Check Add Tech Pubs approver.
2 C939949 2022 1 2021-11-16 00:00:00 CO Error Quality Check Updated approvers.
3 C939949 2022 2 2021-11-19 00:00:00 CO Error Labeling Need to correct CO number on drawing redlines.
4 C939949 2022 3 2021-12-07 00:00:00 CO Error Labeling ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix att…
Now to add back in the co_numbers which are missing:
final <- full_join(answer, distinct(df[,c("co_number", "release_year")]))
final
co_number release_year co_rejection_number co_rejection_date co_rejection_category co_rejector co_rejection_rationale
<chr> <chr> <chr> <dttm> <chr> <chr> <chr>
1 C1004886 2022 1 2022-02-21 00:00:00 CO Error Quality Check Add Tech Pubs approver.
2 C939949 2022 1 2021-11-16 00:00:00 CO Error Quality Check Updated approvers.
3 C939949 2022 2 2021-11-19 00:00:00 CO Error Labeling Need to correct CO number on drawing redlines.
4 C939949 2022 3 2021-12-07 00:00:00 CO Error Labeling ZFIN 100170251: removed VI addendum config. Updated CA addendum config matrix att…
5 C953543 2022 NA NA NA NA NA
6 C988782 2022 NA NA NA NA NA
7 C999839 2022 NA NA NA NA NA
8 C1000378 2022 NA NA NA NA NA
CodePudding user response:
Another option using data.table, you can add 'na.rm=T' as an argument to get rid of NA.
library(data.table)
DT <- as.data.table(Test)
DT <- melt(DT, id.vars="co_number", variable.name="co_rejection_number",
measure = patterns("^co_rejection_date.*", "^co_rejection_category.*", "^co_rejector.*", "^co_rejection_rationale.*"),
value.name=c("co_rejection_date", "co_rejection_rejector", "co_rejection_category", "co_rejection_rationale"))