I have the tables below:
1st Table:
ID | Rejected | New | Expired |
---|---|---|---|
001 | 2021-02-21 | 2022-03-20 | 2021-05-20 |
001 | 2021-02-21 | 2022-03-20 | 2021-05-20 |
002 | 2021-03-21 | NA | 2021-05-20 |
002 | 2021-03-21 | NA | 2021-05-20 |
003 | 2021-05-20 | NA | 2021-05-20 |
003 | 2021-05-20 | NA | 2021-05-20 |
004 | 2021-05-20 | 2021-11-03 | 2022-06-20 |
004 | 2021-05-20 | 2021-11-03 | 2022-06-20 |
005 | 2021-05-20 | 2021-11-03 | 2022-06-20 |
005 | 2021-05-20 | 2021-11-03 | 2022-06-20 |
2nd Table:
ID | date |
---|---|
001 | 2021-04-30 |
002 | 2021-04-30 |
003 | 2021-04-30 |
004 | 2021-04-30 |
005 | 2021-04-30 |
Desired Output:
ID | Rejected | New | Expired |
---|---|---|---|
001 | 2021-02-21 | 2021-02-21 | 2021-05-20 |
001 | 2021-02-21 | 2021-02-21 | 2021-05-20 |
002 | 2021-06-21 | 2021-04-30 | 2021-06-21 |
002 | 2021-06-21 | 2021-04-30 | 2021-06-21 |
003 | 2021-05-20 | 2021-04-30 | 2021-05-20 |
003 | 2021-05-20 | 2021-04-30 | 2021-05-20 |
004 | 2021-03-20 | 2021-03-20 | 2022-06-20 |
004 | 2021-03-20 | 2021-03-20 | 2022-06-20 |
005 | 2021-05-20 | 2021-05-20 | 2022-06-20 |
005 | 2021-05-20 | 2021-05-20 | 2022-06-20 |
What I want:
- Merge table 1 and 2 by ID only for values where table1$new is not NA. (I.e. all NA values in table1 should be filled with date values from table2)
- After merging, merge$new dates cannot occur after rejected or expired. One solution could be finding the minimum value in each row and placing that in New.
My Code:
table2 <- q1 %>% ##Create new dataset min_val from q1##
group_by(ID) %>%
slice(which.min(date)) %>% ##find min value from each row##
rename(New2 = date) ## rename the createdatetime to New2##
merged <- table1 %>% #merge merged_final.1 to min_val##
left_join(table2, by = 'ID') %>%
mutate(New = coalesce(New, New2)) %>% ##This will make sure only NA value are replaced
select(-New2) ##drop New2 column##
merged$New <- as.Date(apply(merged[, c(2, 3, 4)], 1, FUN = min))
Issue
This last line of code does not seem to be working for me. When I run this, many of the merged$new values turn to NA and the previously NA rows of merge$rejected and merge$expired are suddenly filled with random dates.
Any help would be appreciated. Also not sure why my third table isn't showing up in html format.
CodePudding user response:
You can use left_join
then coalesce
the two columns
library(tidyverse)
left_join(df1, df2) %>%
mutate(New = pmin(Rejected, coalesce(New, date), Expired), date = NULL)
ID Rejected New Expired
1 1 2021-02-21 2021-02-21 2021-05-20
2 1 2021-02-21 2021-02-21 2021-05-20
3 2 2021-03-21 2021-03-21 2021-05-20
4 2 2021-03-21 2021-03-21 2021-05-20
5 3 2021-05-20 2021-04-30 2021-05-20
6 3 2021-05-20 2021-04-30 2021-05-20
7 4 2021-05-20 2021-05-20 2022-06-20
8 4 2021-05-20 2021-05-20 2022-06-20
9 5 2021-05-20 2021-05-20 2022-06-20
10 5 2021-05-20 2021-05-20 2022-06-20
CodePudding user response:
We could do it with an ifelse
statement.
By the way the most valuable thing with this question is the part of @onyambu's answer using date=NULL
. I wished always to find such an argument to avoid the last select(-xyz)
. Thank you for this:
library(dplyr)
left_join(df1, df2) %>%
mutate(New= ifelse(is.na(New), min(date), New), date=NULL)
ID Rejected New Expired
1 1 2021-02-21 2022-03-20 2021-05-20
2 1 2021-02-21 2022-03-20 2021-05-20
3 2 2021-03-21 2021-04-30 2021-05-20
4 2 2021-03-21 2021-04-30 2021-05-20
5 3 2021-05-20 2021-04-30 2021-05-20
6 3 2021-05-20 2021-04-30 2021-05-20
7 4 2021-05-20 2021-11-03 2022-06-20
8 4 2021-05-20 2021-11-03 2022-06-20
9 5 2021-05-20 2021-11-03 2022-06-20
10 5 2021-05-20 2021-11-03 2022-06-20