Home > Net >  Finding min value for date across rows in R
Finding min value for date across rows in R

Time:06-15

I have the tables below:

1st Table:

ID Rejected New Expired
001 2021-02-21 2022-03-20 2021-05-20
001 2021-02-21 2022-03-20 2021-05-20
002 2021-03-21 NA 2021-05-20
002 2021-03-21 NA 2021-05-20
003 2021-05-20 NA 2021-05-20
003 2021-05-20 NA 2021-05-20
004 2021-05-20 2021-11-03 2022-06-20
004 2021-05-20 2021-11-03 2022-06-20
005 2021-05-20 2021-11-03 2022-06-20
005 2021-05-20 2021-11-03 2022-06-20

2nd Table:

ID date
001 2021-04-30
002 2021-04-30
003 2021-04-30
004 2021-04-30
005 2021-04-30

Desired Output:

ID Rejected New Expired
001 2021-02-21 2021-02-21 2021-05-20
001 2021-02-21 2021-02-21 2021-05-20
002 2021-06-21 2021-04-30 2021-06-21
002 2021-06-21 2021-04-30 2021-06-21
003 2021-05-20 2021-04-30 2021-05-20
003 2021-05-20 2021-04-30 2021-05-20
004 2021-03-20 2021-03-20 2022-06-20
004 2021-03-20 2021-03-20 2022-06-20
005 2021-05-20 2021-05-20 2022-06-20
005 2021-05-20 2021-05-20 2022-06-20

What I want:

  1. Merge table 1 and 2 by ID only for values where table1$new is not NA. (I.e. all NA values in table1 should be filled with date values from table2)
  2. After merging, merge$new dates cannot occur after rejected or expired. One solution could be finding the minimum value in each row and placing that in New.

My Code:

table2 <- q1 %>% ##Create new dataset min_val from q1##
  group_by(ID) %>%
  slice(which.min(date)) %>% ##find min value from each row##
  rename(New2 = date)   ## rename the createdatetime to New2##

merged <- table1 %>% #merge merged_final.1 to min_val##
  left_join(table2, by = 'ID') %>% 
  mutate(New = coalesce(New, New2)) %>% ##This will make sure only NA value are replaced
  select(-New2) ##drop New2 column##

merged$New <- as.Date(apply(merged[, c(2, 3, 4)], 1, FUN = min))

Issue

This last line of code does not seem to be working for me. When I run this, many of the merged$new values turn to NA and the previously NA rows of merge$rejected and merge$expired are suddenly filled with random dates.

Any help would be appreciated. Also not sure why my third table isn't showing up in html format.

CodePudding user response:

You can use left_join then coalesce the two columns

library(tidyverse)

left_join(df1, df2) %>%
    mutate(New = pmin(Rejected, coalesce(New, date), Expired), date = NULL)

   ID   Rejected        New    Expired
1   1 2021-02-21 2021-02-21 2021-05-20
2   1 2021-02-21 2021-02-21 2021-05-20
3   2 2021-03-21 2021-03-21 2021-05-20
4   2 2021-03-21 2021-03-21 2021-05-20
5   3 2021-05-20 2021-04-30 2021-05-20
6   3 2021-05-20 2021-04-30 2021-05-20
7   4 2021-05-20 2021-05-20 2022-06-20
8   4 2021-05-20 2021-05-20 2022-06-20
9   5 2021-05-20 2021-05-20 2022-06-20
10  5 2021-05-20 2021-05-20 2022-06-20

CodePudding user response:

We could do it with an ifelse statement.

By the way the most valuable thing with this question is the part of @onyambu's answer using date=NULL. I wished always to find such an argument to avoid the last select(-xyz). Thank you for this:

library(dplyr)

left_join(df1, df2) %>% 
  mutate(New= ifelse(is.na(New), min(date), New), date=NULL)
   ID   Rejected        New    Expired
1   1 2021-02-21 2022-03-20 2021-05-20
2   1 2021-02-21 2022-03-20 2021-05-20
3   2 2021-03-21 2021-04-30 2021-05-20
4   2 2021-03-21 2021-04-30 2021-05-20
5   3 2021-05-20 2021-04-30 2021-05-20
6   3 2021-05-20 2021-04-30 2021-05-20
7   4 2021-05-20 2021-11-03 2022-06-20
8   4 2021-05-20 2021-11-03 2022-06-20
9   5 2021-05-20 2021-11-03 2022-06-20
10  5 2021-05-20 2021-11-03 2022-06-20
  • Related