Home > Software design >  Replace NA values in a column with values from another df column if conditions are met
Replace NA values in a column with values from another df column if conditions are met

Time:12-07

I need to copy values from a column in one dataframe to a column in another if conditions are met.

Using the example below, if ID and BirthDate are the same in df1 and df2, AND Week2 is NA then copy the values from Week2a in df2.

ID <- c(1,2,3,4,5)
BirthDate <- c("2022-01-01", "2022-01-02", "2022-03-04", "2022-04-05", "2022-06-03")
week2 <- c("Y","Y","NA","NA","Y")

df1 <- data.frame(ID, BirthDate, week2)
df1
  ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04    NA
4  4 2022-04-05    NA
5  5 2022-06-03     Y

ID <- c(1,2,3,4,5)
BirthDate <- c("2022-01-01", "2022-01-02", "2022-03-04", "2022-04-05", "2022-06-03")
week2a <- c("NA","NA","P","P","NA")
df2 <- data.frame(ID, BirthDate, week2a)
df2
  ID  BirthDate week2a
1  1 2022-01-01     NA
2  2 2022-01-02     NA
3  3 2022-03-04      P
4  4 2022-04-05      P
5  5 2022-06-03     NA

The desired output I would get is

  ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04     P
4  4 2022-04-05     P
5  5 2022-06-03     Y

Any help is greatly appreciated!!

CodePudding user response:

With your data you can do something like this:

df2[df2$week2a=="NA",]$week2a<-df1$week2[df1$week2!="NA"]

# > df2
#
#   ID  BirthDate week2a
#   1  1 2022-01-01      Y
#   2  2 2022-01-02      Y
#   3  3 2022-03-04      P
#   4  4 2022-04-05      P
#   5  5 2022-06-03      Y

If your data points are NA values then you can use is.na instead of the logical expressions


df2[is.na(df2$week2a),]$week2a<-df1$week2[!is.na(df1$week2)]

CodePudding user response:

First join by ID, then replace character 'NA' with real NA (you could omit this line, if NA is not just character NA). Then use coalesce:

library(dplyr)

left_join(df1, df2) %>% 
  mutate(across(starts_with("week"), ~na_if(., "NA"))) %>% 
  mutate(week2 = coalesce(week2, week2a), .keep="unused")
    ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04     P
4  4 2022-04-05     P
5  5 2022-06-03     Y

CodePudding user response:

Using dplyr::rows_patch():

library(dplyr)

df1 %>%
  rows_patch(
    rename(df2, week2 = week2a),
    by = c("ID", "BirthDate")
  )
  ID  BirthDate week2
1  1 2022-01-01     Y
2  2 2022-01-02     Y
3  3 2022-03-04     P
4  4 2022-04-05     P
5  5 2022-06-03     Y

NB, I assumed your dataset has real NAs rather than character "NA"s like your example; if not you’d first have to convert them (e.g., using tidyr::na_if(week2, "NA")).

  • Related