Home > Mobile >  Join of column values for specific row values
Join of column values for specific row values

Time:09-25

I'd like to join (left_join) a tibble (df2) to another one (df1) only where the value of col2 in df1 is NA. I am currently using a code that is not very elegant. Any advice on how to shorten the code would be greatly appreciated!

library(tidyverse)

# df1 contains NAs that need to be replaced by values from df2, for relevant col1 values
df1 <- tibble(col1 = c("a", "b", "c", "d"), col2 = c(1, 2, NA, NA), col3 = c(10, 20, 30, 40))
df2 <- tibble(col1 = c("a", "b", "c", "d"), col2 = c(5, 6, 7, 8), col3 = c(50, 60, 70, 80))

# my current approach
df3 <- df1 %>%
  filter(!is.na(col2))

df4 <- df1 %>%
  filter(is.na(col2)) %>%
  select(col1)%>%
  left_join(df2)

# output tibble that is expected
df_final <- df3 %>%
  bind_rows(df4)

CodePudding user response:

We can use data.table methods

library(data.table)
setDT(df1)[setDT(df2), col2 := fcoalesce(col2, i.col2), on = .(col1)]

-output

> df1
   col1 col2 col3
1:    a    1   10
2:    b    2   20
3:    c    7   30
4:    d    8   40

Or an option with tidyverse

library(dplyr)
library(stringr)
df1 %>% 
   left_join(df2, by = c("col1")) %>% 
    transmute(col1, across(ends_with(".x"),
      ~ coalesce(., get(str_replace(cur_column(), ".x", ".y"))), 
           .names = "{str_remove(.col, '.x')}"))

-output

# A tibble: 4 x 3
    col1  col2  col3
  <chr>  <dbl>  <dbl>
1 a          1     10
2 b          2     20
3 c          7     30
4 d          8     40

CodePudding user response:

Here's a small dplyr answer that works for me, although it might get slow if you have tons of rows:

df1 %>%
  filter(is.na(col2)) %>%
  select(col1) %>%
  left_join(df2, by = "col1") %>%
  bind_rows(df1, .) %>%
  filter(!is.na(col2))
  • Related