Home > OS >  R Overwrite column values with non NA values from column in separate dataframe
R Overwrite column values with non NA values from column in separate dataframe

Time:11-29

I have a dataframe 'df1' with a lot of columns, but the ones of interest are:

Number Code
1
2
3
10
11 AMRO
4
277
2100 BLPH

And I have another dataframe 'df2' with a lot of columns, but the ones of interest are:

Number Code
1 AMCR
2 AMCR
3 BANO
10 BAEA
12 AMRO
4 NA
277 NA
2100 NA

I want matching values in the 'Number' columns of 'df1' and 'df2' to lead to values in the 'Code' column in 'df2' to overwrite the 'Code' values in 'df1' as long as the 'Code' values in 'df2' don't contain an NA, so that the final result of 'df1' looks like:

Number Code
1 AMCR
2 AMCR
3 BANO
10 BAEA
11 AMRO
4
277
2100 BLPH

Thank you for your help!

CodePudding user response:

We can do

library(powerjoin)
power_left_join(df1, df2, by = "Number", conflict = coalesce)

-output

Number Code
1      1 AMCR
2      2 AMCR
3      3 BANO
4     10 BAEA
5     11 AMRO
6      4 <NA>
7    277 <NA>
8   2100 BLPH

Or to do an overwrite, use data.table

library(data.table)
setDT(df1)[df2, Code := fcoalesce(Code, i.Code), on = .(Number)]

-output

> df1
   Number   Code
    <int> <char>
1:      1   AMCR
2:      2   AMCR
3:      3   BANO
4:     10   BAEA
5:     11   AMRO
6:      4   <NA>
7:    277   <NA>
8:   2100   BLPH

data

df1 <- structure(list(Number = c(1L, 2L, 3L, 10L, 11L, 4L, 277L, 2100L
), Code = c(NA, NA, NA, NA, "AMRO", NA, NA, "BLPH")), 
class = "data.frame", row.names = c(NA, 
-8L))

df2 <- structure(list(Number = c(1L, 2L, 3L, 10L, 12L, 4L, 277L, 2100L
), Code = c("AMCR", "AMCR", "BANO", "BAEA", "AMRO", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

CodePudding user response:

Here is an alternative approach using bind_cols:

library(dplyr)

bind_cols(df1, df2) %>% 
  mutate(Code = coalesce(Code...2, Code...4)) %>% 
  select(Number = Number...1, Code)

 Number Code
1      1 AMCR
2      2 AMCR
3      3 BANO
4     10 BAEA
5     11 AMRO
6      4 <NA>
7    277 <NA>
8   2100 BLPH

CodePudding user response:

Here is a solution playing with dplyr full_join and inner_join

library(dplyr)
df1 %>% 
  full_join(df2) %>% na.omit() %>% 
  full_join(df1 %>% inner_join(df2)) %>% 
  filter(Number %in% df1$Number) %>%
  arrange(Number)

Output


#>   Number Code
#> 1      1 AMCR
#> 2      2 AMCR
#> 3      3 BANO
#> 4      4 <NA>
#> 5     10 BAEA
#> 6     11 AMRO
#> 7    277 <NA>
#> 8   2100 BLPH
  • Related