Home > database >  how to fill in empty data with data from another dataframe based on a column in R
how to fill in empty data with data from another dataframe based on a column in R

Time:03-11

I have two separate data frames and I need to use the 'B' data frame to fill in empty values in the 'A' data frame.

A<-data.frame(other=c('v','v','v','v','f','f'),
              site=c(1,2,3,4,5,6), 
              county=c('ab','bc','de','NA','NA','fg'), 
              stateprov=c(7,8,9,'NA','NA',10), 
              country=c('u','u','c','NA','NA','c'))

B<-data.frame(site=c(4,5), county=c('eh','gi'), 
              stateprov=c(11,12), country=c('u','u'))

#what I want

other site county stateprov country
v 1 ab 7 u
v 2 bc 8 u
v 3 de 9 c
v 4 eh 11 u
f 5 gi 12 u
f 6 fg 10 c

I'm not sure how fill in the NA's as join won't work in this case as far as I know. Thanks for your help

CodePudding user response:

dplyr::rows_update(A, B)

  other site county stateprov country
1     v    1     ab         7       u
2     v    2     bc         8       u
3     v    3     de         9       c
4     v    4     eh        11       u
5     f    5     gi        12       u
6     f    6     fg        10       c

CodePudding user response:

A simple loop should do it. Find the rows to replace, then loop through those rows and replace the A columns with B columns.

A<-data.frame(other=c('v','v','v','v','f','f'),
              site=c(1,2,3,4,5,6), 
              county=c('ab','bc','de','NA','NA','fg'), 
              stateprov=c(7,8,9,'NA','NA',10), 
              country=c('u','u','c','NA','NA','c'))

B<-data.frame(site=c(4,5), county=c('eh','gi'), 
              stateprov=c(11,12), country=c('u','u'))


rowToReplace <- which(A$county=="NA")

for(i in rowToReplace) {
    missingSite = A$site[i]
    A[i, c("county", "stateprov", "country")] <- B[which(B$site==missingSite), c("county", "stateprov", "country")]
}

Note, there is a difference between the string 'NA' and the NA value in R. The preference is to use NA instead of the string. NA has the advantage of working with the handy functions complete.cases() and is.na()

CodePudding user response:

A join will work in this case. The data.table package especially does this well where you can update A by reference (see ?data.table).

Using data.table

A<-data.frame(other=c('v','v','v','v','f','f'),
              site=c(1,2,3,4,5,6), 
              county=c('ab','bc','de','NA','NA','fg'), 
              stateprov=c(7,8,9,'NA','NA',10), 
              country=c('u','u','c','NA','NA','c'))

B<-data.frame(site=c(4,5), county=c('eh','gi'), 
              stateprov=c(11,12), country=c('u','u'))

library(data.table)

setDT(A)
setDT(B)

# If you assume B always contains replacement for A
# refer to columns in B as i.[column_name], specify join column in on
A[B, c('county', 'stateprov', 'country') := .(i.county, i.stateprov, i.country), on = .(site)]

# Assuming you need to check if A is na first
# refer to columns in B as i.[column_name], refer to columns in A as x.[column name], specify join column in on

A[B, c('county', 'stateprov', 'country') := .(  fifelse(is.na(x.county), i.county, x.county)
                                              , fifelse(is.na(x.stateprov), i.stateprov, x.stateprov)
                                              , fifelse(is.na(x.country), i.country, x.country)
                                              )
                                              , on = .(site)]

A

   other site county stateprov country
1:     v    1     ab         7       u
2:     v    2     bc         8       u
3:     v    3     de         9       c
4:     v    4     eh        11       u
5:     f    5     gi        12       u
6:     f    6     fg        10       c

Another way using dplyr

library(dplyr)
# This also checks whether column in A is NA first
left_join(A, B, by = c('site')) %>% 
  mutate(
    county = case_when(is.na(county.x) ~ county.y, TRUE ~ county.x)
    , stateprov = case_when(is.na(stateprov.x) ~ stateprov.y, TRUE ~ stateprov.x)
    , country = case_when(is.na(country.x) ~ country.y, TRUE ~ country.x)
    ) %>% 
  select(-ends_with('.x'), -ends_with('.y'))
  
other site county stateprov country
1     v    1     ab         7       u
2     v    2     bc         8       u
3     v    3     de         9       c
4     v    4     eh        11       u
5     f    5     gi        12       u
6     f    6     fg        10       c

CodePudding user response:

This is another approach

A = as.matrix(A)
B = as.matrix(merge(A[, c(1,2)], B, on="site",all.x=T)[,c(2,1,3,4,5)])
A[which(B!="NA")] <- B[which(B!="NA")]
data.frame(A) %>% dplyr::mutate(across(.cols = c(2,4),as.numeric))

Output:

  other site county stateprov country
1     v    1     ab         7       u
2     v    2     bc         8       u
3     v    3     de         9       c
4     v    4     eh        11       u
5     f    5     gi        12       u
6     f    6     fg        10       c
  •  Tags:  
  • r
  • Related