Home > Blockchain >  R Multiple Dataframe Column Matches to Populate Column
R Multiple Dataframe Column Matches to Populate Column

Time:10-01

I have a dataframe 'df1' that looks like:

structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO", 
"ISLA"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-"
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall"), Lat = c(NA, NA, NA, 
NA, NA), Long = c(NA, NA, NA, NA, NA)), row.names = c(NA, 5L), class = "data.frame")

And a second dataframe 'df2' that looks like:

structure(list(MAPS_code = c("SAFR", "SAGA", "ELPU", "ISLA", 
"SABO", "SATE", "QUST", "SARI", "PANA", "COPA", "LOAN", "GAPA", 
"MELI", "CAGO", "PINO", "GABO", "RIJA", "FILA", "AMIS"), Lat = c(8.765833, 
8.751389, 8.768611, 8.835833, 8.801111, 8.808333, 8.815, 8.827778, 
8.781667, 8.778333, 8.783333, 8.800833, 8.790278, 8.754444, 8.844444, 
8.801389, 8.786667, 8.785278, 8.952222), Long = c(-82.94277, 
-82.951111, -82.95, -82.963056, -82.917222, -82.924444, -82.923889, 
-82.924167, -82.896944, -82.955833, -82.938611, -82.972222, -82.967222, 
-82.925833, -82.97, -82.972222, -82.964722, -82.976111, -82.833333
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"
), Location = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-")), class = "data.frame", row.names = c(NA, 
-19L))

How can I make populate each row of 'Lat' and 'Long' of df1 from 'Lat' and 'Long' of df2 when 'Contact', 'Location', and 'MAPS_code' of the corresponding rows match between df1 and df2? So that the result of df1 looks like:

structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO", 
"ISLA"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-"
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall"), Lat = c("8.827778", "8.801111", "8.801111
", "8.801111", "8.835833"), Long = c("-82.92417", "-82.91722", "-82.91722", "-82.91722", "-82.96306")), row.names = c(NA, 5L), class = "data.frame")

CodePudding user response:

You can do this:

library(dplyr)

df1 %>%
  rename(Location = Location_code) %>%
  left_join(df2, by = c('MAPS_code', 'Contact', 'Location')) %>%
  rename_with(~ gsub('.y', '', .), ends_with('.y')) %>%
  select(!ends_with('.x'))

  MAPS_code Location          Contact      Lat      Long
1      SARI     LCP- Chase Mendenhall 8.827778 -82.92417
2      SABO     LCP- Chase Mendenhall 8.801111 -82.91722
3      SABO     LCP- Chase Mendenhall 8.801111 -82.91722
4      SABO     LCP- Chase Mendenhall 8.801111 -82.91722

CodePudding user response:

This is very similar to @Anoushiravan R's answer, but I'd suggest removing unnecessary variables before joining, rather than wrestling with the variable names generated by duplicate columns:

df1 %>%
    select(-Lat, -Long) %>%
    left_join(df2, by=c('MAPS_code', 'Location_code'='Location', 'Contact'))


  MAPS_code Location_code          Contact      Lat      Long
1      SARI          LCP- Chase Mendenhall 8.827778 -82.92417
2      SABO          LCP- Chase Mendenhall 8.801111 -82.91722
3      SABO          LCP- Chase Mendenhall 8.801111 -82.91722
4      SABO          LCP- Chase Mendenhall 8.801111 -82.91722
5      ISLA          LCP- Chase Mendenhall 8.835833 -82.96306

Passing a named vector element to the by= argument (for example 'Location_code'='Location') lets you join on non-matching variable names. The name on the left side of the = is the name in the first data.frame (or the one piped in), while the name on the right side is the name in the second data.frame.

Also, note that this will add the new variables onto the end (right) of the joined data frame. If you wanted to match the order of variables in df1 (that wouldn't apply here, since the order is the same), you can use select to reorder the variables:

... %>%
    select(one_of(names(df1)), everything())

names(df1) gets the variable names from df1 in order. By putting it in one_of, a variable missing in the final data.frame will give a warning instead of an error. Ending with everything, will ensure that no variables are lost, and any variable not in df1 will be stuck on the end.

  • Related