I have a dataframe 'df1' that looks like:
structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO",
"ISLA"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-"
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall"), Lat = c(NA, NA, NA,
NA, NA), Long = c(NA, NA, NA, NA, NA)), row.names = c(NA, 5L), class = "data.frame")
And a second dataframe 'df2' that looks like:
structure(list(MAPS_code = c("SAFR", "SAGA", "ELPU", "ISLA",
"SABO", "SATE", "QUST", "SARI", "PANA", "COPA", "LOAN", "GAPA",
"MELI", "CAGO", "PINO", "GABO", "RIJA", "FILA", "AMIS"), Lat = c(8.765833,
8.751389, 8.768611, 8.835833, 8.801111, 8.808333, 8.815, 8.827778,
8.781667, 8.778333, 8.783333, 8.800833, 8.790278, 8.754444, 8.844444,
8.801389, 8.786667, 8.785278, 8.952222), Long = c(-82.94277,
-82.951111, -82.95, -82.963056, -82.917222, -82.924444, -82.923889,
-82.924167, -82.896944, -82.955833, -82.938611, -82.972222, -82.967222,
-82.925833, -82.97, -82.972222, -82.964722, -82.976111, -82.833333
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"
), Location = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-")), class = "data.frame", row.names = c(NA,
-19L))
How can I make populate each row of 'Lat' and 'Long' of df1 from 'Lat' and 'Long' of df2 when 'Contact', 'Location', and 'MAPS_code' of the corresponding rows match between df1 and df2? So that the result of df1 looks like:
structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO",
"ISLA"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-"
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall"), Lat = c("8.827778", "8.801111", "8.801111
", "8.801111", "8.835833"), Long = c("-82.92417", "-82.91722", "-82.91722", "-82.91722", "-82.96306")), row.names = c(NA, 5L), class = "data.frame")
CodePudding user response:
You can do this:
library(dplyr)
df1 %>%
rename(Location = Location_code) %>%
left_join(df2, by = c('MAPS_code', 'Contact', 'Location')) %>%
rename_with(~ gsub('.y', '', .), ends_with('.y')) %>%
select(!ends_with('.x'))
MAPS_code Location Contact Lat Long
1 SARI LCP- Chase Mendenhall 8.827778 -82.92417
2 SABO LCP- Chase Mendenhall 8.801111 -82.91722
3 SABO LCP- Chase Mendenhall 8.801111 -82.91722
4 SABO LCP- Chase Mendenhall 8.801111 -82.91722
CodePudding user response:
This is very similar to @Anoushiravan R's answer, but I'd suggest removing unnecessary variables before joining, rather than wrestling with the variable names generated by duplicate columns:
df1 %>%
select(-Lat, -Long) %>%
left_join(df2, by=c('MAPS_code', 'Location_code'='Location', 'Contact'))
MAPS_code Location_code Contact Lat Long
1 SARI LCP- Chase Mendenhall 8.827778 -82.92417
2 SABO LCP- Chase Mendenhall 8.801111 -82.91722
3 SABO LCP- Chase Mendenhall 8.801111 -82.91722
4 SABO LCP- Chase Mendenhall 8.801111 -82.91722
5 ISLA LCP- Chase Mendenhall 8.835833 -82.96306
Passing a named vector element to the by=
argument (for example 'Location_code'='Location'
) lets you join on non-matching variable names. The name on the left side of the =
is the name in the first data.frame (or the one piped in), while the name on the right side is the name in the second data.frame.
Also, note that this will add the new variables onto the end (right) of the joined data frame. If you wanted to match the order of variables in df1
(that wouldn't apply here, since the order is the same), you can use select
to reorder the variables:
... %>%
select(one_of(names(df1)), everything())
names(df1)
gets the variable names from df1 in order. By putting it in one_of
, a variable missing in the final data.frame will give a warning instead of an error. Ending with everything
, will ensure that no variables are lost, and any variable not in df1 will be stuck on the end.