I have two data frames:
- dfA is a large one containing all cities and all health data for 2020-2021, except for the 2021 health data of CityC, which is marked as -3.
dfA
City   Name   Pulse20 Pulse21
CityA  Amy    77      78
CityB  Bob    80      79
CityC  Cathy  79      -3
- dfB is a small one containing the values I want to fill into dfA.
dfB
City   Name   Pulse21
CityC  Cathy  80
Questions:
1. How can I combine these two data frames in a general way?
2. If I use full_join(), Pulse21 is split into Pulse21.x and Pulse21.y, so I have to do extra work to merge them back together.
3. For the record, in my real data each city has more than 500 people, and there are 100 or more health variables.
Is there anything I can do to make this simpler and more efficient? Thanks a lot!
Answer:
dplyr::rows_update(dfA, dfB, by = c('City', 'Name'))
City Name Pulse20 Pulse21
1 CityA Amy 77 78
2 CityB Bob 80 79
3 CityC Cathy 79 80
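Because rows_update() overwrites every non-key column that dfB shares with dfA, the same one-liner should scale to the 100+ health columns mentioned in the question. A minimal sketch, assuming a hypothetical extra column Temp21 that is not in the original data:

```r
library(dplyr)

# Hypothetical data: Temp21 is an extra health column added for illustration
dfA <- tibble::tribble(
  ~City,   ~Name,   ~Pulse20, ~Pulse21, ~Temp21,
  "CityA", "Amy",   77,       78,       36.6,
  "CityB", "Bob",   80,       79,       36.8,
  "CityC", "Cathy", 79,       -3,       -3
)
dfB <- tibble::tribble(
  ~City,   ~Name,   ~Pulse21, ~Temp21,
  "CityC", "Cathy", 80,       36.9
)

# Every non-key column of dfB (here Pulse21 and Temp21) overwrites the
# matching row of dfA; rows of dfA without a match in dfB are unchanged
result <- rows_update(dfA, dfB, by = c("City", "Name"))
result
```

Note that rows_update() expects every key combination in dfB to exist in dfA and, by default, raises an error otherwise.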
Answer:
I would still use full_join(), but with only one suffix, and then use mutate() to fix the problem.
Here is an example:
library(dplyr)
# Join with full_join, but only add suffix to one column
df_joined <-
dfA %>%
full_join(dfB, by = c("City", "Name"), suffix = c("", "_replacement"))
df_joined
#> # A tibble: 3 × 5
#> City Name Pulse20 Pulse21 Pulse21_replacement
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 CityA Amy 77 78 NA
#> 2 CityB Bob 80 79 NA
#> 3 CityC Cathy 79 -3 80
# Fix issue with CityC
df_joined %>%
mutate(
Pulse21 = ifelse(is.na(Pulse21_replacement), Pulse21, Pulse21_replacement)
) %>%
select(-Pulse21_replacement)
#> # A tibble: 3 × 4
#> City Name Pulse20 Pulse21
#> <chr> <chr> <dbl> <dbl>
#> 1 CityA Amy 77 78
#> 2 CityB Bob 80 79
#> 3 CityC Cathy 79 80
Created on 2022-04-17 by the reprex package (v2.0.1)
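With many health columns, writing one ifelse() per column gets tedious. A minimal sketch that loops over every non-key column of dfB, assuming the keys are City and Name and that a value from dfB should always win over the one in dfA; coalesce() is swapped in for the ifelse() used above:

```r
library(dplyr)

dfA <- tibble::tribble(
  ~City,   ~Name,   ~Pulse20, ~Pulse21,
  "CityA", "Amy",   77,       78,
  "CityB", "Bob",   80,       79,
  "CityC", "Cathy", 79,       -3
)
dfB <- tibble::tribble(
  ~City,   ~Name,   ~Pulse21,
  "CityC", "Cathy", 80
)

keys <- c("City", "Name")
# All columns of dfB that are not keys (here only Pulse21)
value_cols <- setdiff(names(dfB), keys)

# Same join as above: dfA columns keep their names, dfB ones get a suffix
df_joined <- full_join(dfA, dfB, by = keys, suffix = c("", "_replacement"))

for (col in value_cols) {
  replacement <- paste0(col, "_replacement")
  # Take the dfB value where it exists, otherwise keep the dfA value
  df_joined[[col]] <- coalesce(df_joined[[replacement]], df_joined[[col]])
}

# Drop the helper *_replacement columns
result <- select(df_joined, -all_of(paste0(value_cols, "_replacement")))
result
```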
Some other options
Other possible solutions include bind_rows() together with summarize(), or dplyr::rows_update() as suggested by @KU99, but note that this function is experimental and might change with future updates of dplyr.
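The bind_rows() plus summarize() route is only mentioned above, so here is one possible sketch, assuming dfB's rows should take precedence: stack dfB under dfA, then keep the last non-missing value of each column per person.

```r
library(dplyr)

dfA <- tibble::tribble(
  ~City,   ~Name,   ~Pulse20, ~Pulse21,
  "CityA", "Amy",   77,       78,
  "CityB", "Bob",   80,       79,
  "CityC", "Cathy", 79,       -3
)
dfB <- tibble::tribble(
  ~City,   ~Name,   ~Pulse21,
  "CityC", "Cathy", 80
)

# dfB rows are stacked after dfA rows, so within each person they come last;
# columns missing from dfB (Pulse20) are filled with NA by bind_rows()
result <- bind_rows(dfA, dfB) %>%
  group_by(City, Name) %>%
  summarise(
    # Keep the last non-missing value of each non-grouping column
    across(everything(), ~ last(.x[!is.na(.x)])),
    .groups = "drop"
  )
result
```

Because summarize() returns one row per group, the result comes back sorted by City and Name rather than in dfA's original order, and on large data it is likely slower than a join.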
Data
dfA <- tibble::tribble(
~City, ~Name, ~Pulse20, ~Pulse21,
"CityA", "Amy", 77, 78,
"CityB", "Bob", 80, 79,
"CityC", "Cathy", 79, -3
)
dfB <- tibble::tribble(
~City, ~Name, ~Pulse21,
"CityC", "Cathy", 80
)