I have two data frames and I want to take the last four columns from df1 and add them to df2 based on the fips code in df2. The fips code is labeled x2 in df1.
x1 =c(8000:8003)
mi =c(10:13)
x2 =c(5000:5003)
x21_40 = c(0,0,1,0)
x41_60 = c(1,0,0,0)
x61_80 = c(0,1,0,0)
x81_100 = c(0,0,0,1)
df1 = data.frame(x1,mi,x2, x21_40,x41_60,x61_80, x81_100)
fips = c(5000:5003)
county_name=c("a","b","c","d")
df2 = data.frame(fips, county_name)
> df1
x1 mi x2 x21_40 x41_60 x61_80 x81_100
1 8000 10 5000 0 1 0 0
2 8001 11 5001 0 0 1 0
3 8002 12 5002 1 0 0 0
4 8003 13 5003 0 0 0 1
> df2
fips county_name
1 5000 a
2 5001 b
3 5002 c
4 5003 d
Desired
fips county_name x21_40 x41_60 x61_80 x81_100
1 5000 a 0 1 0 0
2 5001 b 0 0 1 0
3 5002 c 1 0 0 0
4 5003 d 0 0 0 1
I don't want to merge both data frames, I only want the last four columns, so I don't think it's as simple as using left_join()
. I have used %in%
before, but that was only for filtering values in one data frame, based on values in another column from a different data frame. This one has four columns, so I'm not sure what to do.
CodePudding user response:
We can use left_join
if we want to match the 'x2' from 'df1' and 'fips' from 'df2'
library(dplyr)
df2 <- left_join(df2, df1 %>%
select(x2:last_col()), by = c("fips" = "x2"))
-output
df2
fips county_name x21_40 x41_60 x61_80 x81_100
1 5000 a 0 1 0 0
2 5001 b 0 0 1 0
3 5002 c 1 0 0 0
4 5003 d 0 0 0 1
In case of duplicates in 'df1', get the max
value for those columns grouped by 'fips/x2' and then do the join
df1 %>%
group_by(fips = x2) %>%
summarise(across(x21_40:x81_100, max, na.rm = TRUE),
.groups = "drop") %>%
left_join(df2, .)