how to add multiple columns from one data frame to another based on values in another column?-CodePudding

I have two data frames and I want to take the last four columns from df1 and add them to df2 based on the fips code in df2. The fips code is labeled x2 in df1.

x1 =c(8000:8003)
mi =c(10:13)
x2 =c(5000:5003)
x21_40 = c(0,0,1,0)
x41_60 = c(1,0,0,0)
x61_80 = c(0,1,0,0)
x81_100 = c(0,0,0,1)
df1 = data.frame(x1,mi,x2, x21_40,x41_60,x61_80, x81_100)

fips = c(5000:5003)
county_name=c("a","b","c","d")
df2 = data.frame(fips, county_name)

> df1
    x1 mi   x2 x21_40 x41_60 x61_80 x81_100
1 8000 10 5000      0      1      0       0
2 8001 11 5001      0      0      1       0
3 8002 12 5002      1      0      0       0
4 8003 13 5003      0      0      0       1

> df2
  fips county_name
1 5000           a
2 5001           b
3 5002           c
4 5003           d

Desired

  fips county_name x21_40 x41_60 x61_80 x81_100
1 5000           a      0      1      0       0
2 5001           b      0      0      1       0
3 5002           c      1      0      0       0
4 5003           d      0      0      0       1

I don't want to merge both data frames, I only want the last four columns, so I don't think it's as simple as using left_join(). I have used %in% before, but that was only for filtering values in one data frame, based on values in another column from a different data frame. This one has four columns, so I'm not sure what to do.

CodePudding user response：

We can use left_join if we want to match the 'x2' from 'df1' and 'fips' from 'df2'

library(dplyr)
df2 <- left_join(df2, df1 %>%
                select(x2:last_col()), by = c("fips" = "x2"))

-output

df2
   fips county_name x21_40 x41_60 x61_80 x81_100
1 5000           a      0      1      0       0
2 5001           b      0      0      1       0
3 5002           c      1      0      0       0
4 5003           d      0      0      0       1

In case of duplicates in 'df1', get the max value for those columns grouped by 'fips/x2' and then do the join

df1 %>% 
  group_by(fips = x2) %>%
   summarise(across(x21_40:x81_100, max, na.rm = TRUE), 
      .groups = "drop") %>%
   left_join(df2, .)