Home > database >  how to add multiple columns from one data frame to another based on values in another column?
how to add multiple columns from one data frame to another based on values in another column?

Time:05-08

I have two data frames and I want to take the last four columns from df1 and add them to df2 based on the fips code in df2. The fips code is labeled x2 in df1.

x1 =c(8000:8003)
mi =c(10:13)
x2 =c(5000:5003)
x21_40 = c(0,0,1,0)
x41_60 = c(1,0,0,0)
x61_80 = c(0,1,0,0)
x81_100 = c(0,0,0,1)
df1 = data.frame(x1,mi,x2, x21_40,x41_60,x61_80, x81_100)

fips = c(5000:5003)
county_name=c("a","b","c","d")
df2 = data.frame(fips, county_name)

> df1
    x1 mi   x2 x21_40 x41_60 x61_80 x81_100
1 8000 10 5000      0      1      0       0
2 8001 11 5001      0      0      1       0
3 8002 12 5002      1      0      0       0
4 8003 13 5003      0      0      0       1

> df2
  fips county_name
1 5000           a
2 5001           b
3 5002           c
4 5003           d

Desired

  fips county_name x21_40 x41_60 x61_80 x81_100
1 5000           a      0      1      0       0
2 5001           b      0      0      1       0
3 5002           c      1      0      0       0
4 5003           d      0      0      0       1

I don't want to merge both data frames, I only want the last four columns, so I don't think it's as simple as using left_join(). I have used %in% before, but that was only for filtering values in one data frame, based on values in another column from a different data frame. This one has four columns, so I'm not sure what to do.

CodePudding user response:

We can use left_join if we want to match the 'x2' from 'df1' and 'fips' from 'df2'

library(dplyr)
df2 <- left_join(df2, df1 %>%
                select(x2:last_col()), by = c("fips" = "x2"))

-output

df2
   fips county_name x21_40 x41_60 x61_80 x81_100
1 5000           a      0      1      0       0
2 5001           b      0      0      1       0
3 5002           c      1      0      0       0
4 5003           d      0      0      0       1

In case of duplicates in 'df1', get the max value for those columns grouped by 'fips/x2' and then do the join

df1 %>% 
  group_by(fips = x2) %>%
   summarise(across(x21_40:x81_100, max, na.rm = TRUE), 
      .groups = "drop") %>%
   left_join(df2, .)
  • Related