I have two dataframes with the same column structure (just for simplification, as my dfs hold 40k observations).
df1 <- data.frame(name = c("mark", "peter", "lily"),
column1= c(1,2,3),
column2 = c(4,5,6))
df2 <- data.frame(name = c("mark", "liam", "peter"),
column1 = c(7,8,9),
column2 = c(1,2,3))
What do you think would be the most elegant way to apply a function row-wise to matching columns of two dataframes but only their matching rownames?
In this example: (df1$column1 df1$column2)/df2$column2
should only be applied to the respective rows "mark" and "peter" in both dfs.
CodePudding user response:
Maybe something like this...
library(dplyr)
df_combined<-df1%>%
left_join(df2, by="name") %>%
mutate(combined=(column1.x column2.x)/column2.y) %>%
filter(!is.na(combined)) %>%
select(name,combined)
df1 %>% left_join(df_combined)
#> name column1 column2 combined
#> 1 mark 1 4 5.000000
#> 2 peter 2 5 2.333333
#> 3 lily 3 6 NA
df2 %>% left_join(df_combined)
#> name column1 column2 combined
#> 1 mark 7 1 5.000000
#> 2 liam 8 2 NA
#> 3 peter 9 3 2.333333
Created on 2022-01-02 by the reprex package (v2.0.1)
CodePudding user response:
A base R solution. First merge
the data frames, then select the resulting columns with grep
.
df_new <- merge(df1, df2, "name")
df_new
name column1.x column2.x column1.y column2.y
1 mark 1 4 7 1
2 peter 2 5 9 3
df_new$calc <- apply(df_new[,-1], 1, function(a)
sum(a[grep("\\.x",names(a))]) / a[grep("2\\.y",names(a))])
df_new
name column1.x column2.x column1.y column2.y calc
1 mark 1 4 7 1 5.000000
2 peter 2 5 9 3 2.333333