I am trying to figure out how to do this in R but would really appreciate some input on this. Let's say I have two dataframes, A and B:
dataframe A
a <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
b <- c(1, 5, 10, 2, 3, 8, 10, 28, 36)
c <- c(runif(9, min=5, max=99))
df_A <- data.frame(a,b,c)
names(df_A) <- c('name', 'trial', 'counts')
name trial counts
1 A 1 42.18785
2 A 5 17.17859
3 A 10 29.34961
4 B 2 23.20101
5 B 3 58.57507
6 B 8 28.94360
7 C 10 25.48171
8 C 28 55.67896
9 C 36 10.04799
dataframe B
e <- c("A", "A", "A", "B", "C", "C")
f <- c(1, 5, 10, 2, 3, 28)
g <- c(runif(6, min=5, max=99))
df_B <- data.frame(e,f,g)
names(df_B) <- c('name', 'trial', 'rate')
name trial rate
1 A 1 8.408579
2 A 5 28.029798
3 A 10 18.904179
4 B 2 20.577880
5 C 3 44.492629
6 C 28 81.408402
As you can see, these two dataframes share two columns but differ in length. What I need to do is to divide each value in the counts
column by each value of the rate
column in dataframe B
. This has to be done on a name-by-name basis (i.e., group_by
name
column). A correct dataframe after this will look like this:
name trial counts
1 A 1 42.18785 / 8.408579
2 A 1 42.18785 / 28.029798
3 A 1 42.18785 / 18.904179
4 A 5 17.17859 / 8.408579
5 A 5 17.17859 / 28.029798
6 A 5 17.17859 / 18.904179
7 A 10 29.34961 / 8.408579
8 A 10 29.34961 / 28.029798
9 A 10 29.34961 / 18.904179
10 B 2 23.20101 / 20.577880
11 B 3 58.57507 / 20.577880
12 B 8 28.94360 / 20.577880
13 C 10 25.48171 / 44.492629
14 C 10 25.48171 / 81.408402
15 C 28 55.67896 / 44.492629
16 C 36 10.04799 / 81.408402
CodePudding user response:
Here is a base R solution. merge
the data sets and divide the result's columns counts
by rate
. Done with a pipe, introduced in R 4.2.0, to avoid the creation of a work, temporary data.frame.
merge(df_A, df_B[-2]) |>
(\(x) cbind(x[1:2], counts = x[[3]]/x[[4]]))()
#> name trial counts
#> 1 A 1 4.9008255
#> 2 A 1 1.9812148
#> 3 A 1 0.8574978
#> 4 A 5 3.2969133
#> 5 A 5 1.3328149
#> 6 A 5 0.5768612
#> 7 A 10 0.6277524
#> 8 A 10 0.2537761
#> 9 A 10 0.1098379
#> 10 B 2 0.3528129
#> 11 B 3 4.0136321
#> 12 B 8 1.9712023
#> 13 C 10 9.7051006
#> 14 C 10 0.9257950
#> 15 C 28 2.9923193
#> 16 C 28 0.2854452
#> 17 C 36 2.2441296
#> 18 C 36 0.2140734
Created on 2022-06-21 by the reprex package (v2.0.1)
CodePudding user response:
A dplyr approach:
library(dplyr)
df_A |>
left_join(df_B, by = "name") |>
mutate(calc = counts / rate)