Home > OS >  How to divide values of a column by each value of another column on a group-by-group basis
How to divide values of a column by each value of another column on a group-by-group basis

Time:06-25

I am trying to figure out how to do this in R but would really appreciate some input on this. Let's say I have two dataframes, A and B:

dataframe A

a <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
b <- c(1, 5, 10, 2, 3, 8, 10, 28, 36)
c <- c(runif(9, min=5, max=99))
df_A <- data.frame(a,b,c)
names(df_A) <- c('name', 'trial', 'counts')
   
   name trial   counts
1    A     1 42.18785
2    A     5 17.17859
3    A    10 29.34961
4    B     2 23.20101
5    B     3 58.57507
6    B     8 28.94360
7    C    10 25.48171
8    C    28 55.67896
9    C    36 10.04799

dataframe B

e <- c("A", "A", "A", "B", "C", "C")
f <- c(1, 5, 10, 2, 3, 28)
g <- c(runif(6, min=5, max=99))
df_B <- data.frame(e,f,g)
names(df_B) <- c('name', 'trial', 'rate')
   
    name trial   rate
1    A     1  8.408579
2    A     5 28.029798
3    A    10 18.904179
4    B     2 20.577880
5    C     3 44.492629
6    C    28 81.408402

As you can see, these two dataframes share two columns but differ in length. What I need to do is to divide each value in the counts column by each value of the rate column in dataframe B. This has to be done on a name-by-name basis (i.e., group_by name column). A correct dataframe after this will look like this:

   name trial   counts
1    A     1 42.18785 / 8.408579
2    A     1 42.18785 / 28.029798
3    A     1 42.18785 / 18.904179
4    A     5 17.17859 / 8.408579
5    A     5 17.17859 / 28.029798
6    A     5 17.17859 / 18.904179
7    A    10 29.34961 / 8.408579
8    A    10 29.34961 / 28.029798
9    A    10 29.34961 / 18.904179
10    B     2 23.20101 / 20.577880
11    B     3 58.57507 / 20.577880
12    B     8 28.94360 / 20.577880
13    C    10 25.48171 / 44.492629
14    C    10 25.48171 / 81.408402
15    C    28 55.67896 / 44.492629
16    C    36 10.04799 / 81.408402

CodePudding user response:

Here is a base R solution. merge the data sets and divide the result's columns counts by rate. Done with a pipe, introduced in R 4.2.0, to avoid the creation of a work, temporary data.frame.

merge(df_A, df_B[-2]) |>
  (\(x) cbind(x[1:2], counts = x[[3]]/x[[4]]))()
#>    name trial    counts
#> 1     A     1 4.9008255
#> 2     A     1 1.9812148
#> 3     A     1 0.8574978
#> 4     A     5 3.2969133
#> 5     A     5 1.3328149
#> 6     A     5 0.5768612
#> 7     A    10 0.6277524
#> 8     A    10 0.2537761
#> 9     A    10 0.1098379
#> 10    B     2 0.3528129
#> 11    B     3 4.0136321
#> 12    B     8 1.9712023
#> 13    C    10 9.7051006
#> 14    C    10 0.9257950
#> 15    C    28 2.9923193
#> 16    C    28 0.2854452
#> 17    C    36 2.2441296
#> 18    C    36 0.2140734

Created on 2022-06-21 by the reprex package (v2.0.1)

CodePudding user response:

A dplyr approach:


library(dplyr)

df_A |>    
left_join(df_B, by = "name") |>  
mutate(calc = counts / rate)
  • Related