Home > Back-end >  Multiple calculations on a dataframe based on certain reference points
Multiple calculations on a dataframe based on certain reference points

Time:04-24

Hi I am a bit new to this, but I am looking to figure out how I can do this on R.

I am trying to divide a bunch of datasets by certain benchmark values, then taking the log() of it, but the data is very large and I am not sure how else to approach it other than using a for loop.

For example, I have a data like this:

Name Reference Lap1 Lap2 Lap3 Lap4 Lap5
Craig attempt1 34 21 33 21 32
Craig attempt2 29 28 29 30 29
Craig attempt3 25 25 24 21 26
Craig attempt4 20 21 21 22 24
Jeff attempt1 43 41 44 40 41
Jeff attempt2 38 38 37 36 35
Jeff attempt3 33 32 31 29 34
Jeff attempt4 29 27 26 25 27

I want to be able to divide each part of the attempt 1 in Craig, and divide it by the other Craig attempts, then take the log so as to use the first attempt as the reference for a comparison. But I also want to do this for each individual column, and with Jeff, so the end result becomes:

Name Reference Lap1 Lap2 Lap3 Lap4 Lap5
Craig attempt1 log(34/34) log(21/21) log(33/33) log(21/21) log(32/32)
Craig attempt2 log(29/34) log(28/21) log(29/33) log(30/21) log(29/32)
Craig attempt3 log(25/34) log(25/21) log(24/33) log(21/21) log(26/32)
Craig attempt4 log(20/34) log(21/21) log(21/33) log(22/21) log(24/32)
Jeff attempt1 43 41 44 40 41
Jeff attempt2 38 38 37 36 35
Jeff attempt3 33 32 31 29 34
Jeff attempt4 29 27 26 25 27

And I would do the same for Jeff, and his reference of attempt 1 for the other attempts as well. Please keep in mind that the number of columns would be a lot more, and that I would have more than an other person involved.

What would be the best way to get this calculation done?

I tried to add some starter code if it helps. I'm not good at it sorry.

row1 <- c("Name", "Reference", "Lap1", "Lap2", "Lap3", "Lap4", "Lap5")
row2 <- c("Craig", "attempt1", 34, 21, 33, 21, 32)
row3 <- c("Craig", "attempt2", 29, 28, 29, 30, 29)
row4 <- c("Craig", "attempt3", 25, 25, 24, 21, 26)
row5 <- c("Craig", "attempt4", 20, 21, 21, 22, 24)
row6 <- c("Jeff", "attempt1", 43, 41, 44, 40, 41)
row7 <- c("Jeff", "attempt2", 38, 38, 37, 36, 35)
row8 <- c("Jeff", "attempt3", 33, 32, 31, 29, 34)
row9 <- c("Jeff", "attempt4", 29, 27, 26, 25, 27)
df <- t(data.frame(row1, row2, row3, row4, row5, row6, row7, row8, row9))

CodePudding user response:

Here's an approach using dplyr::group_by to do the calculation separately for each Name, and dplyr::across to apply the calculation to all the columns that start with "lap." The funny bit at the end ~log(.x/first(.x)) means that for every column we specified, we want to apply a formula that takes the value (.x) and divides it by the first value in the group (first(.x)) and then takes the log of that ratio.

library(dplyr)
df %>%
  group_by(Name) %>%
  mutate(across(starts_with("lap"), ~log(.x/first(.x)))) %>%
  ungroup()

Alternatively, if your data is not already sorted with attempt1 first for each Name, you could swap in this line:

...
mutate(across(starts_with("lap"), ~log(.x/.x[Reference == "attempt1"]))) %>%
...

or if the columns you want to operate on have other names, but you know which column #s they are (or aren't), you could calculate across them based on position:

mutate(across(-(1:2), ~log(.x/first(.x)))) %>%    

Result

# A tibble: 8 × 7
  Name  Reference   Lap1    Lap2   Lap3    Lap4    Lap5
  <chr> <chr>      <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
1 Craig attempt1   0      0       0      0       0     
2 Craig attempt2  -0.159  0.288  -0.129  0.357  -0.0984  # -0.159 = ln(29/34)
3 Craig attempt3  -0.307  0.174  -0.318  0      -0.208 
4 Craig attempt4  -0.531  0      -0.452  0.0465 -0.288 
5 Jeff  attempt1   0      0       0      0       0     
6 Jeff  attempt2  -0.124 -0.0760 -0.173 -0.105  -0.158 
7 Jeff  attempt3  -0.265 -0.248  -0.350 -0.322  -0.187 
8 Jeff  attempt4  -0.394 -0.418  -0.526 -0.470  -0.418

Your sample data is not a standard data frame and will be easier to work with if you specify the columns and not the rows.

df <- data.frame(
  stringsAsFactors = FALSE,
              Name = c("Craig","Craig","Craig",
                       "Craig","Jeff","Jeff","Jeff","Jeff"),
         Reference = c("attempt1","attempt2",
                       "attempt3","attempt4","attempt1","attempt2","attempt3",
                       "attempt4"),
              Lap1 = c(34L, 29L, 25L, 20L, 43L, 38L, 33L, 29L),
              Lap2 = c(21L, 28L, 25L, 21L, 41L, 38L, 32L, 27L),
              Lap3 = c(33L, 29L, 24L, 21L, 44L, 37L, 31L, 26L),
              Lap4 = c(21L, 30L, 21L, 22L, 40L, 36L, 29L, 25L),
              Lap5 = c(32L, 29L, 26L, 24L, 41L, 35L, 34L, 27L)
) 
  • Related