Multiple calculations on a dataframe based on certain reference points-CodePudding

Hi I am a bit new to this, but I am looking to figure out how I can do this on R.

I am trying to divide a bunch of datasets by certain benchmark values, then taking the log() of it, but the data is very large and I am not sure how else to approach it other than using a for loop.

For example, I have a data like this:

Name	Reference	Lap1	Lap2	Lap3	Lap4	Lap5
Craig	attempt1	34	21	33	21	32
Craig	attempt2	29	28	29	30	29
Craig	attempt3	25	25	24	21	26
Craig	attempt4	20	21	21	22	24
Jeff	attempt1	43	41	44	40	41
Jeff	attempt2	38	38	37	36	35
Jeff	attempt3	33	32	31	29	34
Jeff	attempt4	29	27	26	25	27

I want to be able to divide each part of the attempt 1 in Craig, and divide it by the other Craig attempts, then take the log so as to use the first attempt as the reference for a comparison. But I also want to do this for each individual column, and with Jeff, so the end result becomes:

Name	Reference	Lap1	Lap2	Lap3	Lap4	Lap5
Craig	attempt1	log(34/34)	log(21/21)	log(33/33)	log(21/21)	log(32/32)
Craig	attempt2	log(29/34)	log(28/21)	log(29/33)	log(30/21)	log(29/32)
Craig	attempt3	log(25/34)	log(25/21)	log(24/33)	log(21/21)	log(26/32)
Craig	attempt4	log(20/34)	log(21/21)	log(21/33)	log(22/21)	log(24/32)
Jeff	attempt1	43	41	44	40	41
Jeff	attempt2	38	38	37	36	35
Jeff	attempt3	33	32	31	29	34
Jeff	attempt4	29	27	26	25	27

And I would do the same for Jeff, and his reference of attempt 1 for the other attempts as well. Please keep in mind that the number of columns would be a lot more, and that I would have more than an other person involved.

What would be the best way to get this calculation done?

I tried to add some starter code if it helps. I'm not good at it sorry.

row1 <- c("Name", "Reference", "Lap1", "Lap2", "Lap3", "Lap4", "Lap5")
row2 <- c("Craig", "attempt1", 34, 21, 33, 21, 32)
row3 <- c("Craig", "attempt2", 29, 28, 29, 30, 29)
row4 <- c("Craig", "attempt3", 25, 25, 24, 21, 26)
row5 <- c("Craig", "attempt4", 20, 21, 21, 22, 24)
row6 <- c("Jeff", "attempt1", 43, 41, 44, 40, 41)
row7 <- c("Jeff", "attempt2", 38, 38, 37, 36, 35)
row8 <- c("Jeff", "attempt3", 33, 32, 31, 29, 34)
row9 <- c("Jeff", "attempt4", 29, 27, 26, 25, 27)
df <- t(data.frame(row1, row2, row3, row4, row5, row6, row7, row8, row9))

CodePudding user response：

Here's an approach using dplyr::group_by to do the calculation separately for each Name, and dplyr::across to apply the calculation to all the columns that start with "lap." The funny bit at the end ~log(.x/first(.x)) means that for every column we specified, we want to apply a formula that takes the value (.x) and divides it by the first value in the group (first(.x)) and then takes the log of that ratio.

library(dplyr)
df %>%
  group_by(Name) %>%
  mutate(across(starts_with("lap"), ~log(.x/first(.x)))) %>%
  ungroup()

Alternatively, if your data is not already sorted with attempt1 first for each Name, you could swap in this line:

...
mutate(across(starts_with("lap"), ~log(.x/.x[Reference == "attempt1"]))) %>%
...

or if the columns you want to operate on have other names, but you know which column #s they are (or aren't), you could calculate across them based on position:

mutate(across(-(1:2), ~log(.x/first(.x)))) %>%

Result

# A tibble: 8 × 7
  Name  Reference   Lap1    Lap2   Lap3    Lap4    Lap5
  <chr> <chr>      <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
1 Craig attempt1   0      0       0      0       0     
2 Craig attempt2  -0.159  0.288  -0.129  0.357  -0.0984  # -0.159 = ln(29/34)
3 Craig attempt3  -0.307  0.174  -0.318  0      -0.208 
4 Craig attempt4  -0.531  0      -0.452  0.0465 -0.288 
5 Jeff  attempt1   0      0       0      0       0     
6 Jeff  attempt2  -0.124 -0.0760 -0.173 -0.105  -0.158 
7 Jeff  attempt3  -0.265 -0.248  -0.350 -0.322  -0.187 
8 Jeff  attempt4  -0.394 -0.418  -0.526 -0.470  -0.418

Your sample data is not a standard data frame and will be easier to work with if you specify the columns and not the rows.

df <- data.frame(
  stringsAsFactors = FALSE,
              Name = c("Craig","Craig","Craig",
                       "Craig","Jeff","Jeff","Jeff","Jeff"),
         Reference = c("attempt1","attempt2",
                       "attempt3","attempt4","attempt1","attempt2","attempt3",
                       "attempt4"),
              Lap1 = c(34L, 29L, 25L, 20L, 43L, 38L, 33L, 29L),
              Lap2 = c(21L, 28L, 25L, 21L, 41L, 38L, 32L, 27L),
              Lap3 = c(33L, 29L, 24L, 21L, 44L, 37L, 31L, 26L),
              Lap4 = c(21L, 30L, 21L, 22L, 40L, 36L, 29L, 25L),
              Lap5 = c(32L, 29L, 26L, 24L, 41L, 35L, 34L, 27L)
)