Hi I am a bit new to this, but I am looking to figure out how I can do this on R.
I am trying to divide a bunch of datasets by certain benchmark values, then taking the log() of it, but the data is very large and I am not sure how else to approach it other than using a for loop.
For example, I have a data like this:
Name | Reference | Lap1 | Lap2 | Lap3 | Lap4 | Lap5 |
---|---|---|---|---|---|---|
Craig | attempt1 | 34 | 21 | 33 | 21 | 32 |
Craig | attempt2 | 29 | 28 | 29 | 30 | 29 |
Craig | attempt3 | 25 | 25 | 24 | 21 | 26 |
Craig | attempt4 | 20 | 21 | 21 | 22 | 24 |
Jeff | attempt1 | 43 | 41 | 44 | 40 | 41 |
Jeff | attempt2 | 38 | 38 | 37 | 36 | 35 |
Jeff | attempt3 | 33 | 32 | 31 | 29 | 34 |
Jeff | attempt4 | 29 | 27 | 26 | 25 | 27 |
I want to be able to divide each part of the attempt 1 in Craig, and divide it by the other Craig attempts, then take the log so as to use the first attempt as the reference for a comparison. But I also want to do this for each individual column, and with Jeff, so the end result becomes:
Name | Reference | Lap1 | Lap2 | Lap3 | Lap4 | Lap5 |
---|---|---|---|---|---|---|
Craig | attempt1 | log(34/34) | log(21/21) | log(33/33) | log(21/21) | log(32/32) |
Craig | attempt2 | log(29/34) | log(28/21) | log(29/33) | log(30/21) | log(29/32) |
Craig | attempt3 | log(25/34) | log(25/21) | log(24/33) | log(21/21) | log(26/32) |
Craig | attempt4 | log(20/34) | log(21/21) | log(21/33) | log(22/21) | log(24/32) |
Jeff | attempt1 | 43 | 41 | 44 | 40 | 41 |
Jeff | attempt2 | 38 | 38 | 37 | 36 | 35 |
Jeff | attempt3 | 33 | 32 | 31 | 29 | 34 |
Jeff | attempt4 | 29 | 27 | 26 | 25 | 27 |
And I would do the same for Jeff, and his reference of attempt 1 for the other attempts as well. Please keep in mind that the number of columns would be a lot more, and that I would have more than an other person involved.
What would be the best way to get this calculation done?
I tried to add some starter code if it helps. I'm not good at it sorry.
row1 <- c("Name", "Reference", "Lap1", "Lap2", "Lap3", "Lap4", "Lap5")
row2 <- c("Craig", "attempt1", 34, 21, 33, 21, 32)
row3 <- c("Craig", "attempt2", 29, 28, 29, 30, 29)
row4 <- c("Craig", "attempt3", 25, 25, 24, 21, 26)
row5 <- c("Craig", "attempt4", 20, 21, 21, 22, 24)
row6 <- c("Jeff", "attempt1", 43, 41, 44, 40, 41)
row7 <- c("Jeff", "attempt2", 38, 38, 37, 36, 35)
row8 <- c("Jeff", "attempt3", 33, 32, 31, 29, 34)
row9 <- c("Jeff", "attempt4", 29, 27, 26, 25, 27)
df <- t(data.frame(row1, row2, row3, row4, row5, row6, row7, row8, row9))
CodePudding user response:
Here's an approach using dplyr::group_by
to do the calculation separately for each Name
, and dplyr::across
to apply the calculation to all the columns that start with "lap." The funny bit at the end ~log(.x/first(.x))
means that for every column we specified, we want to apply a formula that takes the value (.x
) and divides it by the first value in the group (first(.x)
) and then takes the log of that ratio.
library(dplyr)
df %>%
group_by(Name) %>%
mutate(across(starts_with("lap"), ~log(.x/first(.x)))) %>%
ungroup()
Alternatively, if your data is not already sorted with attempt1
first for each Name, you could swap in this line:
...
mutate(across(starts_with("lap"), ~log(.x/.x[Reference == "attempt1"]))) %>%
...
or if the columns you want to operate on have other names, but you know which column #s they are (or aren't), you could calculate across them based on position:
mutate(across(-(1:2), ~log(.x/first(.x)))) %>%
Result
# A tibble: 8 × 7
Name Reference Lap1 Lap2 Lap3 Lap4 Lap5
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Craig attempt1 0 0 0 0 0
2 Craig attempt2 -0.159 0.288 -0.129 0.357 -0.0984 # -0.159 = ln(29/34)
3 Craig attempt3 -0.307 0.174 -0.318 0 -0.208
4 Craig attempt4 -0.531 0 -0.452 0.0465 -0.288
5 Jeff attempt1 0 0 0 0 0
6 Jeff attempt2 -0.124 -0.0760 -0.173 -0.105 -0.158
7 Jeff attempt3 -0.265 -0.248 -0.350 -0.322 -0.187
8 Jeff attempt4 -0.394 -0.418 -0.526 -0.470 -0.418
Your sample data is not a standard data frame and will be easier to work with if you specify the columns and not the rows.
df <- data.frame(
stringsAsFactors = FALSE,
Name = c("Craig","Craig","Craig",
"Craig","Jeff","Jeff","Jeff","Jeff"),
Reference = c("attempt1","attempt2",
"attempt3","attempt4","attempt1","attempt2","attempt3",
"attempt4"),
Lap1 = c(34L, 29L, 25L, 20L, 43L, 38L, 33L, 29L),
Lap2 = c(21L, 28L, 25L, 21L, 41L, 38L, 32L, 27L),
Lap3 = c(33L, 29L, 24L, 21L, 44L, 37L, 31L, 26L),
Lap4 = c(21L, 30L, 21L, 22L, 40L, 36L, 29L, 25L),
Lap5 = c(32L, 29L, 26L, 24L, 41L, 35L, 34L, 27L)
)