I'm sure this is obvious, but I can't figure it out.
I have a data.frame and want to calculate a new variable from the values of the different Types.
df <- data.frame(time = rep(seq(10), each = 2), Type = rep(c("A", "B"), times = 10), value = runif(20))
I want a new data.frame with the ratio A / B for each time point.
I've tried:
df2 <- df |> group_by(time) |> mutate(new_value = value[Type == "A"] / value[Type == "B"], .keep = "none")
But I still get new_value twice for each time point.
All help appreciated!
Answer:
A better option may be to reshape to 'wide' format with pivot_wider (from tidyr) and then create the column:
library(dplyr)
library(tidyr)
df %>%
  pivot_wider(names_from = Type, values_from = value) %>%
  mutate(new_value = A / B)
Output (values will differ, since runif() is not seeded):
# A tibble: 10 × 4
    time     A     B new_value
   <int> <dbl> <dbl>     <dbl>
 1     1 0.565 0.913     0.618
 2     2 0.902 0.274     3.29
 3     3 0.321 0.986     0.326
 4     4 0.620 0.937     0.661
 5     5 0.467 0.407     1.15
 6     6 0.659 0.152     4.33
 7     7 0.573 0.239     2.40
 8     8 0.962 0.601     1.60
 9     9 0.515 0.403     1.28
10    10 0.880 0.364     2.42
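If you would rather not depend on tidyr, the same 'wide' reshape can be sketched in base R with stats::reshape(). This is a swapped-in alternative, not part of the answer above; set.seed() is added only so the toy data is reproducible, and value.A / value.B are the column names reshape() generates by default (v.names, then ".", then the Type level).

```r
# Toy data as in the question; the seed is an assumption added for reproducibility
set.seed(42)
df <- data.frame(time = rep(seq(10), each = 2),
                 Type = rep(c("A", "B"), times = 10),
                 value = runif(20))

# Base R reshape to 'wide': one row per time, with columns value.A and value.B
wide <- reshape(df, idvar = "time", timevar = "Type", direction = "wide")

# Ratio of A to B for each time point
wide$new_value <- wide$value.A / wide$value.B
wide
```

Like pivot_wider, this yields one row per time point, so no de-duplication is needed afterwards.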
mutate creates or modifies a column in the original dataset, so it returns the same number of rows as its input. If we want one value per time point, summarise is the better fit (but the 'Type' column will be lost):
df |>
  group_by(time) |>
  summarise(new_value = value[Type == "A"] / value[Type == "B"])
In addition, this works only when each 'time' has the same number of 'A' and 'B' elements.
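Both points (mutate keeping every row, and the one-'A'-one-'B' assumption) can be checked directly. This is a minimal sketch assuming dplyr is installed and R >= 4.1 for the native pipe; set.seed() and the table()-based guard are additions for illustration, not part of the original answer.

```r
library(dplyr)

# Toy data as in the question; the seed is added only for reproducibility
set.seed(42)
df <- data.frame(time = rep(seq(10), each = 2),
                 Type = rep(c("A", "B"), times = 10),
                 value = runif(20))

# Guard: every time point should have exactly one "A" row and one "B" row
counts <- table(df$time, df$Type)
stopifnot(all(counts == 1))

# mutate() keeps one row per input row, so each time point appears twice ...
with_mutate <- df |>
  group_by(time) |>
  mutate(new_value = value[Type == "A"] / value[Type == "B"], .keep = "none") |>
  ungroup()

# ... whereas summarise() collapses each group to a single row
with_summarise <- df |>
  group_by(time) |>
  summarise(new_value = value[Type == "A"] / value[Type == "B"])

nrow(with_mutate)     # 20
nrow(with_summarise)  # 10
```

Because the two rows per time point in with_mutate are identical (time, new_value), piping the original mutate attempt into distinct() would also reduce it to one row per time point.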