Calculate variable from data.frame using dplyr

Time:06-20

I'm sure this is obvious, but I can't figure it out.

I have a data.frame and want to calculate a new variable by combining the values of several types.

df = data.frame(time = rep(seq(10),each=2),Type=rep(c("A","B"),times=10),value = runif(20))

I want a new data.frame, with A / B for each time point.

I've tried:

df2 <- df |> group_by(time) |> mutate(new_value= value[Type=="A"] / value[Type=="B"],.keep="none")

But I still have a new_value twice for each time.

All help appreciated!

Answer:

A better option may be to reshape to 'wide' format with pivot_wider and then create the column

library(dplyr)
library(tidyr)
df %>%
   pivot_wider(names_from = Type, values_from = value) %>% 
   mutate(new_value = A/B)

-output

# A tibble: 10 × 4
    time     A     B new_value
   <int> <dbl> <dbl>     <dbl>
 1     1 0.565 0.913     0.618
 2     2 0.902 0.274     3.29 
 3     3 0.321 0.986     0.326
 4     4 0.620 0.937     0.661
 5     5 0.467 0.407     1.15 
 6     6 0.659 0.152     4.33 
 7     7 0.573 0.239     2.40 
 8     8 0.962 0.601     1.60 
 9     9 0.515 0.403     1.28 
10    10 0.880 0.364     2.42 
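If only the ratio per time point is wanted, the wide result can be trimmed down afterwards. This is a minimal sketch: the `set.seed()` call and the name `ratios` are assumptions added here for reproducibility and illustration, not part of the original post.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: seed added only so the example is reproducible
df <- data.frame(time  = rep(seq(10), each = 2),
                 Type  = rep(c("A", "B"), times = 10),
                 value = runif(20))

ratios <- df %>%
  # one row per time, with the A and B values side by side
  pivot_wider(names_from = Type, values_from = value) %>%
  # compute A / B for each time point
  mutate(new_value = A / B) %>%
  # keep only the time and the computed ratio
  select(time, new_value)
```

Here `ratios` has one row per `time` and two columns, which matches what the question asks for.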

mutate creates or modifies a column while keeping every row of the original dataset, so with two rows per 'time' the same new_value is repeated twice. If we want one value per 'time', it may be better to use summarise (but here the 'Type' column will be lost)

df |> 
  group_by(time) |> 
  summarise(new_value= value[Type=="A"] / value[Type=="B"])

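In addition, this works only when the count of 'A' and 'B' elements per 'time' is the same, and it gives exactly one row per 'time' only when each group has one 'A' and one 'B'. If the counts differ, the division recycles the shorter vector (or returns an empty result if one type is missing), so the output can be wrong, sometimes without a warning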
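One way to check that condition before dividing is to count the rows per 'time' and 'Type'. This is only a sketch: the names `bad` and `incomplete` and the `set.seed()` call are illustrative assumptions, not part of the original post.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the example is reproducible
df <- data.frame(time  = rep(seq(10), each = 2),
                 Type  = rep(c("A", "B"), times = 10),
                 value = runif(20))

# time/Type combinations that occur more than once (ambiguous division)
bad <- df |>
  count(time, Type) |>
  filter(n != 1)

# time points that do not have both an "A" and a "B" row
incomplete <- df |>
  group_by(time) |>
  summarise(n_types = n_distinct(Type)) |>
  filter(n_types != 2)
```

If both `bad` and `incomplete` have zero rows, every 'time' has exactly one 'A' and one 'B', and the summarise approach returns one ratio per 'time'.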
