Home > other >  How to calculate difference in values grouped by 2 separate variables in R
How to calculate difference in values grouped by 2 separate variables in R

Time:08-30

Let's say we have a team variable, but we also have a time period 1 and a time period 2 variable, and a numeric grade 1-10. I want to mutate and add a variable that calculates the difference from time period 1 to time period 2.

How do I do this?

Visually the table looks like this: img

CodePudding user response:

There is a neat function in the data.table package called dcast( ) that allows you to transform your data from long to wide. In this case, you can use the Period variable to create 2 new columns, Period 1 and Period 2, where the values are the Grades.

library(data.table)

> data <- data.table(
    Team = c("Team 1","Team 1","Team 2","Team 2","Team 3","Team 3"),
    Period = c("Period 1","Period 2","Period 1","Period 2","Period 1","Period 2"),
    Grade = c(75,87,42,35,10,95))
 
> data
     Team   Period Grade
1: Team 1 Period 1    75
2: Team 1 Period 2    87
3: Team 2 Period 1    42
4: Team 2 Period 2    35
5: Team 3 Period 1    10
6: Team 3 Period 2    95
 
> data2 <- dcast(
    data = data,
    Team ~ Period,
    value.var = "Grade")
 
> data2
     Team Period 1 Period 2
1: Team 1       75       87
2: Team 2       42       35
3: Team 3       10       95

> data2 <- data2[,Difference := `Period 2` - `Period 1`]
 
> data2
     Team Period 1 Period 2 Difference
1: Team 1       75       87         12
2: Team 2       42       35         -7
3: Team 3       10       95         85

CodePudding user response:

In tidyverse syntax, we would use pivot_wider and mutate:

library(tidyverse)

df %>%
  pivot_wider(names_from = `Time Period`, values_from = Grade) %>%
  mutate(difference = P2 - P1)
#> # A tibble: 3 x 4
#>   Team      P1    P2 difference
#>   <chr>  <dbl> <dbl>      <dbl>
#> 1 Team 1    75    87         12
#> 2 Team 2    42    35         -7
#> 3 Team 3    10    95         85

Created on 2022-08-29 with reprex v2.0.2


Data used

df <- data.frame(Team = paste("Team", rep(1:3, each = 2)),
                 `Time Period` = rep(c("P1", "P2"), 3),
                 Grade = c(75, 87, 42, 35, 10, 95),
                 check.names = FALSE)

df
#>     Team Time Period Grade
#> 1 Team 1          P1    75
#> 2 Team 1          P2    87
#> 3 Team 2          P1    42
#> 4 Team 2          P2    35
#> 5 Team 3          P1    10
#> 6 Team 3          P2    95
  • Related