Home > Software design >  Group_by and summarise to substract values from different rows conditional to other value
Group_by and summarise to substract values from different rows conditional to other value

Time:10-02

I am trying to create a variable that would be the difference the values of a variable depending on the value of another variable. More precisely, trying to transform

        patID  time     value
1       10    start        50
2       10      end        60
3       11    start        20
4       11      end        35

into

        patID   diff
1       10        10
2       11        15

by substracting the end Value from the start Value for each patID. This is what I've come up with so far:

df %>%
mutate(start=if_else(time=='start',value,0),
         end=if_else(time=='end',value,0)) 
  group_by(patID) %>%
  summarise(start=max(start), end=max(end)) 
  mutate(diff=end-start)

It works, but I feel it's bulky and I have a feeling I'm not grasping the full potential of group_by and summarise functions. I'd love to have a more concise way of doing this. Any suggestions?

CodePudding user response:

You can do this directly using the diff() function:

df %>% group_by(patID) %>% summarize(diff=diff(value))

Output:

  patID  diff
  <int> <int>
1    10    10
2    11    15

This assumes that each patID has two rows, a start row and and end row, and they are ordered in that fashion (start row first, end row second)

CodePudding user response:

Here is a simpler one. Just notice that alphabetically end comes before start and sort them after grouping by patID.

df1 <- '        patID  time     value
1       10    start        50
2       10      end        60
3       11    start        20
4       11      end        35'
df1 <- read.table(textConnection(df1), header = TRUE)

suppressPackageStartupMessages(
  library(dplyr)
)

df1 %>%
  group_by(patID) %>%
  arrange(desc(time)) %>%
  summarise(diff = diff(value))
#> # A tibble: 2 × 2
#>   patID  diff
#>   <int> <int>
#> 1    10    10
#> 2    11    15

Created on 2022-10-01 with reprex v2.0.2

CodePudding user response:

We can use dplyr::pivot_wider if ever you have multiple factors on a specific variable.

df2 <- df %>% 
  dplyr::group_by(patID) %>%
  tidyr::pivot_wider(names_from = time, values_from = value) %>% 
  dplyr::summarise(diff = start - end)

you can substitute start and end as the factors you want to subtract.

  • Related