I am trying to create a variable that would be the difference the values of a variable depending on the value of another variable. More precisely, trying to transform
patID time value
1 10 start 50
2 10 end 60
3 11 start 20
4 11 end 35
into
patID diff
1 10 10
2 11 15
by substracting the end Value from the start Value for each patID. This is what I've come up with so far:
df %>%
mutate(start=if_else(time=='start',value,0),
end=if_else(time=='end',value,0))
group_by(patID) %>%
summarise(start=max(start), end=max(end))
mutate(diff=end-start)
It works, but I feel it's bulky and I have a feeling I'm not grasping the full potential of group_by and summarise functions. I'd love to have a more concise way of doing this. Any suggestions?
CodePudding user response:
You can do this directly using the diff()
function:
df %>% group_by(patID) %>% summarize(diff=diff(value))
Output:
patID diff
<int> <int>
1 10 10
2 11 15
This assumes that each patID
has two rows, a start row and and end row, and they are ordered in that fashion (start row first, end row second)
CodePudding user response:
Here is a simpler one. Just notice that alphabetically end
comes before start
and sort them after grouping by patID
.
df1 <- ' patID time value
1 10 start 50
2 10 end 60
3 11 start 20
4 11 end 35'
df1 <- read.table(textConnection(df1), header = TRUE)
suppressPackageStartupMessages(
library(dplyr)
)
df1 %>%
group_by(patID) %>%
arrange(desc(time)) %>%
summarise(diff = diff(value))
#> # A tibble: 2 × 2
#> patID diff
#> <int> <int>
#> 1 10 10
#> 2 11 15
Created on 2022-10-01 with reprex v2.0.2
CodePudding user response:
We can use dplyr::pivot_wider
if ever you have multiple factors on a specific variable.
df2 <- df %>%
dplyr::group_by(patID) %>%
tidyr::pivot_wider(names_from = time, values_from = value) %>%
dplyr::summarise(diff = start - end)
you can substitute start
and end
as the factors you want to subtract.