I have a data set that have id date and time, now I want to calculate the difference between each available date based on id. I have try to look for similar problem in stack overflow but so far no luck. I have try a few different syntax but still no luck at the moment. any help would be great.
data set:
> dput(mydata)
structure(list(id = c("a", "a", "b", "b", "b", "c"), date = c("2018-04-13",
"2011-11-12", "2019-05-30", "2014-09-13", "2019-06-21", "1998-01-08"
), time = c("50", "40", "30", "20", "10", "30")), class = "data.frame", row.names = c(NA,
-6L))
Desire output:
id date time time_diff
a 2018-04-13 50 10
a 2011-11-12 40 NA/0
b 2019-05-30 30 10
b 2014-09-13 20 NA/0
b 2019-06-21 10 -20
c 1998-01-08 30 NA/0
I understand the earliest date won't have anything to calculate the difference so it can be either NA or 0 in this case.
Here is the code that I have try but getting error:
mydata <- mydata %>%
group_by(id,date) %>%
mutate(time_diff = diff(time))
CodePudding user response:
library(dplyr)
df <- structure(list(
id = c("a", "a", "b", "b", "b", "c"),
date = ("2018-04-13", "2011-11-12", "2019-05-30", "2014-09-13", "2019-06-21", "1998-01-08"),
time = c("50", "40", "30", "20", "10", "30")),
class = "data.frame", row.names = c(NA, -6L))
df %>%
group_by(id) %>%
arrange(id, date) %>%
mutate(
time = as.numeric(time),
time_diff = time - lag(time)
)
CodePudding user response:
For each id
you may subtract the time
corresponding to minimum date
.
library(dplyr)
mydata %>%
mutate(time = as.numeric(time),
date = as.Date(date)) %>%
group_by(id) %>%
mutate(time_diff = time - time[which.min(date)]) %>%
ungroup
# id date time time_diff
# <chr> <date> <dbl> <dbl>
#1 a 2018-04-13 50 10
#2 a 2011-11-12 40 0
#3 b 2019-05-30 30 10
#4 b 2014-09-13 20 0
#5 b 2019-06-21 10 -10
#6 c 1998-01-08 30 0