I have a long list that needs the days between dates
ClientID <- c("00932", "00932", "00932")
Visit <- c("2018-11-10", "2018-11-20", "2018-11-25")
ClientID Visit
00932 2018-11-10
00932 2018-11-20
00932 2018-11-25
I need a new column that says
ClientID Visit Days
00932 2018-11-10 0
00932 2018-11-20 10
00932 2018-11-25 15
CodePudding user response:
Change the Visit
to date class and for each ClientID
subtract the Visit
with the minimum Visit
date.
library(dplyr)
df %>%
mutate(Visit = as.Date(Visit, '%m-%d-%Y')) %>%
group_by(ClientID) %>%
mutate(Days = as.integer(Visit - min(Visit))) %>%
ungroup
# ClientID Visit Days
# <chr> <date> <int>
#1 00932 2018-11-10 0
#2 00932 2018-11-20 10
#3 00932 2018-11-25 15
data
ClientID <- c("00932", "00932", "00932")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018")
df <- data.frame(ClientID, Visit)
CodePudding user response:
As I assume that you need the day span between dates, not between the date and latest date I propose this:
dn <- as.numeric(as.Date(Visit))
The text with dates is consistently converted to date and to number.
dn2 <- c(dn[1], dn[-length(dn)])
We prepare the second vector for subtraction as it will work fast. It will have the order of memebers like this:
df df2
1st 1st
2nd 1st
3rd 2nd
nth nth-1
Days <- dn - dn2
Find the span.
CodePudding user response:
Considering you have multiple ClientID
and want to calculate the Days
on that level:
library(lubridate)
library(tidyverse)
ClientID <- c("00932", "00932", "00932")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018")
df <- data.frame(ClientID, Visit)
df %>%
group_by(ClientID) %>%
mutate(Visit= mdy(Visit),
Days= as.numeric(Visit-lag(Visit)))%>%
ungroup()%>%
mutate_if(is.numeric, ~replace_na(., 0))
# A tibble: 3 x 3
ClientID Visit Days
<chr> <date> <dbl>
1 00932 2018-11-10 0
2 00932 2018-11-20 10
3 00932 2018-11-25 5
Adding another ClientID
and two observations to demonstrate it better:
ClientID <- c("00932", "00932", "00932", "00935", "00935")
Visit <- c("11-10-2018", "11-20-2018", "11-25-2018", "11-20-2019", "11-25-2019")
df <- data.frame(ClientID, Visit)
df %>%
group_by(ClientID) %>%
mutate(Visit= mdy(Visit),
Days= as.numeric(Visit-lag(Visit)))%>%
ungroup()%>%
mutate_if(is.numeric, ~replace_na(., 0))
# A tibble: 5 x 3
ClientID Visit Days
<chr> <date> <dbl>
1 00932 2018-11-10 0
2 00932 2018-11-20 10
3 00932 2018-11-25 5
4 00935 2019-11-20 0
5 00935 2019-11-25 5