R: How do I add a column with day number based on a column with dates-CodePudding

My problem is to add a column to a dataframe with day number based on another column with dates. The earliest date for each id is supposed to be numbered with 0 and all subsequent days with the difference from day 0, grouped by id.

I have made several unsuccessful attempts using tidyverse functions and seen threads with people writing complicated functions to create this type of data. But I would expect my desired output to be quite straightforward?

Input:

id  date     
1   01-02-2010    
1   02-02-2010  
1   03-02-2010  
2   07-02-2010  
2   08-02-2010  
2   09-02-2010

Desired output:

id  date          day
1   01-02-2010    0
1   02-02-2010    1
1   03-02-2010    2
2   07-02-2010    0
2   08-02-2010    1
2   09-02-2010    2

CodePudding user response：

library(dplyr)

df %>% 
  mutate(date = lubridate::dmy(date)) %>% 
  group_by(id) %>% 
  mutate(day = date - first(date))

#> # A tibble: 6 x 3
#> # Groups:   id [2]
#>      id date       day     
#>   <int> <date>     <drtn>  
#> 1     1 2010-02-01   0 days
#> 2     1 2010-02-02   1 days
#> 3     1 2010-02-03   2 days
#> 4     2 2010-02-07   0 days
#> 5     2 2010-08-08 182 days
#> 6     2 2010-02-09   2 days

With data:

df <- read.table(text = 
'id  date     
1   01-02-2010    
1   02-02-2010  
1   03-02-2010  
2   07-02-2010  
2   08-08-2010  
2   09-02-2010 ', header = TRUE)

CodePudding user response：

Here's a solution using lubridate (which is not in tidyverse, but works well with it):

library(tidyverse)
df <- read_table("id  date     
1   01-02-2010    
1   02-02-2010  
1   03-02-2010  
2   07-02-2010  
2   08-02-2010  
2   09-02-2010") %>% 
    select(-X3)
#> Warning: Missing column names filled in: 'X3' [3]
#> Warning: 1 parsing failure.
#> row col  expected    actual         file
#>   6  -- 3 columns 2 columns literal data

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
df %>%
    group_by(id) %>%
    mutate(
        date = dmy(date),
        day = as.numeric(date - min(date)))
#> # A tibble: 6 × 3
#> # Groups:   id [2]
#>      id date         day
#>   <dbl> <date>     <dbl>
#> 1     1 2010-02-01     0
#> 2     1 2010-02-02     1
#> 3     1 2010-02-03     2
#> 4     2 2010-02-07     0
#> 5     2 2010-02-08     1
#> 6     2 2010-02-09     2

^{Created on 2021-11-02 by the reprex package (v2.0.1)}

CodePudding user response：

Using lubridate and dplyr:

library(dplyr)
library(lubridate)

^{P.S. Don't need to import the packages if they're already imported.}

df %>% group_by(id) %>%
   mutate(day = day(dmy(date)) - day(dmy(first(date))))

Output:

# A tibble: 6 x 3
# Groups:   id [2]
     id date         day
  <dbl> <chr>      <dbl>
1     1 01-02-2010     0
2     1 02-02-2010     1
3     1 03-02-2010     2
4     2 07-02-2010     0
5     2 08-08-2010     1
6     2 09-02-2010     2