My problem is to add a column to a dataframe with day number based on another column with dates. The earliest date for each id is supposed to be numbered with 0 and all subsequent days with the difference from day 0, grouped by id.
I have made several unsuccessful attempts using tidyverse functions and seen threads with people writing complicated functions to create this type of data. But I would expect my desired output to be quite straightforward?
Input:
id date
1 01-02-2010
1 02-02-2010
1 03-02-2010
2 07-02-2010
2 08-02-2010
2 09-02-2010
Desired output:
id date day
1 01-02-2010 0
1 02-02-2010 1
1 03-02-2010 2
2 07-02-2010 0
2 08-02-2010 1
2 09-02-2010 2
CodePudding user response:
library(dplyr)
df %>%
mutate(date = lubridate::dmy(date)) %>%
group_by(id) %>%
mutate(day = date - first(date))
#> # A tibble: 6 x 3
#> # Groups: id [2]
#> id date day
#> <int> <date> <drtn>
#> 1 1 2010-02-01 0 days
#> 2 1 2010-02-02 1 days
#> 3 1 2010-02-03 2 days
#> 4 2 2010-02-07 0 days
#> 5 2 2010-08-08 182 days
#> 6 2 2010-02-09 2 days
With data:
df <- read.table(text =
'id date
1 01-02-2010
1 02-02-2010
1 03-02-2010
2 07-02-2010
2 08-08-2010
2 09-02-2010 ', header = TRUE)
CodePudding user response:
Here's a solution using lubridate
(which is not in tidyverse, but works well with it):
library(tidyverse)
df <- read_table("id date
1 01-02-2010
1 02-02-2010
1 03-02-2010
2 07-02-2010
2 08-02-2010
2 09-02-2010") %>%
select(-X3)
#> Warning: Missing column names filled in: 'X3' [3]
#> Warning: 1 parsing failure.
#> row col expected actual file
#> 6 -- 3 columns 2 columns literal data
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
df %>%
group_by(id) %>%
mutate(
date = dmy(date),
day = as.numeric(date - min(date)))
#> # A tibble: 6 × 3
#> # Groups: id [2]
#> id date day
#> <dbl> <date> <dbl>
#> 1 1 2010-02-01 0
#> 2 1 2010-02-02 1
#> 3 1 2010-02-03 2
#> 4 2 2010-02-07 0
#> 5 2 2010-02-08 1
#> 6 2 2010-02-09 2
Created on 2021-11-02 by the reprex package (v2.0.1)
CodePudding user response:
Using lubridate
and dplyr
:
library(dplyr)
library(lubridate)
P.S. Don't need to import the packages if they're already imported.
df %>% group_by(id) %>%
mutate(day = day(dmy(date)) - day(dmy(first(date))))
Output:
# A tibble: 6 x 3
# Groups: id [2]
id date day
<dbl> <chr> <dbl>
1 1 01-02-2010 0
2 1 02-02-2010 1
3 1 03-02-2010 2
4 2 07-02-2010 0
5 2 08-08-2010 1
6 2 09-02-2010 2