How to create a new column with the year of the first date for each id in R

I have this dataset:

library(dplyr)
library(lubridate)

id <- c("A", "A", "B", "B")
date <- ymd_hms(c("2017-12-26 09:01:30", "2018-01-01 09:06:40", "2017-12-30 09:04:50", "2018-02-02 09:01:00"))
df <- tibble(id, date)

I need to create a new column with the year, but it should be the year of the first date for each ID. The data are from the end of the year, so the dates for a single ID generally span two years.

I tried this initially, but it didn't work:


df %>% group_by(id) %>% 
  mutate(year=paste0(year(date) > min(year(date))))

Here is expected output:

>output
      id          date              year
1     A    2017-12-26 09:01:30      2017
2     A    2018-01-01 09:06:40      2017
3     B    2017-12-30 09:04:50      2017
4     B    2018-02-02 09:01:00      2017

CodePudding user response：

A possible solution:

library(tidyverse)
library(lubridate)

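df %>% 
  group_by(id) %>% 
  # date is already a date-time column, so it does not need to be parsed again;
  # first() takes the first row of each group, so rows must be sorted by date within id
  mutate(year = first(year(date))) %>% 
  ungroup()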

#> # A tibble: 4 × 3
#>   id    date                 year
#>   <chr> <dttm>              <dbl>
#> 1 A     2017-12-26 09:01:30  2017
#> 2 A     2018-01-01 09:06:40  2017
#> 3 B     2017-12-30 09:04:50  2017
#> 4 B     2018-02-02 09:01:00  2017
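
The answer above relies on first(), which returns the value from the first row of each group, so it gives the intended year only when the rows are already sorted by date within each id. If that is not guaranteed, a variant based on min(date) might be safer. This is only a sketch: the `df_shuffled` and `result` names are made up for illustration, and the rows are reordered on purpose to show that the order no longer matters.

```r
library(dplyr)
library(lubridate)

# Same values as in the question, but with the rows deliberately out of order
# (df_shuffled is a hypothetical name used only for this example)
df_shuffled <- tibble(
  id   = c("A", "B", "A", "B"),
  date = ymd_hms(c("2018-01-01 09:06:40", "2018-02-02 09:01:00",
                   "2017-12-26 09:01:30", "2017-12-30 09:04:50"))
)

result <- df_shuffled %>%
  group_by(id) %>%
  # min(date) is the earliest date-time in the group, whatever the row order
  mutate(year = year(min(date))) %>%
  ungroup()

print(result)
```

Every row should end up with the year of its id's earliest date, which is 2017 for both ids in this data.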

CodePudding user response：

Using base R (note that format() returns the year as a character string):

transform(df, year = format(ave(date, id, FUN = min), '%Y'))
  id                date year
1  A 2017-12-26 09:01:30 2017
2  A 2018-01-01 09:06:40 2017
3  B 2017-12-30 09:04:50 2017
4  B 2018-02-02 09:01:00 2017
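
If a numeric year is preferred over the character value that format() produces, the result can be wrapped in as.integer(). The following is a minimal sketch using base R only; it rebuilds the example data with as.POSIXct() and assumes the UTC time zone, and the `df_base` and `first_date` names are made up for illustration.

```r
# Rebuild the example data without any packages
# (df_base is a hypothetical name; tz = "UTC" is an assumption)
df_base <- data.frame(
  id   = c("A", "A", "B", "B"),
  date = as.POSIXct(c("2017-12-26 09:01:30", "2018-01-01 09:06:40",
                      "2017-12-30 09:04:50", "2018-02-02 09:01:00"),
                    tz = "UTC")
)

# ave() returns a vector as long as the input, holding each id's earliest date
first_date <- ave(df_base$date, df_base$id, FUN = min)

# Extract the year as text, then convert it to an integer
df_base$year <- as.integer(format(first_date, "%Y"))

print(df_base)
```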