I have this dataset:
library(dplyr)
library(lubridate)
id <- c("A", "A", "B", "B")
date <- ymd_hms(c("2017-12-26 09:01:30", "2018-01-01 09:06:40", "2017-12-30 09:04:50", "2018-02-02 09:01:00"))
df <- tibble(id, date)
I need to create a new column with the year, but it should be the year of the first date for each ID. The data come from the end of the year, so the dates for a single ID generally span two years.
I tried this initially, but it didn't work:
df %>% group_by(id) %>%
mutate(year=paste0(year(date) > min(year(date))))
Here is expected output:
>output
id date year
1 A 2017-12-26 09:01:30 2017
2 A 2018-01-01 09:06:40 2017
3 B 2017-12-30 09:04:50 2017
4 B 2018-02-02 09:01:00 2017
CodePudding user response:
A possible solution:
library(tidyverse)
library(lubridate)
df %>%
  group_by(id) %>%
  # date is already a date-time, so there is no need to parse it again with ymd_hms()
  mutate(year = year(first(date))) %>%
  ungroup()
#> # A tibble: 4 × 3
#> id date year
#> <chr> <dttm> <dbl>
#> 1 A 2017-12-26 09:01:30 2017
#> 2 A 2018-01-01 09:06:40 2017
#> 3 B 2017-12-30 09:04:50 2017
#> 4 B 2018-02-02 09:01:00 2017
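Note that first() picks the first row within each group, so it only gives the earliest year if the rows are already sorted by date within each id. If that ordering isn't guaranteed, here is a minimal sketch using min() instead. The data are the same as in the question, but shuffled on purpose for illustration, and `result` is just an illustrative name:

```r
library(dplyr)
library(lubridate)

# Same values as in the question, with the rows deliberately out of order
df <- tibble(
  id   = c("A", "B", "A", "B"),
  date = ymd_hms(c("2018-01-01 09:06:40", "2018-02-02 09:01:00",
                   "2017-12-26 09:01:30", "2017-12-30 09:04:50"))
)

result <- df %>%
  group_by(id) %>%
  # year of the earliest date per id, regardless of row order
  mutate(year = year(min(date))) %>%
  ungroup()
```

With this data, every row should end up with the year of its id's earliest date, even though the 2018 rows come first.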
CodePudding user response:
Using base R
transform(df, year = format(ave(date, id, FUN = min), '%Y'))
id date year
1 A 2017-12-26 09:01:30 2017
2 A 2018-01-01 09:06:40 2017
3 B 2017-12-30 09:04:50 2017
4 B 2018-02-02 09:01:00 2017
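Keep in mind that format() returns a character column. If a numeric year is preferred, one possible variation (a sketch, assuming a plain data.frame with the same values as in the question) is to extract the year first and then take the per-id minimum with ave():

```r
# Same data as in the question, built with base R only
df <- data.frame(
  id   = c("A", "A", "B", "B"),
  date = as.POSIXct(c("2017-12-26 09:01:30", "2018-01-01 09:06:40",
                      "2017-12-30 09:04:50", "2018-02-02 09:01:00"),
                    tz = "UTC")
)

# Extract the year of every row as an integer, then take the minimum per id;
# the smallest year within an id is the year of that id's earliest date
df$year <- ave(as.integer(format(df$date, "%Y")), df$id, FUN = min)
```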