I have been looking for a while without finding an answer, so I am trying here:
I have a column of data listing the first observations of an animal: 2022-05-03, 2022-05-01, 2022-04-23, 2021-05-04, 2021-02-31, 2020-01-30, 2020-05-20 and so on.
I want to find the first observation per year using the filter() function. What should that look like, and is the lubridate package something I should use?
Thanks in advance.
CodePudding user response:
You can try:
library(dplyr)
library(lubridate)
df = tibble(date = as.Date(c("2022-05-03", "2022-05-01", "2022-04-23", "2021-05-04", "2021-02-28", "2020-01-30", "2020-05-20")))
Then, to get the first date by year:
df %>% mutate(year = year(date)) %>% arrange(date) %>% group_by(year) %>% slice(1)
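Since the question asks about filter() specifically, here is a minimal sketch of the same idea with a grouped filter(). It rebuilds the df from above so it runs on its own; the name first_per_year is just an illustrative choice. Note one difference from slice(1): if several rows tie for the earliest date in a year, filter() keeps all of them.

```r
library(dplyr)
library(lubridate)

# Same example dates as above
df <- tibble(date = as.Date(c("2022-05-03", "2022-05-01", "2022-04-23",
                              "2021-05-04", "2021-02-28", "2020-01-30",
                              "2020-05-20")))

first_per_year <- df %>%
  group_by(year = year(date)) %>%  # one group per calendar year
  filter(date == min(date)) %>%    # keep the earliest date within each group
  ungroup() %>%
  arrange(year)

first_per_year
```

If the data also has an animal column, adding it to group_by() should give the first observation per animal and year.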
Best wishes!
CodePudding user response:
I'll show you some ways. First of all, use the "Date" format for dates!
animal_data <- transform(animal_data, date=as.Date(date))
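One thing worth checking after the conversion: the dates in the question include 2021-02-31, which is not a real calendar date. The sketch below assumes the raw column is character in YYYY-MM-DD form (the vector raw is made up for illustration) and shows how such entries can be located, since as.Date with an explicit format turns them into NA rather than raising an error.

```r
# Hypothetical raw input, including the impossible date from the question
raw <- c("2022-05-03", "2021-02-31", "2020-01-30")

# With an explicit format, unparseable entries become NA
parsed <- as.Date(raw, format = "%Y-%m-%d")

# Show which original strings failed to parse
raw[is.na(parsed)]
```

Rows flagged this way need to be corrected or dropped before taking a minimum, otherwise min() returns NA unless na.rm = TRUE is used.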
Here is an option using aggregate with the formula interface, aggregating by animal name and by characters 1 to 4 of the date (taken with substr), i.e. the year:
aggregate(date ~ animals + substr(date, 1, 4), animal_data, min)
# animals substr(date, 1, 4) date
# 1 Gorilla 2020 2020-07-05
# 2 Rhebok 2020 2020-02-22
# 3 Vicuna 2020 2020-06-23
# 4 Gorilla 2021 2021-01-11
# 5 Rhebok 2021 2021-03-10
# 6 Vicuna 2021 2021-05-24
# 7 Gorilla 2022 2022-05-03
# 8 Rhebok 2022 2022-04-29
Or with list notation, which gives the most flexibility over the column names of the result:
with(animal_data, aggregate(list(date=date), list(animals=animals, year=substr(date, 1, 4)), min))
# animals year date
# 1 Gorilla 2020 2020-07-05
# 2 Rhebok 2020 2020-02-22
# 3 Vicuna 2020 2020-06-23
# 4 Gorilla 2021 2021-01-11
# 5 Rhebok 2021 2021-03-10
# 6 Vicuna 2021 2021-05-24
# 7 Gorilla 2022 2022-05-03
# 8 Rhebok 2022 2022-04-29
Another way is using ave in subset. subset expects a logical condition. ave internally splits the date by animal, then by year, and applies which.min to each of these subsets. We compare the output of ave (the first observation of the animal in that year) with the date, and in this way create the logical condition for subset.
subset(animal_data, ave(date, animals, substr(date, 1, 4), FUN=\(x) x[which.min(x)]) == date)
# animals date
# 1 Rhebok 2020-02-22
# 2 Vicuna 2020-06-23
# 3 Gorilla 2020-07-05
# 12 Gorilla 2021-01-11
# 13 Rhebok 2021-03-10
# 14 Vicuna 2021-05-24
# 19 Rhebok 2022-04-29
# 20 Gorilla 2022-05-03
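To make the ave step easier to follow, here is a small sketch on a made-up four-row data frame (toy, grp_min and keep are illustrative names, not part of the data below). It shows that ave returns the group minimum repeated for every row of the group, so comparing it with the date column gives the logical vector that subset needs.

```r
# Made-up data: A has two observations in 2020, B has one each in 2020 and 2021
toy <- data.frame(
  animals = c("A", "A", "B", "B"),
  date    = as.Date(c("2020-03-01", "2020-01-15", "2020-02-10", "2021-02-10"))
)

# For each animal/year group, ave fills every row with that group's earliest date
grp_min <- ave(toy$date, toy$animals, substr(toy$date, 1, 4),
               FUN = function(x) x[which.min(x)])

# TRUE only where a row's own date is the earliest in its group
keep <- grp_min == toy$date

toy[keep, ]
```

Using x[which.min(x)] rather than min(x) means animal/year combinations that never occur yield an empty result instead of Inf with a warning.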
Now you probably have a few options to choose from.
Data:
animal_data <- structure(list(animals = c("Rhebok", "Vicuna", "Gorilla", "Rhebok",
"Rhebok", "Gorilla", "Rhebok", "Vicuna", "Vicuna", "Gorilla",
"Vicuna", "Gorilla", "Rhebok", "Vicuna", "Rhebok", "Rhebok",
"Rhebok", "Vicuna", "Rhebok", "Gorilla"), date = structure(c(18314,
18436, 18448, 18487, 18502, 18516, 18549, 18582, 18588, 18589,
18604, 18638, 18696, 18771, 18806, 18807, 18911, 18938, 19111,
19115), class = "Date")), row.names = c(8L, 15L, 9L, 18L, 3L,
20L, 4L, 14L, 7L, 10L, 5L, 17L, 11L, 6L, 19L, 2L, 1L, 13L, 12L,
16L), class = "data.frame")