I have a dataset with dead bird records from field observers.
Death.Date Observer Species Bird.ID
1 03/08/2021 DA MF FC10682
2 15/08/2021 AG MF FC10698
3 12/01/2022 DA MF FC20957
4 09/02/2022 DA MF FC10708
I want to produce a dataset from this with the number of unique Bird.ID / Month so I can produce a graph from that. ("unique" because some people make mistakes and enter a bird twice sometimes).
The output in this case would be:
Month Number of dead
08/2021 2
01/2022 1
02/2022 1
The idea is to use the distinct
function but by month (knowing the value is in date format dd/mm/yyyy).
CodePudding user response:
In case your Date column is character type first transform to date type with
dmy
Change format to month and year
group_by
andsummarize
library(dplyr)
library(lubridate) # in case your Date is in character format
df %>%
mutate(Death.Date = dmy(Death.Date)) %>% # you may not need this line
mutate(Month = format(as.Date(Death.Date), "%m/%Y")) %>%
group_by(Month) %>%
summarise(`Number of dead`=n())
Month `Number of dead`
<chr> <int>
1 01/2022 1
2 02/2022 1
3 08/2021 2
CodePudding user response:
data.table solution
library(data.table)
library(lubridate)
# Reproductible example with a duplicated bird
deadbirds <- data.table::data.table(Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
Observer = c("DA", "AG", "DA", "DA", "DA"),
Species = c("MF", "MF", "MF" , "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682"))
# Clean dataset = option 1 : delete all duplicated row
deadbirds <- base::unique(deadbirds)
# Clean dataset = option 2 : keep only the first line by bird (can be useful when there is duplicated data with differents values in useless columns)
deadbirds <- deadbirds[
j = .SD[1],
by = c("Bird.ID")
]
# Death.Date as date
deadbirds <- deadbirds[
j = Death.Date := lubridate::dmy(Death.Date)
]
# Create month.Death.Date
deadbirds <- deadbirds[
j = month.Death.Date := base::paste0(lubridate::month(Death.Date),
"/",
lubridate::year(Death.Date))
]
# Count by month
deadbirds <- deadbirds[
j = `Number of dead` := .N,
by = month.Death.Date]
CodePudding user response:
A possible solution, based on tidyverse
, lubridate
and zoo::as.yearmon
:
library(tidyverse)
library(lubridate)
library(zoo)
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
df %>%
group_by(date = as.yearmon(dmy(Death.Date))) %>%
summarise(nDead = n_distinct(Bird.ID), .groups = "drop")
#> # A tibble: 3 x 2
#> date nDead
#> <yearmon> <int>
#> 1 Aug 2021 2
#> 2 Jan 2022 1
#> 3 Feb 2022 1
CodePudding user response:
For completeness, this can be achieved using aggregate
without any additional packages:
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
aggregate.data.frame(
x = df["Bird.ID"],
by = list(death_month = format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y")),
FUN = function(x) {length(unique(x))}
)
Notes
- The anonymous function
function(x) {length(unique(x))
provides the count of the unique values format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y"))
call ensures that the month/Year string is provided
CodePudding user response:
You could use:
as.data.frame(table(format(as.Date(df$Death.Date,'%d/%m/%Y'), '%m/%Y')))
# Var1 Freq
# 1 01/2022 1
# 2 02/2022 1
# 3 08/2021 2
data:
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)