Home > front end >  Count unique values per month in R
Count unique values per month in R

Time:03-09

I have a dataset with dead bird records from field observers.

Death.Date Observer Species Bird.ID
1 03/08/2021       DA      MF FC10682
2 15/08/2021       AG      MF FC10698
3 12/01/2022       DA      MF FC20957
4 09/02/2022       DA      MF FC10708

I want to produce a dataset from this with the number of unique Bird.ID / Month so I can produce a graph from that. ("unique" because some people make mistakes and enter a bird twice sometimes).

The output in this case would be:

Month Number of dead 
08/2021 2
01/2022 1
02/2022 1

The idea is to use the distinct function but by month (knowing the value is in date format dd/mm/yyyy).

CodePudding user response:

  1. In case your Date column is character type first transform to date type with dmy

  2. Change format to month and year

  3. group_by and summarize

library(dplyr)
library(lubridate) # in case your Date is in character format

df %>% 
  mutate(Death.Date = dmy(Death.Date)) %>% # you may not need this line
  mutate(Month = format(as.Date(Death.Date), "%m/%Y")) %>%
  group_by(Month) %>% 
  summarise(`Number of dead`=n())
 Month   `Number of dead`
  <chr>              <int>
1 01/2022                1
2 02/2022                1
3 08/2021                2

CodePudding user response:

data.table solution

library(data.table)
library(lubridate)

# Reproductible example with a duplicated bird
deadbirds <- data.table::data.table(Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
                                    Observer   = c("DA", "AG", "DA", "DA", "DA"),
                                    Species    = c("MF", "MF", "MF" , "MF", "MF"),
                                    Bird.ID    = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682"))

# Clean dataset = option 1 : delete all duplicated row
deadbirds <- base::unique(deadbirds)

# Clean dataset = option 2 : keep only the first line by bird (can be useful when there is duplicated data with differents values in useless columns)
deadbirds <- deadbirds[
  j = .SD[1],
  by = c("Bird.ID")
]

# Death.Date as date
deadbirds <- deadbirds[
  j = Death.Date := lubridate::dmy(Death.Date)
]

# Create month.Death.Date
deadbirds <- deadbirds[
  j = month.Death.Date := base::paste0(lubridate::month(Death.Date),
                                       "/",
                                       lubridate::year(Death.Date))
]

# Count by month
deadbirds <- deadbirds[
  j  = `Number of dead` := .N,
  by = month.Death.Date]

CodePudding user response:

A possible solution, based on tidyverse, lubridate and zoo::as.yearmon:

library(tidyverse)
library(lubridate)
library(zoo)

df <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
  Observer = c("DA", "AG", "DA", "DA"),
  Species = c("MF", "MF", "MF", "MF"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)

df %>% 
  group_by(date = as.yearmon(dmy(Death.Date))) %>% 
  summarise(nDead = n_distinct(Bird.ID), .groups = "drop")

#> # A tibble: 3 x 2
#>   date      nDead
#>   <yearmon> <int>
#> 1 Aug 2021      2
#> 2 Jan 2022      1
#> 3 Feb 2022      1

CodePudding user response:

For completeness, this can be achieved using aggregate without any additional packages:

df <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
  Observer = c("DA", "AG", "DA", "DA"),
  Species = c("MF", "MF", "MF", "MF"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)


aggregate.data.frame(
  x = df["Bird.ID"],
  by = list(death_month = format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y")),
  FUN = function(x) {length(unique(x))}
)

Notes

  • The anonymous function function(x) {length(unique(x)) provides the count of the unique values
  • format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y")) call ensures that the month/Year string is provided

CodePudding user response:

You could use:

as.data.frame(table(format(as.Date(df$Death.Date,'%d/%m/%Y'), '%m/%Y')))

#       Var1 Freq
# 1 01/2022    1
# 2 02/2022    1
# 3 08/2021    2

data:

df <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
  Observer = c("DA", "AG", "DA", "DA"),
  Species = c("MF", "MF", "MF", "MF"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
  • Related