Home > Software engineering >  How to count the number of observations for a particular month/year in an event-level df?
How to count the number of observations for a particular month/year in an event-level df?

Time:09-22

I am working with an event-level df on protest patterns from 2000-2017. I want to create a new df that tells me the number of individual protest events for each month over this period. I have tried to do this with the following code but have been unsuccessful. There should thus be two columns: one listing the month/year and one listing the number of protests.

dates <- as.Date(dat$Date, '%m/%d/%Y')

yr <- year(dates)

monyr <- as.yearmon(dates)

lst <- lapply(list(dates, yr, monyr), function(x) 
  transform(dat, Count=ave(seq_along(x), x, FUN= length)))

names(lst) <- paste0('newdf', seq_along(lst))

Here is the data:

structure(list(Date = c("16-Mar-07", "19-Mar-07", "20-Mar-07", 
"21-Mar-07", "21-Mar-07", "22-Mar-07"), Region = c("Kemerovo", 
"Moscow City", "Saratov", "Novosibirsk", "Omsk", "Tyva")), row.names = c(NA, 
6L), class = "data.frame")

Thanks for your help.

CodePudding user response:

aggregate should work.

dat$month <- substr(dat$Date, 4, 9)
aggregate(dat, Region ~ month, length)

But you only have one month in your example data:

> aggregate(dat, Region ~ month, length)
   month Region
1 Mar-07      6

CodePudding user response:

There are the following problems:

  1. the format string does not correspond to the format of the dates in dat.
  2. if you wish to convert to yearmon it can be done directly. There is no need to convert to Date first.
  3. aggregate, table, xtabs or dplyr::count would be the appropriate function for aggregation
  4. the library statements are missing

Assuming that what is wanted is a data frame with a yearmon column, yrmon, and a count, Freq or n column we can make these changes resulting in the following code which produces a data frame with one row per year/month. The library statement in the first line of code below is needed for all solutions.

library(zoo)

# 1 - aggregate
yrmon <- as.yearmon(dat$Date, '%d-%b-%y')
aggregate(dat[1], data.frame(yrmon), length)
##      yrmon count
## 1 Mar 2007     6

# 2 - table
as.data.frame(table(yrmon))  # yrmon defined above
##      yrmon Freq
## 1 Mar 2007    6

# 3 - xtabs
as.data.frame(xtabs(~ yrmon))  # yrmon defined above
##      yrmon Freq
## 1 Mar 2007    6

# 4 - dplyr
library(dplyr)
dat %>%
  mutate(yrmon = as.yearmon(Date, '%d-%b-%y')) %>%
  count(yrmon)
##      yrmon n
## 1 Mar 2007 6

The first 3 can also be expressed as a pipeline. The last one already is a pipeline.

# 1a - as a pipeline
dat |>
  transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
  aggregate(Date ~ yrmon, length)

# 2a - table
dat |>
  transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
  with(table(yrmon)) |>
  as.data.frame()

# 3a - xtabs
dat |>
  transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
  xtabs(~ yrmon, data = _) |>
  as.data.frame()

Note

Lines <- "Date,Region
16-Mar-07,Kemerovo
19-Mar-07,Moscow City
20-Mar-07,Saratov
21-Mar-07,Novosibirsk
21-Mar-07,Omsk
22-Mar-07,Tyva
16-Mar-08,Kemerovo
19-Mar-08,Moscow City
20-Mar-08,Saratov
21-Mar-08,Novosibirsk
21-Mar-08,Omsk
22-Mar-08,Tyva"

dat2 <- read.csv(text = Lines)
dat2 |>
  transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
  with(table(yrmon)) |>
  as.data.frame()
##      yrmon Freq
## 1 Mar 2007    6
## 2 Mar 2008    6
  • Related