Count rows in a dataframe by year and by condition

Time:06-24

I've got a large dataframe of the following structure:

Date           Weight
2018-01-03     0.05000000
2018-01-09     0.42000000
2019-01-10     0.27500000
2019-01-11     0.55000000
2020-01-04     0.25025991
2020-01-07     0.27000012

First, I'd like to count the number of data points in each year so that I can plot those counts as a bar chart. I think I need to create a Year column to do this:

df$Year <- format(df$Date, format="%Y")
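A minimal base R sketch of this step is below. It assumes the Date column has class Date (if it was read in as text, wrap it in as.Date() first, otherwise format() will not apply "%Y"). It also rebuilds the sample rows shown above so it runs on its own. The name counts and the axis labels are illustrative choices, not part of the original question.

```r
# Sample data mirroring the question; Date is converted to class Date
df <- data.frame(
  Date   = as.Date(c("2018-01-03", "2018-01-09", "2019-01-10",
                     "2019-01-11", "2020-01-04", "2020-01-07")),
  Weight = c(0.05, 0.42, 0.275, 0.55, 0.25025991, 0.27000012)
)

# Extract the year as a character string ("2018", "2019", ...)
df$Year <- format(df$Date, format = "%Y")

# Count rows per year; table() returns one named count per distinct year
counts <- table(df$Year)
print(counts)

# Bar chart of the per-year counts
barplot(counts, xlab = "Year", ylab = "Number of rows")
```

table() counts each distinct Year value, so no separate grouping step is needed before calling barplot().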

I would also like to count how many data points in each year lie above particular thresholds, for example 0.1, 0.2, 0.5, etc.

Does anyone know how to achieve this?

Answer:

A possible solution, based on dplyr and lubridate::year:

library(dplyr)
library(lubridate)

df %>% 
  group_by(year = year(Date)) %>%    # one group per calendar year
  summarise(n = n(),                  # number of rows in that year
            n05 = sum(Weight > 0.5))  # rows with Weight above 0.5

#> # A tibble: 3 × 3
#>    year     n   n05
#>   <dbl> <int> <int>
#> 1  2018     2     0
#> 2  2019     2     1
#> 3  2020     2     0
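The answer above covers one threshold (0.5). The question also asks for 0.1 and 0.2. One way to handle any list of thresholds is the base R sketch below, which uses base R instead of dplyr to avoid extra packages. It assumes Date has class Date, and the names thresholds, year and result are illustrative. In dplyr the equivalent is to add one sum(Weight > t) term per threshold inside summarise().

```r
# Sample data mirroring the question
df <- data.frame(
  Date   = as.Date(c("2018-01-03", "2018-01-09", "2019-01-10",
                     "2019-01-11", "2020-01-04", "2020-01-07")),
  Weight = c(0.05, 0.42, 0.275, 0.55, 0.25025991, 0.27000012)
)

thresholds <- c(0.1, 0.2, 0.5)
year <- format(df$Date, "%Y")

# For each threshold, count the rows per year whose Weight is strictly above it;
# sapply() binds the per-year vectors into a year-by-threshold matrix
counts <- sapply(thresholds, function(t) tapply(df$Weight > t, year, sum))
colnames(counts) <- paste0("above_", thresholds)

# Prepend the total number of rows per year, like n in the dplyr answer
result <- cbind(n = as.vector(table(year)), counts)
print(result)
```

Each column of result corresponds to one threshold, so it can be passed directly to barplot() (for example with beside = TRUE after transposing) to compare years.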