R - how to filter data frame by year

Time:07-29

How do I filter an R data frame by year?

In the reproducible example below, I am trying to keep only the rows whose date in column b falls in 2021. Thank you!

library(tidyverse)

a <- c(10,20,30)
b <- as.Date(c('31-11-20', '15-11-21', '31-11-22'))
my_df <- data.frame(a,b)

I have tried the following three approaches, but none of them successfully filtered by the year 2021.

my_df_new <- my_df %>%
  filter(between(b, as.Date('01-01-21'), as.Date('31-12-21')))

my_df_new <- my_df %>%
  filter(between(b, as.Date('2021-01-01'), as.Date('2021-12-31')))

my_df_new <- my_df[my_df$b > "31-12-20" & my_df$b < "1-01-22", ]

CodePudding user response:

Your example dates require some extra work because (a) two of them are not real dates (November has 30 days, not 31) and (b) you don't pass a format to as.Date(), so the strings are not read as day-month-year when they are turned into dates.
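
Both problems can be checked directly in the console. The following is a minimal sketch in base R; the variable names `unparsed`, `parsed`, and `invalid` are only illustrative, and the format string "%d-%m-%y" assumes the strings were meant as day-month-year with a two-digit year.

```r
# No format given: as.Date() falls back to year-month-day,
# so "15" is read as the year and "21" as the day.
unparsed <- as.Date("15-11-21")
print(as.integer(format(unparsed, "%Y")))  # 15, not 2021

# With an explicit day-month-year format the string is read as intended.
parsed <- as.Date("15-11-21", format = "%d-%m-%y")
print(format(parsed, "%Y"))  # "2021"

# 31 November does not exist, so parsing with the format yields NA.
invalid <- as.Date("31-11-20", format = "%d-%m-%y")
print(is.na(invalid))  # TRUE
```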

library(dplyr)

# Example data
a <- c(10,20,30)
b <- as.Date(c('31-11-20', '15-11-21', '31-11-22'))
my_df <- data.frame(a,b)

# format(b, "%Y") extracts whatever part of the string as.Date()
# treated as the year when the variable was converted to a date
my_df %>%
  mutate(year = format(b, "%Y"))
#>    a          b year
#> 1 10 0031-11-20 0031
#> 2 20 0015-11-21 0015
#> 3 30 0031-11-22 0031

# Notice that year is not 20, 21, 22. The two-digit year is actually
# stored as the day, because no format was specified when the date
# variable was created. So, we'll extract the day and save it as year.
new_df <- my_df %>%
  mutate(year = format(b, "%d")) %>%
  print()
#>    a          b year
#> 1 10 0031-11-20   20
#> 2 20 0015-11-21   21
#> 3 30 0031-11-22   22

# Now filter to only 2021 (year is a character column, so compare with "21")
new_df %>%
  filter(year == "21")
#>    a          b year
#> 1 20 0015-11-21   21
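
The workaround above keeps the mis-parsed dates. A cleaner route is probably to parse the strings with an explicit format first and then compare the four-digit year. The sketch below uses base R only and rests on two assumptions that are not confirmed by the question: the strings are day-month-year with a two-digit year, and 31 November was meant to be 30 November. The name `my_df_2021` is only illustrative.

```r
a <- c(10, 20, 30)

# Assumption: day-month-year strings with a two-digit year, and
# "31-11" replaced by "30-11" so that every string is a real date.
b <- as.Date(c("30-11-20", "15-11-21", "30-11-22"), format = "%d-%m-%y")
my_df <- data.frame(a, b)

# Keep the rows whose parsed date falls in 2021.
my_df_2021 <- my_df[format(my_df$b, "%Y") == "2021", ]
print(my_df_2021)
```

With dplyr loaded, the same condition can go inside filter(), for example filter(my_df, format(b, "%Y") == "2021"). Once the dates are parsed this way, the between() attempt from the question that uses '2021-01-01' and '2021-12-31' should also return the 2021 row.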