Home > Mobile >  Using Lubridate producing NA values
Using Lubridate producing NA values

Time:12-28

I have this data frame;(df)

date               Name    Name_id      x1      x2     x3    x4    x5    x6   
01/01/2000 00:00      A       U_12       1       1      1     1     1     1
01/01/2000 01:00      A       U_12
01/01/2000 02:00
01/01/2000 03:00
....

I am trying to calculate the monthly aggregated mean etc. for some columns using lubridate.

what I did so far;

df$date <- dmy_hm(Sites_tot$date)

df$month <- floor_date(df$date,"month")

monthly_avgerage <- df %>%
  group_by(Name, Name_id, month) %>%
  summarize_at(vars(x1:x4), .funs = c("mean", "min", "max"), na.rm = TRUE)

I can see the values seem okay although some of the months are turned into NAs.

enter image description here

CodePudding user response:

We can modify the summarise_at to

library(dplyr)
df %>%
   group_by(Name, Name_id, month) %>%
   summarise(across(x1:x4,  list(mean = ~ mean(.x, na.rm = TRUE), 
                         min = ~ min(.x, na.rm = TRUE),
                         max = ~ max(.x, na.rm = TRUE))))

A reproducible example

iris %>%
   group_by(Species) %>%
   summarise(across(everything(),  list(mean = ~ mean(.x, na.rm = TRUE), 
                         min = ~ min(.x, na.rm = TRUE),
                         max = ~ max(.x, na.rm = TRUE))))

CodePudding user response:

If I am not wrong, the challenge is to get date column into datetime format:

Somehow date = dmy_hm(date) does not work:

library(dplyr)
library(lubridate)

df %>% 
  mutate(date = dmy_hms(paste0(date, ":00")),
         month = month(date)) %>% 
  group_by(Name, Name_id, month) %>%
  summarise(across(x1:x4,  list(mean = ~ mean(.x, na.rm = TRUE), 
                                min = ~ min(.x, na.rm = TRUE),
                                max = ~ max(.x, na.rm = TRUE))), .groups = "drop")

 Name  Name_id month x1_mean x1_min x1_max x2_mean x2_min x2_max x3_mean x3_min x3_max
  <chr> <chr>   <dbl>   <dbl>  <int>  <int>   <dbl>  <int>  <int>   <dbl>  <int>  <int>
1 A     U_12        1     1.5      1      2     1.5      1      2     1.5      1      2
2 B     U_13        1     3.5      3      4     3.5      3      4     3.5      3      4
# … with 3 more variables: x4_mean <dbl>, x4_min <int>, x4_max <int>
# ℹ Use `colnames()` to see all variable names

fake data:

df <- structure(list(date = c("01/01/2000 00:00", "01/01/2000 01:00", 
"01/01/2000 02:00", "01/01/2000 03:00"), Name = c("A", "A", "B", 
"B"), Name_id = c("U_12", "U_12", "U_13", "U_13"), x1 = 1:4, 
    x2 = 1:4, x3 = 1:4, x4 = 1:4, x5 = 1:4, x6 = 1:4), class = "data.frame", row.names = c(NA, 
-4L))
  • Related