Filter a dataframe with dplyr in R

Time:08-27

Hey, I want to make this code work:

data = data %>% filter(value1=="city" & !is.na(value2)) %>% group_by(A_id, B_id)  %>%
    filter(date==min(as.Date(date,"%d.%m.%Y"))) %>% select(A_id, B_id, date)

I get the following error: Caused by error in charToDate(): character string is not in a standard unambiguous format. But if I work with the date column in other lines of code, it works fine. Converting the column to Date format before the filtering doesn't help either; I get the same error.

This is my existing dataframe, shown here with example values for A_id and B_id:

A_id B_id date
1 1 01.01.2020
1 1 01.11.2021
1 2 05.12.2019
1 2 31.01.2020
2 1 05.12.2019
2 1 01.11.2021
2 3 01.11.2021
2 3 31.01.2020

I would like to receive the following table (the exact date format doesn't matter):

A_id B_id date
1 1 01.01.2020
1 2 05.12.2019
2 1 05.12.2019
2 3 31.01.2020

Anybody who could help?

Answer:

Use slice_min, which keeps the rows with the minimum value in each group:

data %>% 
  group_by(A_id, B_id) %>%
  slice_min(as.Date(date, "%d.%m.%Y"))
# A tibble: 4 × 3
# Groups:   A_id, B_id [4]
   A_id  B_id date      
  <int> <int> <chr>     
1     1     1 01.01.2020
2     1     2 05.12.2019
3     2     1 05.12.2019
4     2     3 31.01.2020
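
To try this on the sample data, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the table in the question, and the name earliest is only illustrative. The with_ties = FALSE argument is an optional addition that is not part of the answer above; it should guarantee a single row per group when two rows share the earliest date.

```r
library(dplyr)

# Sample data rebuilt from the table in the question; date is a character column
data <- data.frame(
  A_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  B_id = c(1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L),
  date = c("01.01.2020", "01.11.2021", "05.12.2019", "31.01.2020",
           "05.12.2019", "01.11.2021", "01.11.2021", "31.01.2020"),
  stringsAsFactors = FALSE
)

earliest <- data %>%
  group_by(A_id, B_id) %>%
  # Order by the parsed date; with_ties = FALSE keeps exactly one row per group
  slice_min(as.Date(date, "%d.%m.%Y"), with_ties = FALSE) %>%
  ungroup()

print(earliest)
```

Without with_ties = FALSE, slice_min returns every row tied for the minimum, which may or may not be what you want.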

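If you want to keep filter, convert the string to a Date on both sides of the comparison. In your code the left-hand side is still a character column, so R tries to convert it to a Date without a format, and that is what raises the charToDate error: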

data %>% 
  group_by(A_id, B_id) %>%
  filter(as.Date(date,"%d.%m.%Y") == min(as.Date(date,"%d.%m.%Y")))
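
An alternative sketch, assuming you are free to overwrite the column: parse date once with mutate, so that every later comparison is Date against Date. The sample data is rebuilt from the table in the question, and the name result is only illustrative.

```r
library(dplyr)

# Sample data rebuilt from the table in the question; date is a character column
data <- data.frame(
  A_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  B_id = c(1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L),
  date = c("01.01.2020", "01.11.2021", "05.12.2019", "31.01.2020",
           "05.12.2019", "01.11.2021", "01.11.2021", "31.01.2020"),
  stringsAsFactors = FALSE
)

result <- data %>%
  # Parse the day.month.year strings once, up front
  mutate(date = as.Date(date, format = "%d.%m.%Y")) %>%
  group_by(A_id, B_id) %>%
  # Both sides are now Date, so no implicit conversion is attempted
  filter(date == min(date)) %>%
  ungroup() %>%
  select(A_id, B_id, date)

print(result)
```

Here the date column comes back as a Date (printed as year-month-day); format(date, "%d.%m.%Y") turns it back into the original text layout if needed.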