Home > OS >  Subset a dataframe based on reciprocity conditions
Subset a dataframe based on reciprocity conditions

Time:04-03

I have the following data frame structure

date latitude
1951-03-22 66.08106
1951-03-22 59.59117
1951-04-08 59.59117
1952-10-20 55.41972
1960-08-12 66.05653
1960-09-10 66.08106

What I would like to do is: select the rows for all unique latitude and if there are 2 (or more) exact same latitudes, I want to keep only the one that has the earliest date but for each year.

So, for my previous example, it would give the following subset, without only the 3rd row:

date latitude
1951-03-22 66.08106
1951-03-22 59.59117
1952-10-20 55.41972
1960-08-12 66.05653
1960-09-10 66.08106

Many thanks for the help.

PS: maybe it is important to precise that class(df$date) is "Date" and class(df$latitude) is "numeric".

CodePudding user response:

Grouped by 'latitude' and year extracted from 'date', use slice_max to extract the row with max date, and then remove the 'year' column

library(dplyr)
library(lubridate)
df1 %>%    
    # grouped by latitude, and year extracted from Date class 
    # year is from lubridate
    group_by(latitude, year = year(date)) %>% 
    # slice 1 row from each group, ordered by the 'date' column
    slice_max(n = 1, order_by = date) %>%
    # remove the grouping
    ungroup %>%
    # remove the year column
    select(-year) %>%
    arrange(date)

-output

# A tibble: 5 × 2
  date       latitude
  <chr>         <dbl>
1 1951-03-22     66.1
2 1951-04-08     59.6
3 1952-10-20     55.4
4 1960-08-12     66.1
5 1960-09-10     66.1
  • Related