Home > Software engineering >  Subset a data frame based on year column with the closest value below and farthest value above anoth
Subset a data frame based on year column with the closest value below and farthest value above anoth

Time:12-16

Subset dataframe based on PipeYear with the closest value below PropertyYearBuilt and farthest value above the PropertyYearBuilt using the following R code:

df <- read.table(text="
PipeID  PricePipe   PipeYear PropertyYearBuilt  Distance_to_property
    a       500         2010     2013               1.5
    b       600         2007     2008               2.5
    c       700         2009     2008               3.0
    d       800         1998     2000               4.2
    e       900         2003     2000               4.5
    f       200         2014     2013               5.0
    g       100         2011     2013               5.5
    h       850         2018     2008               7.0", header = TRUE)

Thanks!

CodePudding user response:

Similar answer to one I posted here (if you have latest dplyr updates), but this time the furthest above is just the max grouped by PropertyID:

library(tidyverse)

df <- read.table(text="
PipeID  PricePipe   PipeYear PropertyYearBuilt  Distance_to_property
    a       500         2010     2013               1.5
    b       600         2007     2008               2.5
    c       700         2009     2008               3.0
    d       800         1998     2000               4.2
    e       900         2003     2000               4.5
    f       200         2014     2013               5.0
    g       100         2011     2013               5.5
    h       850         2018     2008               7.0", header = TRUE) |> 
  mutate(PropertyID = as.numeric(as.factor(PropertyYearBuilt)))

bind_rows(
  df |>
    select(PropertyYearBuilt, PropertyID) |>
    unique() |>
    left_join(
      df |> select(-PropertyYearBuilt),
      join_by(PropertyID, closest(PropertyYearBuilt >= PipeYear))
    ),
df |> 
  group_by(PropertyYearBuilt) |> 
  filter(PipeYear == max(PipeYear))
) |> 
  arrange(PropertyID, PipeYear)
#>   PropertyYearBuilt PropertyID PipeID PricePipe PipeYear Distance_to_property
#> 1              2000          1      d       800     1998                  4.2
#> 2              2000          1      e       900     2003                  4.5
#> 3              2008          2      b       600     2007                  2.5
#> 4              2008          2      h       850     2018                  7.0
#> 5              2013          3      g       100     2011                  5.5
#> 6              2013          3      f       200     2014                  5.0
  • Related