Subset dataframe based on PipeYear with the closest value below PropertyYearBuilt and farthest value above the PropertyYearBuilt using the following R code:
df <- read.table(text="
PipeID PricePipe PipeYear PropertyYearBuilt Distance_to_property
a 500 2010 2013 1.5
b 600 2007 2008 2.5
c 700 2009 2008 3.0
d 800 1998 2000 4.2
e 900 2003 2000 4.5
f 200 2014 2013 5.0
g 100 2011 2013 5.5
h 850 2018 2008 7.0", header = TRUE)
Thanks!
CodePudding user response:
Similar answer to one I posted here (if you have latest dplyr
updates), but this time the furthest above is just the max
grouped by PropertyID
:
library(tidyverse)
df <- read.table(text="
PipeID PricePipe PipeYear PropertyYearBuilt Distance_to_property
a 500 2010 2013 1.5
b 600 2007 2008 2.5
c 700 2009 2008 3.0
d 800 1998 2000 4.2
e 900 2003 2000 4.5
f 200 2014 2013 5.0
g 100 2011 2013 5.5
h 850 2018 2008 7.0", header = TRUE) |>
mutate(PropertyID = as.numeric(as.factor(PropertyYearBuilt)))
bind_rows(
df |>
select(PropertyYearBuilt, PropertyID) |>
unique() |>
left_join(
df |> select(-PropertyYearBuilt),
join_by(PropertyID, closest(PropertyYearBuilt >= PipeYear))
),
df |>
group_by(PropertyYearBuilt) |>
filter(PipeYear == max(PipeYear))
) |>
arrange(PropertyID, PipeYear)
#> PropertyYearBuilt PropertyID PipeID PricePipe PipeYear Distance_to_property
#> 1 2000 1 d 800 1998 4.2
#> 2 2000 1 e 900 2003 4.5
#> 3 2008 2 b 600 2007 2.5
#> 4 2008 2 h 850 2018 7.0
#> 5 2013 3 g 100 2011 5.5
#> 6 2013 3 f 200 2014 5.0