I'd like to conditionally remove rows from a data frame using dates and means. In my example:
# Package
library(tidyverse)
# Open dataset
RES_all_files_better <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/RES_all_files_better_df.csv")
str(RES_all_files_better)
# 'data.frame': 507 obs. of 11 variables:
# $ STAND : chr "ARROIOXAVIER024B" "ARROIOXAVIER024B" "ARROIOXAVIER024B" "ARROIOXAVIER024B" ...
# $ ESPACAMENT: int 6 6 6 6 6 6 6 6 6 6 ...
# $ ESPECIE : chr "benthamii" "benthamii" "benthamii" "benthamii" ...
# $ IDADE : int 6 6 6 6 6 6 6 7 7 7 ...
# $ DATE_S2 : chr "2019-01-28" "2019-02-22" "2019-03-24" "2019-05-18" ...
# $ NDVI_avg : num 0.877 0.895 0.879 0.912 0.908 ...
# $ NDVI_sd : num 0.0916 0.0808 0.0758 0.1175 0.1132 ...
# $ NDVI_min : num -0.235 -0.1783 0.0844 -0.5666 -0.6093 ...
# $ NDVI_max : num 0.985 0.998 0.993 0.999 0.999 ...
# $ MONTH : int 1 2 3 5 7 8 9 11 12 12 ...
# $ NDVI_ref : num 0.823 0.823 0.823 0.823 0.823 ...
In my case, I am looking for an operation that removes rows from the data set if (NDVI_max + NDVI_min)/2 is lower than NDVI_avg, grouped by (ESPACAMENT, ESPECIE, IDADE), on the date (DATE_S2) before the current date. An example for RES_all_files_better$STAND=="QUEBRACANGA012F":
# Original dataset:
STAND DATE_S2 NDVI_avg NDVI_min NDVI_max
...
208 QUEBRACANGA012F 2021-08-30 0.8748818 0.8238573 0.9072955
209 QUEBRACANGA012F 2021-11-08 0.5707210 0.2847520 0.8908801
210 QUEBRACANGA012F 2021-11-13 0.5515253 0.2275358 0.8940712
211 QUEBRACANGA012F 2021-12-28 0.5956103 0.2469136 0.9122636
212 QUEBRACANGA012F 2022-01-12 0.5952482 0.2084076 0.9031508
213 QUEBRACANGA012F 2022-01-22 0.5773518 0.2088580 0.8783236
214 QUEBRACANGA012F 2022-02-16 0.4246735 0.1674446 0.6224726
215 QUEBRACANGA012F 2022-02-26 0.4064463 0.1378491 0.6111995
#Final dataset:
STAND DATE_S2 NDVI_avg NDVI_min NDVI_max
...
208 QUEBRACANGA012F 2021-08-30 0.8748818 0.8238573 0.9072955
Rows 209 to 215 were removed because (NDVI_max + NDVI_min)/2 = 0.5878161 in row 209, which is lower than NDVI_avg = 0.8748818 on the previous date, 2021-08-30.
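To make the arithmetic explicit, here is a small check of the numbers quoted above. It assumes the intended quantity is the midpoint (NDVI_max + NDVI_min)/2, which is the reading that reproduces 0.5878161 from row 209; the variable names are made up for illustration.

```r
# Values copied from rows 208 and 209 of the example above
ndvi_max_209 <- 0.8908801
ndvi_min_209 <- 0.2847520
ndvi_avg_208 <- 0.8748818  # NDVI_avg on the previous date, 2021-08-30

# Midpoint of max and min for row 209 (assumed formula)
midpoint_209 <- (ndvi_max_209 + ndvi_min_209) / 2

# TRUE means row 209 meets the removal condition
midpoint_209 < ndvi_avg_208
```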
Could anyone please help with this?
CodePudding user response:
We may need to filter on the min of the computed value ('New'):
library(dplyr)
RES_all_files_better %>%
# convert to `Date` class and create a sequence column for checking
mutate(rn = row_number(), DATE_S2 = as.Date(DATE_S2)) %>%
# grouped by columns
group_by(ESPACAMENT,ESPECIE,IDADE) %>%
# create computed column
mutate(New = (NDVI_max + NDVI_min) / 2) %>%
# filter the rows where the NDVI_avg is greater than the minimum value
filter(NDVI_avg > min(New)) %>%
ungroup() # %>%
# select(-rn, -New)
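If the goal is instead to compare each row with the previous date and drop everything from the first failing row onward, something along these lines might be closer to the expected output. This is only a sketch under several assumptions: it groups by STAND (as in the question's example) rather than by (ESPACAMENT, ESPECIE, IDADE), it uses the midpoint (NDVI_max + NDVI_min)/2, and it treats the removal as cumulative, since the example drops rows 209 to 215 together. The helper columns mid, prev_avg and drop_from_here are made-up names, and the data frame is retyped by hand from the rows shown in the question.

```r
library(dplyr)

# Rows 208-215 of the question's example, retyped by hand
stand <- data.frame(
  STAND    = "QUEBRACANGA012F",
  DATE_S2  = c("2021-08-30", "2021-11-08", "2021-11-13", "2021-12-28",
               "2022-01-12", "2022-01-22", "2022-02-16", "2022-02-26"),
  NDVI_avg = c(0.8748818, 0.5707210, 0.5515253, 0.5956103,
               0.5952482, 0.5773518, 0.4246735, 0.4064463),
  NDVI_min = c(0.8238573, 0.2847520, 0.2275358, 0.2469136,
               0.2084076, 0.2088580, 0.1674446, 0.1378491),
  NDVI_max = c(0.9072955, 0.8908801, 0.8940712, 0.9122636,
               0.9031508, 0.8783236, 0.6224726, 0.6111995)
)

kept <- stand %>%
  mutate(DATE_S2 = as.Date(DATE_S2)) %>%
  group_by(STAND) %>%                      # assumption: one series per stand
  arrange(DATE_S2, .by_group = TRUE) %>%
  mutate(
    mid = (NDVI_max + NDVI_min) / 2,       # assumed formula
    prev_avg = lag(NDVI_avg),              # NDVI_avg on the previous date
    # TRUE from the first row whose midpoint falls below the previous average
    drop_from_here = cumany(!is.na(prev_avg) & mid < prev_avg)
  ) %>%
  filter(!drop_from_here) %>%
  ungroup() %>%
  select(-mid, -prev_avg, -drop_from_here)

kept
```

The cumulative flag matters here: a plain row-by-row comparison with `lag()` would keep row 211, whose midpoint is above the NDVI_avg of row 210, whereas the expected output removes it.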