Conditionally remove row from data frame using dates and means

Time:03-03

I'd like to conditionally remove rows from a data frame using dates and means. For example:

# Package
library(tidyverse)

# Open dataset
RES_all_files_better <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/RES_all_files_better_df.csv")
str(RES_all_files_better)
# 'data.frame': 507 obs. of  11 variables:
#  $ STAND     : chr  "ARROIOXAVIER024B" "ARROIOXAVIER024B" "ARROIOXAVIER024B" "ARROIOXAVIER024B" ...
#  $ ESPACAMENT: int  6 6 6 6 6 6 6 6 6 6 ...
#  $ ESPECIE   : chr  "benthamii" "benthamii" "benthamii" "benthamii" ...
#  $ IDADE     : int  6 6 6 6 6 6 6 7 7 7 ...
#  $ DATE_S2   : chr  "2019-01-28" "2019-02-22" "2019-03-24" "2019-05-18" ...
#  $ NDVI_avg  : num  0.877 0.895 0.879 0.912 0.908 ...
#  $ NDVI_sd   : num  0.0916 0.0808 0.0758 0.1175 0.1132 ...
#  $ NDVI_min  : num  -0.235 -0.1783 0.0844 -0.5666 -0.6093 ...
#  $ NDVI_max  : num  0.985 0.998 0.993 0.999 0.999 ...
#  $ MONTH     : int  1 2 3 5 7 8 9 11 12 12 ...
#  $ NDVI_ref  : num  0.823 0.823 0.823 0.823 0.823 ...

In my case, I'm looking for a way to remove rows from the data set when (NDVI_max + NDVI_min)/2 is lower than the NDVI_avg of the previous date (DATE_S2), with the data grouped by (ESPACAMENT, ESPECIE, IDADE). An example for RES_all_files_better$STAND == "QUEBRACANGA012F":

# Original dataset:
              STAND    DATE_S2  NDVI_avg  NDVI_min  NDVI_max
...
208 QUEBRACANGA012F 2021-08-30 0.8748818 0.8238573 0.9072955
209 QUEBRACANGA012F 2021-11-08 0.5707210 0.2847520 0.8908801
210 QUEBRACANGA012F 2021-11-13 0.5515253 0.2275358 0.8940712
211 QUEBRACANGA012F 2021-12-28 0.5956103 0.2469136 0.9122636
212 QUEBRACANGA012F 2022-01-12 0.5952482 0.2084076 0.9031508
213 QUEBRACANGA012F 2022-01-22 0.5773518 0.2088580 0.8783236
214 QUEBRACANGA012F 2022-02-16 0.4246735 0.1674446 0.6224726
215 QUEBRACANGA012F 2022-02-26 0.4064463 0.1378491 0.6111995

# Final dataset:
              STAND    DATE_S2  NDVI_avg  NDVI_min  NDVI_max
...
208 QUEBRACANGA012F 2021-08-30 0.8748818 0.8238573 0.9072955

Rows 209 to 215 were removed because, for example, for row 209 (NDVI_max + NDVI_min)/2 = 0.5878161, which is lower than NDVI_avg = 0.8748818 on the last date, 2021-08-30.
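The rule can be checked by hand on the example rows. This is a minimal R sketch that uses only the eight rows shown in the table above (not the full CSV) and reads the formula as (NDVI_max + NDVI_min)/2, which reproduces the quoted 0.5878161. It also compares two possible readings of "the date before": the immediately preceding row versus the last row that is kept (2021-08-30). Only the second reading removes all of rows 209 to 215; with the first, 2021-12-28 would survive, because its midpoint (about 0.5796) is above the 2021-11-13 average (0.5515253).

```r
# Example rows for STAND == "QUEBRACANGA012F", copied from the table above
ex <- data.frame(
  DATE_S2  = as.Date(c("2021-08-30", "2021-11-08", "2021-11-13", "2021-12-28",
                       "2022-01-12", "2022-01-22", "2022-02-16", "2022-02-26")),
  NDVI_avg = c(0.8748818, 0.5707210, 0.5515253, 0.5956103,
               0.5952482, 0.5773518, 0.4246735, 0.4064463),
  NDVI_min = c(0.8238573, 0.2847520, 0.2275358, 0.2469136,
               0.2084076, 0.2088580, 0.1674446, 0.1378491),
  NDVI_max = c(0.9072955, 0.8908801, 0.8940712, 0.9122636,
               0.9031508, 0.8783236, 0.6224726, 0.6111995)
)

# Midpoint of the pixel range: (NDVI_max + NDVI_min) / 2
ex$mid <- (ex$NDVI_max + ex$NDVI_min) / 2
ex$mid[2]  # about 0.5878161, the value quoted for 2021-11-08

# Reading 1: compare each row with the immediately preceding row's NDVI_avg
below_prev_row <- ex$mid[-1] < ex$NDVI_avg[-nrow(ex)]

# Reading 2: compare each row with the NDVI_avg of 2021-08-30 (the last kept row)
below_last_kept <- ex$mid[-1] < ex$NDVI_avg[1]

below_prev_row   # FALSE only for 2021-12-28
below_last_kept  # TRUE for every row after 2021-08-30
```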

Any help would be appreciated.

Answer:

We may need to filter on the minimum of the computed value ('New'):

library(dplyr)
RES_all_files_better %>% 
  # convert to `Date` class and create a sequence column for checking
  mutate(rn = row_number(), DATE_S2 = as.Date(DATE_S2)) %>% 
  # grouped by columns
  group_by(ESPACAMENT,ESPECIE,IDADE) %>%
  # create computed column
  mutate(New = (NDVI_max + NDVI_min) / 2) %>% 
  # keep the rows where NDVI_avg is greater than the group minimum of New
  filter(NDVI_avg > min(New)) %>% 
  ungroup() # %>%
  # select(-rn, -New)  # uncomment to drop the helper columns
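The filter above compares each row with the group-wide minimum of New rather than with the previous date. If the date-by-date rule from the question is what is needed, the following is a minimal sketch, assuming that each row should be compared with the NDVI_avg of the most recently kept row in its group (the reading that matches the expected output for QUEBRACANGA012F). The helper names keep_sequential() and filter_ndvi() are hypothetical, and the code assumes there are no missing NDVI values.

```r
library(dplyr)

# Hypothetical helper: walk through one group's rows in date order and keep a
# row only if its midpoint is not below the NDVI_avg of the last kept row.
# Assumes no NA values in mid or avg.
keep_sequential <- function(mid, avg) {
  keep <- logical(length(mid))
  if (length(mid) == 0) return(keep)
  keep[1] <- TRUE               # the first date in a group is always kept
  ref <- avg[1]                 # NDVI_avg of the last kept row
  for (i in seq_along(mid)[-1]) {
    if (mid[i] >= ref) {
      keep[i] <- TRUE
      ref <- avg[i]             # this row becomes the new reference
    }
  }
  keep
}

# Hypothetical wrapper: grouping columns are passed unquoted through `...`
filter_ndvi <- function(df, ...) {
  df %>%
    mutate(DATE_S2 = as.Date(DATE_S2),
           mid = (NDVI_max + NDVI_min) / 2) %>%
    group_by(...) %>%
    arrange(DATE_S2, .by_group = TRUE) %>%  # sort by date within each group
    filter(keep_sequential(mid, NDVI_avg)) %>%
    ungroup() %>%
    select(-mid)
}

# Usage with the grouping from the question:
# filter_ndvi(RES_all_files_better, ESPACAMENT, ESPECIE, IDADE)
```

The loop is deliberate: whether a row is kept depends on which earlier rows were kept, so a single vectorised comparison such as lag(NDVI_avg) cannot express this rule. Adding STAND to the grouping columns may be worth considering if each stand should be tracked as its own time series, since IDADE changes over time; that choice is an assumption, not part of the original question.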