This is what my data looks like:
dput(head(dataf,40))
structure(list(Time = c(20000224, 20000225, 20000226, 20000227,
20000228, 20000229, 20000301, 20000302, 20000303, 20000304, 20000305,
20000306, 20000307, 20000308, 20000309, 20000310, 20000311, 20000312,
20000313, 20000314, 20000315, 20000316, 20000317, 20000318, 20000319,
20000320, 20000321, 20000322, 20000323, 20000324, 20000325, 20000326,
20000327, 20000328, 20000329, 20000330, 20000331, 20000401, 20000402,
20000403), NDVI = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, 0.562256741, 0.560792828, 0.559272321, 0.554727235,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.506596306, 0.504768168,
0.536223787, 0.5476394, 0.520311992, 0.52862363, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame"))
I want to know the mean number of consecutive NaN values between two numerical values in a column (NDVI) of a data frame.
See the example below:
Time | NDVI |
---|---|
20000224 | NaN |
20000225 | NaN |
20000226 | NaN |
20000227 | NaN |
20000228 | NaN |
20000229 | NaN |
20000301 | NaN |
20000302 | NaN |
20000303 | NaN |
20000304 | NaN |
20000305 | NaN |
20000306 | NaN |
20000307 | NaN |
20000308 | 0.562256741 |
20000309 | 0.560792828 |
20000310 | 0.559272321 |
20000311 | 0.554727235 |
20000312 | NaN |
20000313 | NaN |
20000314 | NaN |
20000315 | NaN |
20000316 | NaN |
20000317 | NaN |
20000318 | NaN |
20000319 | 0.506596306 |
20000320 | 0.504768168 |
20000321 | 0.536223787 |
20000322 | 0.5476394 |
20000323 | 0.520311992 |
20000324 | 0.52862363 |
For the rows shown above, the mean number of consecutive NaNs would be:
including the initial NaNs: 10 (runs of 13 and 7)
not including the initial NaNs: 7
Note that I chose this data frame on purpose because it starts with NaN. I am looking for two solutions: one that includes the NaNs at the beginning, so that they count towards the mean, and one that only starts counting NaNs after the first numeric value.
Any help will be much appreciated.
CodePudding user response:
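library(dplyr)

dataf %>%
  # start a new group at every non-missing value, so each NaN run
  # shares a group with the value right before it
  group_by(cumsum(!is.na(NDVI))) %>%
  # running count of NaNs within each group
  mutate(cnt = cumsum(is.na(NDVI))) %>%
  # drop the rows that hold numeric values
  filter(cnt != 0) %>%
  # the largest count in a group is the length of its NaN run
  summarise(consecutive_NaNs = max(cnt)) %>%
  select(consecutive_NaNs)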
Returns:
consecutive_NaNs
<int>
1 13
2 7
3 10
If you only want the mean:
dataf %>%
  group_by(cumsum(!is.na(NDVI))) %>%
  mutate(cnt = cumsum(is.na(NDVI))) %>%
  filter(cnt != 0) %>%
  summarise(consecutive_NaNs = max(cnt)) %>%
  select(consecutive_NaNs) %>%
  summarise(mean_run_length = mean(consecutive_NaNs))
Returns:
mean_run_length
<dbl>
1 10
P.S.
If you use the solution that gives all run lengths, you can decide whether or not to include the initial run.
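For completeness, here is a minimal base R sketch of that choice using rle(), which is a different technique from the dplyr pipeline above. The vector ndvi is rebuilt from the 40 rows in the question's dput() output, and the names ndvi, runs, nan_runs and nan_runs_no_lead are only illustrative.

```r
# First 40 NDVI values, rebuilt from the dput() output in the question
ndvi <- c(rep(NaN, 13),
          0.562256741, 0.560792828, 0.559272321, 0.554727235,
          rep(NaN, 7),
          0.506596306, 0.504768168, 0.536223787, 0.5476394,
          0.520311992, 0.52862363,
          rep(NaN, 10))

# rle() collapses the logical vector into runs of consecutive equal values;
# is.na() is TRUE for NaN as well as NA
runs <- rle(is.na(ndvi))

# Lengths of the runs that consist of missing values
nan_runs <- runs$lengths[runs$values]

# Same lengths, but skipping a run that starts at the very first row
nan_runs_no_lead <- if (runs$values[1]) nan_runs[-1] else nan_runs

mean(nan_runs)          # mean including the initial NaN run
mean(nan_runs_no_lead)  # mean without the initial NaN run
```

On these 40 rows the run lengths are 13, 7 and 10, so the two means are 10 and 8.5. The last run (10) is not followed by a numeric value within these rows, so if you only want runs strictly between two values you would drop a trailing run in the same way as the leading one.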