Calculate the average number of NaN values between numeric values in a data frame column in R

This is what my data looks like:

 dput(head(dataf,40))
structure(list(Time = c(20000224, 20000225, 20000226, 20000227, 
20000228, 20000229, 20000301, 20000302, 20000303, 20000304, 20000305, 
20000306, 20000307, 20000308, 20000309, 20000310, 20000311, 20000312, 
20000313, 20000314, 20000315, 20000316, 20000317, 20000318, 20000319, 
20000320, 20000321, 20000322, 20000323, 20000324, 20000325, 20000326, 
20000327, 20000328, 20000329, 20000330, 20000331, 20000401, 20000402, 
20000403), NDVI = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, 0.562256741, 0.560792828, 0.559272321, 0.554727235, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.506596306, 0.504768168, 
0.536223787, 0.5476394, 0.520311992, 0.52862363, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN, NaN, NaN)), row.names = c(NA, -40L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to know the mean number of consecutive NaN values between numeric values in a column (NDVI) of a data frame.

See the example below:

Date	NDVI
20000224	NaN
20000225	NaN
20000226	NaN
20000227	NaN
20000228	NaN
20000229	NaN
20000301	NaN
20000302	NaN
20000303	NaN
20000304	NaN
20000305	NaN
20000306	NaN
20000307	NaN
20000308	0.562256741
20000309	0.560792828
20000310	0.559272321
20000311	0.554727235
20000312	NaN
20000313	NaN
20000314	NaN
20000315	NaN
20000316	NaN
20000317	NaN
20000318	NaN
20000319	0.506596306
20000320	0.504768168
20000321	0.536223787
20000322	0.5476394
20000323	0.520311992
20000324	0.52862363

For this example, the mean would be:

including the initial NaNs: 10 (the mean of the two runs of 13 and 7 NaNs)

not including the initial NaNs: 7

Note: I chose this data frame on purpose because it starts with NaN. I am looking for two solutions: one that includes the NaNs at the beginning, so that they count towards the mean number of NaNs between values, and another that only starts counting NaNs after the first numeric value.

Any help will be much appreciated.

CodePudding user response：

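library(dplyr)

dataf %>%
  # A new group starts at each numeric value, so every NaN run shares one group
  group_by(cumsum(!is.na(NDVI))) %>%
  # Running count of NaNs within the group (is.na() is TRUE for NaN as well)
  mutate(cnt = cumsum(is.na(NDVI))) %>%
  # Drop the numeric rows, where no NaN has been counted yet
  filter(cnt != 0) %>%
  summarise(consecutive_NaNs = max(cnt)) %>%
  select(consecutive_NaNs)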

Returns:

  consecutive_NaNs
             <int>
1               13
2                7
3               10
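
As a cross-check, the same run lengths can be obtained in base R with rle(). This is only a minimal sketch, not part of the original answer; the vector ndvi is a stand-in for dataf$NDVI with rounded values but the same pattern of NaN runs as the 40 rows in the question.

```r
# Stand-in for dataf$NDVI (assumption: values rounded, NaN pattern unchanged)
ndvi <- c(rep(NaN, 13), 0.562, 0.561, 0.559, 0.555,
          rep(NaN, 7), 0.507, 0.505, 0.536, 0.548, 0.520, 0.529,
          rep(NaN, 10))

# rle() encodes consecutive identical values as (length, value) pairs
runs <- rle(is.na(ndvi))

# Keep only the lengths of the runs where is.na() was TRUE
nan_runs <- runs$lengths[runs$values]

nan_runs        # 13 7 10
mean(nan_runs)  # 10
```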

If you only want the mean:

dataf %>%
  group_by(cumsum(!is.na(NDVI))) %>%
  mutate(cnt = cumsum(is.na(NDVI))) %>%
  filter(cnt != 0) %>%
  summarise(consecutive_NaNs = max(cnt)) %>%
  select(consecutive_NaNs) %>%
  summarise(mean_run_length = mean(consecutive_NaNs))

Returns:

  mean_run_length
            <dbl>
1              10

P.S.

If you use the first solution, which returns all run lengths, you can decide whether or not to include the initial run.
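
One possible way to drop the initial run, sketched in base R under the assumption that "initial run" means a NaN run at the very start of the column. The vector ndvi is a stand-in for dataf$NDVI (rounded values, same NaN pattern), and the names starts_with_nan and inner_runs are made up for this illustration.

```r
# Stand-in for dataf$NDVI (assumption: values rounded, NaN pattern unchanged)
ndvi <- c(rep(NaN, 13), 0.562, 0.561, 0.559, 0.555,
          rep(NaN, 7), 0.507, 0.505, 0.536, 0.548, 0.520, 0.529,
          rep(NaN, 10))

runs <- rle(is.na(ndvi))
nan_runs <- runs$lengths[runs$values]

# If the column starts with NaN, the first NaN run is the leading one
starts_with_nan <- runs$values[1]
inner_runs <- if (starts_with_nan) nan_runs[-1] else nan_runs

mean(nan_runs)    # all runs (13, 7, 10): 10
mean(inner_runs)  # without the leading run (7, 10): 8.5
```

On the full 40 rows the second mean is 8.5 rather than 7, because the trailing run of 10 NaNs is still counted; the value of 7 in the question corresponds to the 30-row excerpt, which has no trailing run. Whether a trailing run should also be dropped is a judgement call that the question does not settle.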