I am new in R so maybe this question its so simple, but I tried many approaches and none works for me. I have a dataframe with columns consisting in two kind of values: 0 and non-zero. Zero represents normal conditions while other values represents dry/wet periods. I need an approach that, beginning in the first, sum and count all cells with values different to 0, stop when reach a cell with 0 and then continue until the end of the column. The idea is obtain the magnitude (sum) and longitude (count) of each dry/wet period.
I tried used this:
#Load data
library(readxl)
data<-read_xlsx("R_amplitud.xlsx")
a<-c () #Empty variable to save results
> print(data)
## A tibble: 8,440 x 2
Date `spei1_-1`
<dttm> <dbl>
1 1999-01-09 00:00:00 0
2 1999-01-16 00:00:00 0
3 1999-01-23 00:00:00 0
4 1999-02-01 00:00:00 0
5 1999-02-09 00:00:00 0
6 1999-02-16 00:00:00 -1.26
7 1999-02-23 00:00:00 0
8 1999-03-01 00:00:00 0
9 1999-03-09 00:00:00 0
10 1999-03-16 00:00:00 0
11 1999-03-23 00:00:00 0
12 1999-04-01 00:00:00 0
13 1999-04-09 00:00:00 -1.23
14 1999-04-16 00:00:00 -1.29
15 1999-04-23 00:00:00 -1.99
# ... with 8,425 more rows
#Loop 1(I tried both)
for (i in 1:8440){if (data$`spei1_-1`!=0){a<-sum ()} else {next}}```
#Loop 2
for (value in data$`spei1_-1`){
if(value!=0) {a = sum(data$`spei1_-1`)
}else{
next
}
}
Using ifelse() comand:
a<-ifelse((data$`spei1_-1`!=0),sum (),"")
View(a)
For the 2 first loops I received the following warning:
> for (i in 1:8440){if (data$`spei1_-1`!=0){a<-sum ()} else {next}}
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In if (data$`spei1_-1` != 0) { ... :
la condición tiene longitud > 1 y sólo el primer elemento será usado
And for the ifelse () command I obtain the sum of anything (all cells with zero now are empty and cells with values now are 0).
> a<-ifelse((data$`spei1_-1`!=0),sum (),"")
> print(a)
[1] "" "" "" "" "" "0" "" "" "" "" "" "" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
[25] "0" "0" "" "" "" "" "0" "" "" "" "" "" "" "" "0" "0" "0" "" "" "" "" "" "" ""
If I change the sum argument by the data with are not zero I obtain the sum of all cells:
a<-ifelse((data$`spei1_-1`!=0),sum (data$`spei1_-1`!=0),"")
> print(a)
[1] "" "" "" "" "" "3344" "" "" "" "" "" "" "3344"
[14] "3344" "3344" "3344" "3344" "3344" "3344" "3344" "3344" "3344" "3344" "3344" "3344" "3344"
My expected results could be something like this where the two last column are the count and sums of cells with non-zero value for the Spei1_-1 variable
A tibble: 8,440 x 4
Date `spei1_-1` `COUNT_spei1_-1` `SUM_spei1_-1`
<dttm> <dbl> <dbl> <dbl>
1 1999-01-09 00:00:00 0 NA NA
2 1999-01-16 00:00:00 0 NA NA
3 1999-01-23 00:00:00 0 NA NA
4 1999-02-01 00:00:00 0 NA NA
5 1999-02-09 00:00:00 0 NA NA
6 1999-02-16 00:00:00 -1.26 1 -1.26
7 1999-02-23 00:00:00 0 NA NA
8 1999-03-01 00:00:00 0 NA NA
9 1999-03-09 00:00:00 0 NA NA
10 1999-03-16 00:00:00 0 NA NA
11 1999-03-23 00:00:00 0 NA NA
12 1999-04-01 00:00:00 0 NA NA
13 1999-04-09 00:00:00 -1.23 NA NA
14 1999-04-16 00:00:00 -1.29 NA NA
15 1999-04-23 00:00:00 -1.99 NA NA
16 1999-05-01 00:00:00 -1.36 NA NA
17 1999-05-09 00:00:00 -1.31 NA NA
18 1999-05-16 00:00:00 -1.18 NA NA
19 1999-05-23 00:00:00 -1.44 NA NA
20 1999-06-01 00:00:00 -1.65 NA NA
21 1999-06-09 00:00:00 -1.15 NA NA
22 1999-06-16 00:00:00 -1.18 NA NA
23 1999-06-23 00:00:00 -1.11 NA NA
24 1999-07-01 00:00:00 -1.2 NA NA
25 1999-07-09 00:00:00 -1.44 NA NA
26 1999-07-16 00:00:00 -1.3 14 -18.8
27 1999-07-23 00:00:00 0 NA NA
28 1999-08-01 00:00:00 0 NA NA
29 1999-08-09 00:00:00 0 NA NA
30 1999-08-16 00:00:00 0 NA NA
31 1999-08-23 00:00:00 -1.65 1 -1.65
32 1999-09-01 00:00:00 0 NA NA
33 1999-09-09 00:00:00 0 NA NA
34 1999-09-16 00:00:00 0 NA NA
35 1999-09-23 00:00:00 0 NA NA
36 1999-10-01 00:00:00 0 NA NA
37 1999-10-09 00:00:00 0 NA NA
38 1999-10-16 00:00:00 0 NA NA
39 1999-10-23 00:00:00 1.38 NA NA
40 1999-11-01 00:00:00 1 NA NA
41 1999-11-09 00:00:00 1.26 3 3.64
42 1999-11-16 00:00:00 0 NA NA
43 1999-11-23 00:00:00 0 NA NA
44 1999-12-01 00:00:00 0 NA NA
45 1999-12-09 00:00:00 0 NA NA
46 1999-12-16 00:00:00 0 NA NA
47 1999-12-23 00:00:00 0 NA NA
48 2000-01-01 00:00:00 0 NA NA
49 2000-01-09 00:00:00 0 NA NA
50 2000-01-16 00:00:00 0 NA NA
# ... with 8,390 more rows
Any suggestion about how do it?
CodePudding user response:
The following code will produce the two columns that you want for each vector. It uses cumsum(x)==0
to identify the runs of non-zero x
for the ave
function to group over. Then tests within each group whether you have the last observation (and a non-zero group), and adds the required information.
The cbind
is just to organise it here so you can see what is happening.
x=c(0,0,0,-1,-2,0,0,-0.5,0,0,-1)
cbind(
x,
SUM=ave( x=x, by=cumsum(x==0), FUN = function(x) ifelse(seq_along(x)==length(x) & x!=0,cumsum(x),NA)),
COUNT=ave( x=x, by=cumsum(x==0), FUN = function(x) ifelse(seq_along(x)==length(x) & x!=0,length(x)-1,NA))
)
x SUM COUNT
[1,] 0.0 NA NA
[2,] 0.0 NA NA
[3,] 0.0 NA NA
[4,] -1.0 NA NA
[5,] -2.0 -3.0 2
[6,] 0.0 NA NA
[7,] 0.0 NA NA
[8,] -0.5 -0.5 1
[9,] 0.0 NA NA
[10,] 0.0 NA NA
[11,] -1.0 -1.0 1