I have a data frame with numerical values in one column. I want to calculate the cumulative sum of those rows until it is >= 1. When this point is reached, all of those rows should get the same counter, and every row should hold the cumsum for its counter; then the cumsum starts over with the next rows.
It should look somehow like this:
value counter cumsum
  0.3       1    0.9
  0.3       1    0.9
  0.3       1    0.9
  0.3       2    0.4
  0.1       2    0.4
    2       3      2
My problem is how to tell R to stop the cumsum once it is >= 1. Any ideas? Thank you in advance.
CodePudding user response:
I don't know if I understood your problem correctly, but maybe this one here helps:
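# example data: 20 random values between 0.1 and 0.5
value = round(runif(20, min = 0.1, max = 0.5), 1)
csumVec = numeric(length(value))
counterVec = numeric(length(value))
startIndex = 1   # first row of the current group
csum = 0         # running sum of the current group
counter = 1
for(i in seq_along(value)) {
  csum = csum + value[i]
  if(csum > 1) {
    # limit exceeded: rows startIndex..(i - 1) form a group; row i is
    # filled here too, but is overwritten when its own group is closed
    counterVec[startIndex:i] = counter
    csumVec[startIndex:i] = csum - value[i]
    startIndex = i
    counter = counter + 1
    csum = value[i]
  }
  if(i == length(value)) {
    # close the last (possibly incomplete) group
    counterVec[startIndex:i] = counter
    csumVec[startIndex:i] = csum
  }
}
cbind(value, counterVec, csumVec)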
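If it is useful, here is a variant of that loop wrapped in a function and run on the values from the question. This is only a sketch: `group_cumsum` and its `limit` argument are names made up for this example, and it keeps the `> limit` test (swap in `>=` if a sum of exactly 1 should already start a new group). Unlike the loop above, it tests before adding the value, so a single value larger than the limit simply forms its own group.

```r
# Sketch only: group_cumsum() and `limit` are hypothetical names.
group_cumsum <- function(value, limit = 1) {
  n <- length(value)
  counter <- integer(n)
  total <- numeric(n)
  start <- 1    # first row of the current group
  csum <- 0     # running sum of the current group
  grp <- 1L
  for (i in seq_len(n)) {
    # adding value[i] would push the current group past the limit:
    # close the group at row i - 1 and start a new one at row i
    if (i > start && csum + value[i] > limit) {
      total[start:(i - 1)] <- csum
      start <- i
      grp <- grp + 1L
      csum <- 0
    }
    csum <- csum + value[i]
    counter[i] <- grp
  }
  # close the last group
  if (n > 0) total[start:n] <- csum
  data.frame(value, counter, cumsum = total)
}

res <- group_cumsum(c(0.3, 0.3, 0.3, 0.3, 0.1, 2))
res
```

For the question's data this should give counters 1, 1, 1, 2, 2, 3 with group sums 0.9, 0.4 and 2.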
CodePudding user response:
It seems like you can calculate the cumulative sum, divide by 1 (the threshold), and take the floor() (round down). With the values from the question:
value = c(0.3, 0.3, 0.3, 0.3, 0.1, 2)
floor(cumsum(value) / 1)
## [1] 0 0 0 1 1 3
This is correct, except that it is 0-based and the counter does not always increment by 1 (here it jumps from 1 to 3). Fix both by matching the result above against its unique values:
counter0 = floor(cumsum(value) / 1)
counter = match(counter0, unique(counter0))
counter
## [1] 1 1 1 2 2 3
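In case the match() / unique() step looks like magic, here is a small illustration on a made-up vector of labels with gaps (the numbers are invented for the example). unique() lists the distinct labels in order of first appearance, and match() returns the position of each label in that list, which gives consecutive 1-based group ids.

```r
# Toy labels with gaps (made up for illustration)
counter0 <- c(0, 0, 2, 2, 5)

# distinct labels, in order of first appearance
unique(counter0)
## [1] 0 2 5

# position of each label in the list of distinct labels
relabelled <- match(counter0, unique(counter0))
relabelled
## [1] 1 1 2 2 3
```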
Having got the 'tricky' part, I'd use dplyr for the rest:
library(dplyr)
tibble(value, counter) |>
    group_by(counter) |>
    mutate(cumsum = max(cumsum(value)))
## # A tibble: 6 × 3
## # Groups: counter [3]
## value counter cumsum
## <dbl> <int> <dbl>
## 1 0.3 1 0.9
## 2 0.3 1 0.9
## 3 0.3 1 0.9
## 4 0.3 2 0.4
## 5 0.1 2 0.4
## 6 2 3 2
or perhaps capturing the tricky part in a (more general) function
cumgroup <- function(x, upper = 1) {
    counter0 = floor(cumsum(x) / upper)
    match(counter0, unique(counter0))
}
and incorporating into the dplyr solution
tibble(value) |>
    mutate(counter = cumgroup(value)) |>
    group_by(counter) |>
    mutate(cumsum = max(cumsum(value)))
or depending on what precisely you want
tibble(value) |>
    mutate(
        cumsum = cumsum(value) %% 1,
        counter = cumgroup(value)
    ) |>
    group_by(counter) |>
    mutate(cumsum = max(cumsum)) |>
    select(value, counter, cumsum)
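If you would rather not depend on dplyr, the grouping step can probably also be done in base R. The sketch below repeats the cumgroup() definition so that it runs on its own, uses the values from the question, and relies on ave() to compute the per-group sum; the `result` data frame is just a name chosen for the example.

```r
# Sketch: same grouping as cumgroup() above, but without dplyr.
value <- c(0.3, 0.3, 0.3, 0.3, 0.1, 2)   # values from the question

cumgroup <- function(x, upper = 1) {
  counter0 <- floor(cumsum(x) / upper)
  match(counter0, unique(counter0))
}

counter <- cumgroup(value)

# ave() applies FUN within each group and returns a vector
# of the same length as value
result <- data.frame(value, counter, cumsum = ave(value, counter, FUN = sum))
result

# a larger threshold merges more rows into one group
cumgroup(value, upper = 2)
## [1] 1 1 1 1 1 2
```

One thing to keep in mind: flooring the overall cumulative sum does not restart the sum at each group, so for other inputs (for example c(0.6, 0.6, 0.6), which gives groups 1, 2, 2) the result can differ from a running sum that is reset every time the limit is hit.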