value1 | value2 |
---|---|
1 | 1 |
0 | 1 |
0 | 1 |
0 | 1 |
2 | 2 |
0 | 2 |
0 | 2 |
3 | 3 |
4 | 4 |
0 | 4 |
I have a column "value1" in a data frame and I want to repeat each value until the next number > 0. What I want as a result is the column "value2". It is a large amount of data, so loop code would be great.
Thanks to the community!!
CodePudding user response:
We could use cumsum with the condition:
library(dplyr)
df %>%
  mutate(value2 = cumsum(value1 > 0))
value1 value2
1 1 1
2 0 1
3 0 1
4 0 1
5 2 2
6 0 2
7 0 2
8 3 3
9 4 4
10 0 4
data:
df <- structure(list(value1 = c(1L, 0L, 0L, 0L, 2L, 0L, 0L, 3L, 4L,
0L)), class = "data.frame", row.names = c(NA, -10L))
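Note that cumsum(value1 > 0) numbers the runs rather than copying values, so it matches the desired value2 here only because the non-zero values in the example happen to be 1, 2, 3, 4. Below is a minimal base R sketch of one way to turn that run counter back into the actual values. The sample vector and the names grp and nonzero are made up for illustration, and the leading NA is just one possible way to treat zeros that come before the first non-zero value.

```r
# Hypothetical sample data: non-consecutive values and a leading zero
value1 <- c(0, 5, 0, 0, 2, 0, 7, 0)

# Run counter: increases by 1 at every non-zero entry
grp <- cumsum(value1 != 0)        # 0 1 1 1 2 2 3 3

# The non-zero values, in order of appearance
nonzero <- value1[value1 != 0]    # 5 2 7

# Look up each row's value by its run number; the prepended NA
# covers rows before the first non-zero value (grp == 0)
value2 <- c(NA, nonzero)[grp + 1]
value2                            # NA 5 5 5 2 2 7 7
```

This stays fully vectorised, so it should scale to long columns without an explicit loop.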
CodePudding user response:
For these cases I typically replace the 0s with NA values, and use tidyr::fill() to copy the last non-missing (i.e. non-zero) value forward.
Here is an example:
df <- data.frame(
  value1 = c(1, 0, 0, 0, 2, 0, 0, 3, 4, 0)
)
library(dplyr)
df %>%
  mutate(
    value2 = ifelse(value1 == 0, NA_real_, value1)
  ) %>%
  tidyr::fill(value2, .direction = "down")
and the result:
value1 value2
1 1 1
2 0 1
3 0 1
4 0 1
5 2 2
6 0 2
7 0 2
8 3 3
9 4 4
10 0 4
This works even when consecutive non-zero values differ by something other than 1, which is not the case with e.g. cumsum().
CodePudding user response:
There may be more elegant ways to do this but assuming that the column "value2" is already in the dataframe you can do something like the following. This answer relies solely on base R and also it does not matter if v1 is increasing or decreasing, just that it is non-zero. I'll create a data frame as an example.
set.seed(65L)
df <- data.frame(v1 = sample(0:4, 1000, replace = TRUE), v2 = 0)
head(df, 12)
v1 v2
1 2 0
2 1 0
3 3 0
4 0 0
5 0 0
6 4 0
7 2 0
8 4 0
9 4 0
10 0 0
11 4 0
12 2 0
# Handle the first row separately to get rid of i - 1 headaches
if (df$v1[1L] > 0) {df$v2[1L] <- 1}
# Now the loop. Safer to do seq_len(length(df$v1) - 1) + 1 but that's more confusing
for (i in 2:length(df$v1)) {
  df$v2[i] <- df$v2[i - 1] + if (df$v1[i] > 0) {1} else {0}
}
head(df, 12)
v1 v2
1 2 1
2 1 2
3 3 3
4 0 3
5 0 3
6 4 4
7 2 5
8 4 6
9 4 7
10 0 7
11 4 8
12 2 9
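As the output shows, this loop counts the non-zero entries rather than repeating them. If the goal is to carry the last non-zero value itself forward, as in the question, a loop along the following lines might do it. This is only a sketch: the sample vector and the name last_seen are made up for illustration, and rows before the first non-zero value are left as NA by assumption.

```r
# Hypothetical sample data with a jump (20) and a decrease (3)
value1 <- c(1, 0, 0, 0, 20, 0, 0, 3, 4, 0)

# Preallocate the result instead of growing it inside the loop
value2 <- numeric(length(value1))

# Most recent non-zero value seen so far; NA until one appears
last_seen <- NA_real_

for (i in seq_along(value1)) {
  if (value1[i] > 0) {
    last_seen <- value1[i]   # a new value starts a new run
  }
  value2[i] <- last_seen     # zeros inherit the previous value
}

value2                       # 1 1 1 1 20 20 20 3 4 4
```

Using seq_along() also sidesteps the 2:length(x) problem for vectors of length 0 or 1.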
CodePudding user response:
No need for using a loop. One option using base R would be:
df <- data.frame(value1 = c(1,0,0,0,2,0,0,3,4,0))
df$value2 <- cumsum(ifelse(df$value1 > 0, 1, 0))
Which yields:
> df
value1 value2
1 1 1
2 0 1
3 0 1
4 0 1
5 2 2
6 0 2
7 0 2
8 3 3
9 4 4
10 0 4