I want to classify the rows of a data frame by applying a threshold to a numeric reference column and storing the result in a new column. If the reference value is at or below the threshold, the result is 0. If it is above the threshold, the result is 1 for that row and for all consecutive rows above the threshold, until a row with result 0 appears. The next run of rows above the threshold gets 2, and so on.
With a threshold of 2 (that is, reference > 2), an example of what I would like to obtain is:
row | reference | result |
---|---|---|
1 | 2 | 0 |
2 | 1 | 0 |
3 | 4 | 1 |
4 | 3 | 1 |
5 | 1 | 0 |
6 | 6 | 2 |
7 | 8 | 2 |
8 | 4 | 2 |
9 | 1 | 0 |
10 | 3 | 3 |
11 | 6 | 3 |
row <- c(1:11)
reference <- c(2,1,4,3,1,6,8,4,1,3,6)
result <- c(0,0,1,1,0,2,2,2,0,3,3)
table <- cbind(row, reference, result)
Thank you!
CodePudding user response:
We can use run-length encoding (rle) for this.
The code below assumes a data.frame named quux (defined under Data below):
r <- rle(quux$reference <= 2)
# number the runs above the threshold (FALSE here); runs at or below it get 0
r$values <- ifelse(r$values, 0, cumsum(!r$values))
quux$result2 <- inverse.rle(r)
quux
# row reference result result2
# 1 1 2 0 0
# 2 2 1 0 0
# 3 3 4 1 1
# 4 4 3 1 1
# 5 5 1 0 0
# 6 6 6 2 2
# 7 7 8 2 2
# 8 8 4 2 2
# 9 9 1 0 0
# 10 10 3 3 3
# 11 11 6 3 3
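To see why the run-length approach works, it can help to inspect the intermediate values. The following is a minimal sketch that uses only the reference values from the question; the names reference, r, and labels are chosen here for illustration. It numbers the above-threshold runs with cumsum(!r$values), so a series that starts above the threshold would also be counted from 1.

```r
# Reference values from the question, with a threshold of 2.
reference <- c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6)

# rle() collapses consecutive equal values into runs.
# TRUE marks a run at or below the threshold, FALSE a run above it.
r <- rle(reference <= 2)
r$lengths  # 2 2 1 3 1 2
r$values   # TRUE FALSE TRUE FALSE TRUE FALSE

# Number the above-threshold runs in order of appearance; other runs get 0.
labels <- ifelse(r$values, 0, cumsum(!r$values))
labels     # 0 1 0 2 0 3

# Expand each run label back to one value per row.
r$values <- labels
inverse.rle(r)  # 0 0 1 1 0 2 2 2 0 3 3
```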
Data
quux <- structure(list(row = 1:11, reference = c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6), result = c(0, 0, 1, 1, 0, 2, 2, 2, 0, 3, 3)), row.names = c(NA, -11L), class = "data.frame")
CodePudding user response:
As noted in the comments by @Sotos, I would consider an alternative name for your object, since table is also the name of a base R function.
Since it wasn't clear whether you have a data.frame or a matrix, assume we have a data.frame df based on your data:
df <- as.data.frame(table)
And have a threshold of 2:
threshold <- 2
You can adapt this solution by @flodel:
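# TRUE where reference is above the threshold
x <- df$reference > threshold
# count FALSE -> TRUE transitions; rows at or below the threshold get 0
df$new_result <- ifelse(x, cumsum(c(x[1], diff(x) == 1)), 0)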
df
Here, diff(x) returns a vector in which a value of 1 marks each position where the result should be incremented by cumsum (in the sample data, this occurs in rows 3, 6, and 10). These are transitions from FALSE to TRUE (0 to 1), where reference goes from at or below threshold to above it. Note that x[1] is prepended because the diff output is one element shorter than x.
The ifelse then applies these incremental values only to rows where reference exceeds threshold; all other rows are set to 0.
Output
row reference result new_result
1 1 2 0 0
2 2 1 0 0
3 3 4 1 1
4 4 3 1 1
5 5 1 0 0
6 6 6 2 2
7 7 8 2 2
8 8 4 2 2
9 9 1 0 0
10 10 3 3 3
11 11 6 3 3
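If the labelling is needed in more than one place, the diff/cumsum idea can be wrapped in a small function. This is only a sketch: the name label_runs and its arguments are hypothetical and not part of either answer, and it assumes the input has no missing values. The second call illustrates why x[1] is prepended: a series that starts above the threshold should be labelled 1 from its first row.

```r
# Hypothetical helper: number each consecutive run of values above `threshold`
# (1, 2, 3, ...) and label all other rows 0. Assumes no NA values.
label_runs <- function(values, threshold) {
  x <- values > threshold
  # A new run starts at a FALSE -> TRUE transition (diff == 1),
  # or at the first element if it is already above the threshold.
  starts <- c(x[1], diff(x) == 1)
  ifelse(x, cumsum(starts), 0)
}

# Sample data from the question.
label_runs(c(2, 1, 4, 3, 1, 6, 8, 4, 1, 3, 6), threshold = 2)
# 0 0 1 1 0 2 2 2 0 3 3

# A series that begins above the threshold: the first run is labelled 1.
label_runs(c(5, 4, 1, 3), threshold = 2)
# 1 1 0 2
```

It could then be applied to a column with something like df$new_result <- label_runs(df$reference, threshold).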