I want to create a new column that classifies each value as 1 or 2, depending on whether it is below or above 1.5: 1 if the value is below 1.5, and 2 if it is above 1.5. I then want to take that output and calculate how many classifications are correct, based on the values in a different column.
data <- structure(list(Spike_Numeric = c(2, 2, 2, 2, 2, 2), pred = c(1.98074853133856,
2.02043203671689, 1.77571051892715, 1.71595663747364, 1.5370482202268,
2.05764433439194)), row.names = c("1", "2", "3", "4", "5", "6"
), class = "data.frame")
input
Spike_Numeric pred
1 2 1.980749
2 2 2.020432
3 2 1.775711
4 2 1.715957
5 2 1.537048
6 2 2.057644
output 1
Spike_Numeric pred result
1 2 1.980749 2
2 2 2.020432 2
3 2 1.775711 2
4 2 1.715957 2
5 2 1.537048 2
6 2 2.057644 2
output 2
100%
It would be 100% because all were correctly classified as 2.
There are no 1s in this example, but if it's possible to create a matrix (classification table) of correctly predicted 1s and 2s, that would be great as well, so that I can get specificity and sensitivity.
CodePudding user response:
We may use ifelse
data$result <- with(data, ifelse(pred > 1.5, 2, 1))
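The question also asks for the percentage classified correctly and for a classification table. A minimal sketch of one way to get both in base R follows. The names accuracy, conf_mat, sensitivity and specificity are illustrative, and treating class 2 as the "positive" class is an assumption, not something the question states.

```r
# Recreate the question's data (same values as the dput output above).
data <- data.frame(
  Spike_Numeric = c(2, 2, 2, 2, 2, 2),
  pred = c(1.98074853133856, 2.02043203671689, 1.77571051892715,
           1.71595663747364, 1.5370482202268, 2.05764433439194)
)

# Classify: 2 if pred is above 1.5, otherwise 1.
data$result <- with(data, ifelse(pred > 1.5, 2, 1))

# Percentage of rows where the predicted class matches the true class.
accuracy <- mean(data$result == data$Spike_Numeric) * 100

# Fixing the factor levels keeps both classes in the table even when
# one of them never occurs, as in this sample.
conf_mat <- table(
  Actual    = factor(data$Spike_Numeric, levels = c(1, 2)),
  Predicted = factor(data$result, levels = c(1, 2))
)

# Assumption: class 2 is the "positive" class.
sensitivity <- conf_mat["2", "2"] / sum(conf_mat["2", ])
# NaN on this sample, because there are no actual 1s to divide by.
specificity <- conf_mat["1", "1"] / sum(conf_mat["1", ])
```

On the sample data, accuracy is 100 and all six rows fall in the Actual 2 / Predicted 2 cell. Specificity only becomes meaningful once the data contains actual 1s.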
CodePudding user response:
The ifelse solution works well when you have a single threshold (i.e. two result values), and it’s what you should use in that situation. However, it does not generalise well.
To generalise the problem of assigning numbers to adjacent groups, we can use the R function cut.
cut allows you to define a vector of thresholds, and classifies the input into the buckets delimited by these thresholds. However, to cover every possible value we need to add an arbitrarily low bound and an arbitrarily high bound at the two extremes. -Inf and Inf work well for this.
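As a rough illustration of the above, here is cut applied to the pred values from the question, with a single threshold at 1.5. The variable names buckets and codes are only for illustration.

```r
# The pred column from the question.
pred <- c(1.98074853133856, 2.02043203671689, 1.77571051892715,
          1.71595663747364, 1.5370482202268, 2.05764433439194)

# Two buckets: everything up to 1.5, and everything above it.
# By default cut() uses right-closed intervals, so 1.5 itself
# would land in the first bucket.
buckets <- cut(pred, breaks = c(-Inf, 1.5, Inf))

# The result is a factor whose levels are the interval names.
levels(buckets)

# The underlying bucket index can be recovered from the factor.
codes <- as.integer(buckets)
```

Every value in the sample is above 1.5, so all six entries of codes are 2.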
By default, cut does not return integers but a factor of interval names (e.g. something like (-Inf,1.5]). If we only want numbers, we can use findInterval instead of cut.
With this, your code could look like this:
within(data, result <- findInterval(pred, c(-Inf, 1.5, Inf)))
OK, that’s boring: every value is 2 on your test data. But let’s say you want to use the thresholds 1.5, 1.7 and 2.0:
within(data, result <- findInterval(pred, c(-Inf, 1.5, 1.7, 2.0, Inf)))
# Spike_Numeric pred result
# 1 2 1.980749 3
# 2 2 2.020432 4
# 3 2 1.775711 3
# 4 2 1.715957 3
# 5 2 1.537048 2
# 6 2 2.057644 4
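One detail worth checking if values can fall exactly on a threshold: ifelse(pred > 1.5, 2, 1) puts 1.5 itself in class 1, whereas findInterval uses left-closed intervals by default and puts it in class 2. The sketch below uses made-up values (x is not from the question) to show the difference, and the left.open argument that makes findInterval match the ifelse version.

```r
# Hypothetical values around the threshold, including 1.5 itself.
x <- c(1.4, 1.5, 1.6)

# Strict comparison: exactly 1.5 is classified as 1.
by_ifelse <- ifelse(x > 1.5, 2, 1)

# Default findInterval: intervals are [lower, upper), so 1.5 goes to bucket 2.
by_default <- findInterval(x, c(-Inf, 1.5, Inf))

# left.open = TRUE switches to (lower, upper], matching the ifelse result.
by_left_open <- findInterval(x, c(-Inf, 1.5, Inf), left.open = TRUE)
```

None of the values in the question's sample lies exactly on a threshold, so the results shown above are the same either way.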