Create a new column that determines if a value is below or above 1.5

Time:12-20

I want to create a new column that checks whether a value is below or above 1.5 and classifies it accordingly: 1 if the value is below 1.5, and 2 if it is above 1.5. I then want to take that output and calculate how many classifications are correct, based on the values in a different column.

data <- structure(list(Spike_Numeric = c(2, 2, 2, 2, 2, 2),
                       pred = c(1.98074853133856, 2.02043203671689,
                                1.77571051892715, 1.71595663747364,
                                1.5370482202268, 2.05764433439194)),
                  row.names = c("1", "2", "3", "4", "5", "6"),
                  class = "data.frame")

input

  Spike_Numeric     pred
1             2 1.980749
2             2 2.020432
3             2 1.775711
4             2 1.715957
5             2 1.537048
6             2 2.057644

output 1

  Spike_Numeric     pred  result
1             2 1.980749       2
2             2 2.020432       2
3             2 1.775711       2
4             2 1.715957       2
5             2 1.537048       2
6             2 2.057644       2

output 2

100% 

It would be 100% because all were correctly classified as 2.

There are no 1s in this example, but if it's possible to create a matrix (classification table) of correctly predicted 1s and 2s, that would be great as well, so that I can get specificity and sensitivity.

CodePudding user response:

We may use ifelse:

data$result <- with(data, ifelse(pred > 1.5, 2, 1))
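
For the second part of the question (percentage correct and a classification table), one possible sketch in base R is shown below. The names accuracy, classes, conf_mat, sensitivity and specificity are illustrative, and treating class 2 as the "positive" class is an assumption; swap the roles if 1 is the positive class in your data.

```r
# Example data from the question
data <- data.frame(
  Spike_Numeric = c(2, 2, 2, 2, 2, 2),
  pred = c(1.98074853133856, 2.02043203671689, 1.77571051892715,
           1.71595663747364, 1.5370482202268, 2.05764433439194)
)
data$result <- with(data, ifelse(pred > 1.5, 2, 1))

# Percentage of rows whose predicted class matches the true class
accuracy <- mean(data$result == data$Spike_Numeric) * 100

# Fix the factor levels so both classes show up in the table
# even when one of them does not occur in the data
classes <- c(1, 2)
conf_mat <- table(
  predicted = factor(data$result, levels = classes),
  actual    = factor(data$Spike_Numeric, levels = classes)
)

# Assumption: class 2 is "positive", class 1 is "negative"
sensitivity <- conf_mat["2", "2"] / sum(conf_mat[, "2"])
specificity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])
```

On the example data there are no actual 1s, so specificity is 0/0 and comes out as NaN; it only becomes meaningful once both classes are present in Spike_Numeric.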

CodePudding user response:

The ifelse solution works well when you have a single threshold (i.e. two result values), and it’s what you should use in that situation. However, it does not generalise well.

To generalise the problem of assigning numbers to adjacent groups, we can use the R function cut.

cut allows you to define a vector of thresholds, and classifies the input into the appropriate buckets delimited by these thresholds. However, to use it we need to add an arbitrarily low and high bound on either extreme. -Inf and Inf work well for this.

cut does not return integers but interval names (e.g. something like (-Inf,1.5]). If we only want numbers, we can use findInterval instead of cut.

With this, your code could look like this:

within(data, result <- findInterval(pred, c(-Inf, 1.5, Inf)))

OK, that’s boring: every value is 2 on your test data. But let’s say you want to use the thresholds 1.5, 1.7 and 2.0:

within(data, result <- findInterval(pred, c(-Inf, 1.5, 1.7, 2.0, Inf)))
#   Spike_Numeric     pred result
# 1             2 1.980749      3
# 2             2 2.020432      4
# 3             2 1.775711      3
# 4             2 1.715957      3
# 5             2 1.537048      2
# 6             2 2.057644      4
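
One detail worth checking before swapping cut for findInterval is how each treats a value that sits exactly on a threshold. By default, cut uses right-closed intervals such as (-Inf,1.5], whereas findInterval uses left-closed ones such as [1.5, Inf). The small sketch below uses a made-up vector x (not from the question) to show the difference; left.open = TRUE needs a reasonably recent R version.

```r
# Hypothetical values, including one exactly on the threshold
x <- c(1.2, 1.5, 1.8)
breaks <- c(-Inf, 1.5, Inf)

# cut: factor with interval labels, right-closed by default
cut_labels <- cut(x, breaks)

# cut with labels = FALSE: integer bucket codes; 1.5 lands in bucket 1
cut_codes <- cut(x, breaks, labels = FALSE)

# findInterval: left-closed by default; 1.5 lands in bucket 2
fi_default <- findInterval(x, breaks)

# findInterval with left.open = TRUE: same buckets as cut's default
fi_left_open <- findInterval(x, breaks, left.open = TRUE)
```

If the rule is "strictly above 1.5 is class 2", as in the ifelse answer, then cut with labels = FALSE or findInterval with left.open = TRUE matches it at the boundary.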