Home > front end >  Apply calculation to specified number of columns and store the result in separate objects, to be com
Apply calculation to specified number of columns and store the result in separate objects, to be com

Time:10-23

I have a data frame in which I want to apply a calculation to a varying amount of columns that are specified, and store the results in separate objects, to be combined at the end.

A minimal example would look like:

Name <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Base <- c(0, 0, 0, 1, 1)
C1 <- c(1, 0, 1, 1, 0)
C2 <- c(0, 1, 1, 1, 0)
C3 <- c(0, 1, 0, 0, 0)
C4 <- c(1, 1, 0, 1, 0)
Data <- data.frame(Name, Base, C1, C2, C3, C4)

score.calc <- function(data, col.names){

                       # This is how I would to it outside a function and without loop:
                       Score1 <- sum(pmin(Data$C1, pmin(Data$Base)))/sum(pmin(Data$Base))
                       Score2 <- sum(pmin(Data$C2, pmin(Data$Base)))/sum(pmin(Data$Base))
                       Score3 <- sum(pmin(Data$C3, pmin(Data$Base)))/sum(pmin(Data$Base))
                       Scores <- c(Score1, Score2, Score3)
}

new.score <- score.calc(Data,
                        col.names= c("C1", "C2", "C3"))

And should return:

> new.score
[1] 0.5 0.5 0.0

Anyone with an idea? Many thanks!

CodePudding user response:

Try this:

score.calc <- function(data, col.names, base = "Base") {
  sapply(subset(data, select = col.names),
         function(z) sum(pmin(z, Data[[base]]))/sum(Data[[base]]))
}
score.calc(Data, c("C1", "C2", "C3"))
#  C1  C2  C3 
# 0.5 0.5 0.0 

The changes I made:

  1. Changed from the hard-coded $C1 (etc) to a more dynamic data[[nm]] mindset;
  2. Changed the hard-coded $Base to one based on the arguments, and with a default value so that while you shouldn't need to change it, if you want to apply it to different data (with a different "Base" column), you still can;
  3. Did it dynamically with sapply, which will return (here) a vector the same length as col.names ... assuming that all columns provided are present in the data and numeric-like;
  4. Use subset(., select=) instead of [, since the latter can drop to a vector instead of a single-column frame in some circumstances (i.e., base::[.data.frame and a simple data.frame, not a tbl_df).
  5. Removed two extraneous pmin. Its use with a single argument is a no-op: its functionality is providing the minimum element-wise between two or more vectors; to apply it to a single vector does nothing.
  • Related