Running a general loop- and if-function to compute column means

Time:11-22

If I run this code with my data, I get the output I expect:

my_mean <- function(simulated_data){
  if(is.numeric(simulated_data)){
    return(sum(simulated_data)/length(simulated_data))
  }else{
    return('Non-numeric data')
  }
}

for (i in colnames(simulated_data)){
  cat(paste(colnames(simulated_data[i]), "Mean:", my_mean(simulated_data[[i]]), "\n"))
}

This is the output:

total_cost Mean: 1897.21529700626 
product_line Mean: Non-numeric data 
day Mean: Non-numeric data 
calander_week Mean: 25.5 
quantity Mean: 113.759646705788

But when I generalize this into a single function (as I have to do for my assignment) and run the following code:

means_function <- function(input_data){
  for(i in colnames(input_data)){
    if(is.numeric(input_data)){
      cat(paste(colnames(input_data[i]), "Mean:", mean(input_data[i]),"\n"))
    }else{
      cat(paste(colnames(input_data[i]), "Mean:", 'Non-numeric data', "\n"))
    }
  }
}

means_function(simulated_data)

And then I got the following output:

total_cost Mean: Non-numeric data 
product_line Mean: Non-numeric data 
day Mean: Non-numeric data 
calander_week Mean: Non-numeric data 
quantity Mean: Non-numeric data 

Can someone tell me what I'm doing wrong? I have to use a for loop, an if statement and the mean() function.

CodePudding user response:

You were nearly there. Here is the data frame I used for this example:

df <- data.frame(movieID = c("A","A","A","B","B","C","C","C"),
                 crewID = c("Z","Y","X","Z","V","V","X","Y"),
                 Rating = c(7.3,7.3,7.3,2.1,2.1,9,9,9))

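The problem is the condition is.numeric(input_data): it tests the whole data frame rather than one column, and a data frame is never numeric, so every column ends up in the else branch. Test the individual column with input_data[[i]] instead, and pass the same vector to mean(). Here I loop over the column positions, although looping over the column names works just as well: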

means_function <- function(input_data){
  # Loop over column positions; [[i]] extracts the column as a plain vector
  for(i in seq_len(ncol(input_data))){
    if(is.numeric(input_data[[i]])){
      cat(paste(colnames(input_data[i]), "Mean:", mean(input_data[[i]]),"\n"))
    }else{
      cat(paste(colnames(input_data[i]), "Mean:", 'Non-numeric data', "\n"))
    }
  }
}

Call:

 means_function(df)

Output:

movieID Mean: Non-numeric data 
crewID Mean: Non-numeric data 
Rating Mean: 6.6375 

CodePudding user response:

The for loop iterates over each column name and calls the function once per column. So in the code below:

my_mean <- function(simulated_data){
  if(is.numeric(simulated_data)){
    return(sum(simulated_data)/length(simulated_data))
  }else{
    return('Non-numeric data')
  }
}

for (i in colnames(simulated_data)){
  cat(paste(colnames(simulated_data[i]), "Mean:", my_mean(simulated_data[[i]]), "\n"))
}

the my_mean function is applied to a vector (simulated_data[[i]] returns a vector), whereas your generalized version applies the check to the whole data.frame, which does not do what you need.

The reason is the if statement: is.numeric(input_data) checks the whole data.frame and returns FALSE, so the else branch runs for every column:
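To make that concrete, here is a small sketch of how the three ways of referring to the data behave. The data frame `toy` and its column names are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(label = c("a", "b", "c"), value = c(1, 2, 3))

# A data frame is a list of columns, so it is never numeric as a whole
whole_is_numeric <- is.numeric(toy)

# Single brackets keep the data frame wrapper around the selected column
single_bracket <- class(toy["value"])

# Double brackets extract the column itself as a vector
double_bracket <- is.numeric(toy[["value"]])

print(whole_is_numeric)  # FALSE
print(single_bracket)    # "data.frame"
print(double_bracket)    # TRUE
```

This is also why mean(input_data[i]) in the function that follows would not help even if the condition were true: it receives a one-column data frame, not a numeric vector.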

means_function <- function(input_data){
  for(i in colnames(input_data)){
    if(is.numeric(input_data)){
      cat(paste(colnames(input_data[i]), "Mean:", mean(input_data[i]),"\n"))
    }else{
      cat(paste(colnames(input_data[i]), "Mean:", 'Non-numeric data', "\n"))
    }
  }
}

means_function(simulated_data)

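To fix this, modify means_function as below so that it checks a single column and computes the mean of that column vector rather than of the data.frame. Note that input_data[, i] returns a vector only for a base data.frame; if your data is a tibble, use input_data[[i]] instead: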

means_function <- function(input_data){
  for(i in colnames(input_data)){
    if(is.numeric(input_data[, i])){
      cat(paste(colnames(input_data[i]), "Mean:", mean(input_data[, i]),"\n"))
    }else{
      cat(paste(colnames(input_data[i]), "Mean:", 'Non-numeric data', "\n"))
    }
  }
}
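
If you would rather get the results back than only print them, one possible variant collects them in a named list. This is a sketch under assumptions: the function name `collect_means` and the data frame `toy` are hypothetical, and it uses [[ ]] so that it should behave the same for a base data.frame and a tibble.

```r
# Hypothetical variant: same for loop and if/else, but returns the results
collect_means <- function(input_data) {
  results <- list()
  for (col in colnames(input_data)) {
    column <- input_data[[col]]          # extract the column as a vector
    if (is.numeric(column)) {
      results[[col]] <- mean(column)     # numeric column: store its mean
    } else {
      results[[col]] <- "Non-numeric data"
    }
  }
  results
}

# Made-up example data, not the asker's simulated_data
toy <- data.frame(label = c("a", "b", "c"), value = c(1, 2, 3))
out <- collect_means(toy)
str(out)
```

Returning a list keeps the numeric means as numbers, so they can be reused later instead of being pasted into text.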