How can I insert a data frame in a function and then group by groups with tapply

Time:12-05

I am new to programming in R and I have written a function that returns some basic statistics for a list or vector passed to it. The problem comes when I want to pass it a data frame.

The data frame has 2 columns: the first is a group (1 or 2) and the second is the skull width in cm (numeric values). I would like to compute the mean, mode, median, quartiles and so on (everything inside the function) for each group separately, so that I can later compare groups 1 and 2.

My idea was to reuse the function I had written for lists and vectors and then group the results with tapply, but the console gives me this error:

Error in tapply(archivo, archivo$`Época histórica`, descriptive_statistics) : 
  arguments must have same length

Here are the function and the tapply call I used:

# Note: modes() is not a base R function; it is assumed to be defined elsewhere.
descriptive_statistics <- function(x) {
  result <- list(
    mean(x), exp(mean(log(x))), median(x), modes(x),
    diff(range(x)), var(x), sd(x), sd(x) / mean(x)
  )
  names(result) <- c('Arithmetic mean', 'Geometric mean', 'Median', 'Mode', 'Range', 'Variance', 'Standard deviation', 'Pearsons coefficient of variation')

  result
}

tapply(archivo, archivo$`Época histórica`, descriptive_statistics)


How could I improve my function so that it accepts data frames? Or what could I change in the tapply call to make it work? Can someone give me a hand with this? I am also open to other ideas. I have tried aggregate with the summary function inside it, but that does not give me the statistics I want, such as Pearson's coefficient of variation.

Thank you very much in advance, greetings

CodePudding user response:

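Pass the numeric column of the data frame to the function instead of the whole data frame. tapply expects its first argument to have the same length as the grouping vector, and the length of a data frame is its number of columns (2 here), not its number of rows, which is why you get "arguments must have same length". You haven't shared your data, so it is difficult to give a specific answer, but let's assume the other column is called col1. In that case you can do: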

tapply(archivo$col1, archivo$`Época histórica`, descriptive_statistics)
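
To see the fix end to end, here is a minimal sketch on made-up data. The values, the column name col1 and the modes() helper are assumptions for illustration only (the original modes() was not shown), so treat this as one possible setup rather than the asker's actual code.

```r
# Hypothetical data: group (1 or 2) and skull width in cm; the values are invented.
archivo <- data.frame(
  `Época histórica` = c(1, 1, 1, 1, 2, 2, 2, 2),
  col1 = c(13.1, 13.5, 13.5, 13.9, 12.8, 13.0, 13.0, 13.2),
  check.names = FALSE  # keep the column name with the space and accents as-is
)

# Hypothetical stand-in for the asker's modes(): the most frequent value(s).
modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Same statistics as in the question, returned as a named list.
descriptive_statistics <- function(x) {
  list(
    "Arithmetic mean" = mean(x),
    "Geometric mean" = exp(mean(log(x))),
    "Median" = median(x),
    "Mode" = modes(x),
    "Range" = diff(range(x)),
    "Variance" = var(x),
    "Standard deviation" = sd(x),
    "Coefficient of variation" = sd(x) / mean(x)
  )
}

# The data frame has length 2 (its columns), while the grouping vector has one
# entry per row, which is the mismatch tapply complained about.
length(archivo)
length(archivo$`Época histórica`)

# Passing the numeric column makes both arguments the same length.
by_group <- tapply(archivo$col1, archivo$`Época histórica`, descriptive_statistics)

# Each element is the list of statistics for one group, indexed by group label.
by_group[["1"]][["Arithmetic mean"]]
by_group[["2"]][["Arithmetic mean"]]
```

Because the function returns a list, tapply gives back one list per group, so the two groups can be compared statistic by statistic by indexing with the group label.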