How to add standard deviation when calculating mean with setDT

Time:10-23

I have a large data frame for which I calculated the mean of each column, grouped by a tag column. I used these means for a scatterplot, but now I need to add error bars showing the standard deviation. Is there a way to do this with setDT, since that is where I calculated the means?

The code I'm talking about is:

setDT(df)[, lapply(.SD, mean, na.rm=TRUE), keyby = tag]

Answer:

From your question, I understand you intend to use the data.table package.

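setDT() only converts a data.frame to a data.table, in place (by reference). The grouped summary itself is computed inside the [ ] that follows it, so that is where the standard deviation has to be added.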
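To illustrate that split: setDT() changes the class of its argument in place, so no assignment is needed, and the actual mean/sd computation happens in the j expression. This is a minimal sketch on a tiny made-up data frame; the columns tag and value and their numbers are illustrative only.

```r
library(data.table)

# Tiny illustrative data frame (hypothetical columns "tag" and "value")
df <- data.frame(tag = c("a", "a", "b", "b"), value = c(1, 3, 2, 6))
class(df)  # "data.frame"

# setDT() converts df by reference; no assignment required
setDT(df)
class(df)  # "data.table" "data.frame"

# The grouped statistics are computed inside [ ], not by setDT() itself
summary_dt <- df[, .(mean = mean(value), sd = sd(value)), keyby = tag]
summary_dt
```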

A possible solution, using the iris dataset as an example:

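library(data.table)

# Work on a deep copy of iris, since setnames() and setDT() modify their argument by reference
df <- copy(iris)

# Rename "Species" to "tag" to match the question
setnames(df, "Species", "tag")

# Mean and standard deviation of every non-grouping column per tag;
# c() concatenates the two lists, so all means come first, then all SDs
result <- setDT(df)[, c(lapply(.SD, mean, na.rm = TRUE),
                        lapply(.SD, sd,   na.rm = TRUE)), keyby = tag]

# Rename columns: "tag", then the ".mean" columns, then the ".sd" columns
old <- setdiff(colnames(df), "tag")
new <- c("tag", paste0(old, ".mean"), paste0(old, ".sd"))
setnames(result, new)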

result

#> Key: <tag>
#>           tag Sepal.Length.mean Sepal.Width.mean Petal.Length.mean
#>        <fctr>             <num>            <num>             <num>
#> 1:     setosa             5.006            3.428             1.462
#> 2: versicolor             5.936            2.770             4.260
#> 3:  virginica             6.588            2.974             5.552
#>    Petal.Width.mean Sepal.Length.sd Sepal.Width.sd Petal.Length.sd
#>               <num>           <num>          <num>           <num>
#> 1:            0.246       0.3524897      0.3790644       0.1736640
#> 2:            1.326       0.5161711      0.3137983       0.4699110
#> 3:            2.026       0.6358796      0.3224966       0.5518947
#>    Petal.Width.sd
#>             <num>
#> 1:      0.1053856
#> 2:      0.1977527
#> 3:      0.2746501
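Since the goal in the question is a scatterplot with error bars, here is a sketch that goes one step further. It names the columns during aggregation (via .SDcols and setNames()), so the separate renaming step is not needed, and then draws mean +/- 1 SD per tag with base R's arrows(). Assumptions: one error bar per group, +/- 1 standard deviation, base graphics, and illustrative helper columns lower and upper that are not part of the answer above. With ggplot2, geom_errorbar() with ymin/ymax mapped to those two columns would serve the same purpose.

```r
library(data.table)

# Same setup as above: iris with "Species" renamed to "tag"
dt <- as.data.table(iris)
setnames(dt, "Species", "tag")
cols <- setdiff(names(dt), "tag")

# Name the mean and sd columns directly instead of renaming afterwards
result <- dt[, c(setNames(lapply(.SD, mean, na.rm = TRUE), paste0(cols, ".mean")),
                 setNames(lapply(.SD, sd,   na.rm = TRUE), paste0(cols, ".sd"))),
             keyby = tag, .SDcols = cols]

# Hypothetical helper columns: error-bar bounds (mean +/- 1 sd) for one variable
result[, `:=`(lower = Sepal.Length.mean - Sepal.Length.sd,
              upper = Sepal.Length.mean + Sepal.Length.sd)]

# Base-R scatterplot of the group means with vertical error bars
x <- seq_len(nrow(result))
plot(x, result$Sepal.Length.mean,
     ylim = range(result$lower, result$upper),
     xaxt = "n", pch = 19, xlab = "tag", ylab = "Sepal.Length (mean +/- sd)")
axis(1, at = x, labels = as.character(result$tag))
# angle = 90 and code = 3 draw flat caps at both ends of each bar
arrows(x0 = x, y0 = result$lower, x1 = x, y1 = result$upper,
       angle = 90, code = 3, length = 0.05)
```

The same pattern works for any other column: swap Sepal.Length for the variable you are plotting.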