I am new to programming in R. To a data frame that I created previously, I have to add a row that contains the mean of each column, and this row has to be called 'Average'. Note that the last column contains NA values.
So far I have managed to create the new row with the mean of each column, omitting the NA values in the last column. What I still need is to name the row 'Average'. By default it appears as 51 and I don't know how to change that. I have read about the function row.names() and so on, but I can't get it to work.
Can someone help me out? I would be enormously grateful.
I leave the dataframe here (state.df):
state.df = as.data.frame(state.x77)
This is what I've done:
apply = apply(state.df, MARGIN = 2, mean, na.rm = TRUE)
rbind(state.df,apply)
I have also tried this and a few other things, but nothing works for me.
rbind(state.df,apply, rownames(state.df, prefix = 'Average'))
In summary, with apply I already have basically what I want. I just want to change the row name 51 that appears after the rbind to 'Average'.
CodePudding user response:
rbind constructs row names from argument names only for arguments that are matrix-like. Both apply and colMeans return a vector that is not matrix-like. You can use t to coerce this vector to a matrix, so that the argument name (in this case Average) is actually used.
dd <- data.frame(x = rnorm(10L), y = c(NA, rnorm(9L)))
rbind(dd, Average = t(colMeans(dd, na.rm = TRUE)))
# x y
# 1 0.4070128 NA
# 2 1.2352564 0.5730119
# 3 -0.5842432 -0.2096068
# 4 0.1695935 -1.0667109
# 5 -0.7393369 -1.5895364
# 6 -0.7394052 -1.0886582
# 7 0.9922455 -0.2560118
# 8 -3.0080877 -2.1085712
# 9 -0.3629210 -1.9192967
# 10 -0.5564323 -0.5459473
# Average -0.3186318 -0.9123697
rbind(dd, Average = colMeans(dd, na.rm = TRUE))
# x y
# 1 0.4070128 NA
# 2 1.2352564 0.5730119
# 3 -0.5842432 -0.2096068
# 4 0.1695935 -1.0667109
# 5 -0.7393369 -1.5895364
# 6 -0.7394052 -1.0886582
# 7 0.9922455 -0.2560118
# 8 -3.0080877 -2.1085712
# 9 -0.3629210 -1.9192967
# 10 -0.5564323 -0.5459473
# 11 -0.3186318 -0.9123697
Of course, you can always modify row names after the fact with row.names<-, as pointed out in the comments.
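As a minimal sketch of that after-the-fact approach, assuming the state.df from the question (the names means and result are only illustrative, not from the original post):

```r
# Data frame from the question
state.df <- as.data.frame(state.x77)

# Column means, skipping any NA values
means <- colMeans(state.df, na.rm = TRUE)

# Append the means as a new row; it gets a default row name at first
result <- rbind(state.df, means)

# Rename the last row after the fact
row.names(result)[nrow(result)] <- "Average"

# Show the last two rows to check the new row name
tail(result, 2)
```

This does not depend on how rbind derives row names, which may make it the easier option to reason about.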
You could replace colMeans(dd, na.rm = TRUE) with apply(dd, 2L, mean, na.rm = TRUE) and get the same results, but colMeans is faster. For operations other than mean, you may need apply.
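For example, a column-wise median is one such operation. The sketch below uses a small made-up data frame dd2 (hypothetical values, chosen only for illustration) and the same t trick to label the new row:

```r
# Small illustrative data frame with one NA
dd2 <- data.frame(x = c(1, 2, 3, 10), y = c(NA, 4, 6, 8))

# Column-wise medians, skipping NA values
med <- apply(dd2, 2L, median, na.rm = TRUE)

# t() turns the named vector into a one-row matrix,
# so the argument name is used as the row name
res <- rbind(dd2, Median = t(med))
res
```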