How to add and rename a row in a dataframe using the apply function

Time:11-18

I am new to programming in R. From a data frame that I created earlier, I have to add a row containing the mean of each column, and this row has to be called 'Average'. Note that the last column contains NA values.

So far I have managed to create the new row with the mean of each column, omitting the NA values in the last column. What I still need is to name the row 'Average'. By default it appears as 51 and I don't know how to change that. I have read about the row.names() function and others, but I can't get anything to work.

Can someone help me out? I would be enormously grateful.

Here is the data frame (state.df):

state.df = as.data.frame(state.x77)

This is what I've done:

apply = apply(state.df, MARGIN = 2, mean, na.rm = TRUE)
rbind(state.df,apply)

I have tried this as well and a few other things but it doesn't work for me.

rbind(state.df,apply, rownames(state.df, prefix = 'Average'))

In summary, apply already gives me basically what I want. I just want to change the row name 51 that appears after the rbind to 'Average'.

CodePudding user response:

rbind constructs row names from argument names only for arguments that are matrix-like. Both apply and colMeans return a vector that is not matrix-like. You can use t to coerce this vector to a matrix, so that the argument name (in this case Average) is actually used.

dd <- data.frame(x = rnorm(10L), y = c(NA, rnorm(9L)))

rbind(dd, Average = t(colMeans(dd, na.rm = TRUE)))
#                  x          y
# 1        0.4070128         NA
# 2        1.2352564  0.5730119
# 3       -0.5842432 -0.2096068
# 4        0.1695935 -1.0667109
# 5       -0.7393369 -1.5895364
# 6       -0.7394052 -1.0886582
# 7        0.9922455 -0.2560118
# 8       -3.0080877 -2.1085712
# 9       -0.3629210 -1.9192967
# 10      -0.5564323 -0.5459473
# Average -0.3186318 -0.9123697

rbind(dd, Average = colMeans(dd, na.rm = TRUE))
#             x          y
# 1   0.4070128         NA
# 2   1.2352564  0.5730119
# 3  -0.5842432 -0.2096068
# 4   0.1695935 -1.0667109
# 5  -0.7393369 -1.5895364
# 6  -0.7394052 -1.0886582
# 7   0.9922455 -0.2560118
# 8  -3.0080877 -2.1085712
# 9  -0.3629210 -1.9192967
# 10 -0.5564323 -0.5459473
# 11 -0.3186318 -0.9123697
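
The difference between the two calls comes down to whether the second argument has dimensions, which can be checked directly. This is only a small sketch: the set.seed call and the names avg_vec, avg_mat and out are assumptions added for illustration, not part of the original answer.

```r
set.seed(1)  # illustrative only, so the example is reproducible
dd <- data.frame(x = rnorm(10L), y = c(NA, rnorm(9L)))

avg_vec <- colMeans(dd, na.rm = TRUE)  # plain named numeric vector
avg_mat <- t(avg_vec)                  # 1-row matrix with the same values

dim(avg_vec)        # NULL: no dimensions, so not matrix-like
dim(avg_mat)        # 1 2
is.matrix(avg_mat)  # TRUE

# With the 1-row matrix, the argument name becomes the row name
out <- rbind(dd, Average = avg_mat)
rownames(out)[nrow(out)]  # "Average"
```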

Of course, you can always modify row names after the fact with row.names<-, as pointed out in the comments.
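
A minimal sketch of that after-the-fact approach on the data frame from the question. The names avg, out and out2 are assumptions chosen for illustration. Because the new row is renamed by position, it does not matter what name rbind gave it. The second half shows a different technique that skips rbind and assigns to a new row name directly.

```r
state.df <- as.data.frame(state.x77)

# Column means as a named numeric vector
avg <- colMeans(state.df, na.rm = TRUE)

# Append the row, then rename the last row with row.names<-
out <- rbind(state.df, avg)
row.names(out)[nrow(out)] <- "Average"
tail(out, 2L)

# Alternative (not from the original answer): index by a new row name
out2 <- state.df
out2["Average", ] <- avg
tail(out2, 2L)
```

Either way the original 50 rows keep their state names and the appended row is labelled 'Average'.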

You could replace colMeans(dd, na.rm = TRUE) with apply(dd, 2L, mean, na.rm = TRUE) and get the same results, but colMeans is faster. For operations other than mean, you may need apply.
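
As a rough illustration of both points, the sketch below first checks that the two ways of computing column means agree, then uses apply for a column median, for which base R has no counterpart to colMeans. The small data frame and the names med and out are made up for this example.

```r
dd <- data.frame(x = c(1, 2, 3, 10), y = c(NA, 4, 6, 8))

# Same column means either way
all.equal(colMeans(dd, na.rm = TRUE), apply(dd, 2L, mean, na.rm = TRUE))  # TRUE

# Column medians via apply; t() makes the result matrix-like,
# so the argument name is used as the row name
med <- apply(dd, 2L, median, na.rm = TRUE)
out <- rbind(dd, Median = t(med))
out
```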
