Make new columns that are row sums of existing columns using a loop in R

Time:11-19

I would like to know how to make new columns that are row sums of existing columns using a loop. Given this data,

df <- data.frame(A=c(22, 25, 29, 13, 22, 30),
                 B=c(12, 10, 6, 6, 8, 11),
                 C=c(NA, 15, 15, 18, 22, 13))

I would like to make two columns called a1 and a2, where a1 is the row sums of columns A and B, and a2 is the row sums of columns A, B, and C.

The desired output would look as follows.

 ---- ---- ---- ---- ---- 
| A  | B  | C  | a1 | a2 |
 ---- ---- ---- ---- ---- 
| 22 | 12 | NA | 34 | 34 |
| 25 | 10 | 15 | 35 | 50 |
| 29 |  6 | 15 | 35 | 50 |
| 13 |  6 | 18 | 19 | 37 |
| 22 |  8 | 22 | 30 | 52 |
| 30 | 11 | 13 | 41 | 54 |
 ---- ---- ---- ---- ---- 

I tried the following methods, but these methods are giving me errors.

First, I tried using dplyr

for(i in 1:2) {
  df<-df%>%
    mutate_(paste0("a",i)= rowSums(df[,1:(1+i)],na.rm=TRUE))
}

Second, I tried using data.table

for(i in 1:2) {
  df<-df[,paste0("a",i) := rowSums(df[,1:(1+i)])]
}

I would like to know how to get the desired output both ways. Also, I suspect a loop may not be the best method, so I would also like to know how to do this using the "apply" functions, if possible.

Thank you so much in advance!

CodePudding user response:

Here you go. Use [[ ]] to assign a column whose name is built at run time:

for(i in 1:2) {
  df[[paste0("a", i)]] <- rowSums(df[, 1:(i + 1)], na.rm = TRUE)
}

df

   A  B  C a1 a2
1 22 12 NA 34 34
2 25 10 15 35 50
3 29  6 15 35 50
4 13  6 18 19 37
5 22  8 22 30 52
6 30 11 13 41 54
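
For the dplyr and data.table variants the question asks about, something like the following may work. It is only a sketch: it assumes both packages are installed, and the names df_dplyr, dt, cols and new_col are made up for illustration. In dplyr the computed column name is injected with !! and :=. In data.table the name is wrapped in parentheses on the left of :=, and .SDcols picks the columns to sum.

```r
library(dplyr)
library(data.table)

df <- data.frame(A = c(22, 25, 29, 13, 22, 30),
                 B = c(12, 10, 6, 6, 8, 11),
                 C = c(NA, 15, 15, 18, 22, 13))

# dplyr: build the column name as a string, then inject it with !! and :=
df_dplyr <- df
for (i in 1:2) {
  cols <- names(df)[1:(i + 1)]    # "A","B" and then "A","B","C"
  new_col <- paste0("a", i)       # hypothetical helper variable
  df_dplyr <- df_dplyr %>%
    mutate(!!new_col := rowSums(across(all_of(cols)), na.rm = TRUE))
}

# data.table: (new_col) := ... adds the column by reference,
# and .SDcols limits .SD to the first i + 1 columns
dt <- as.data.table(df)
for (i in 1:2) {
  new_col <- paste0("a", i)
  dt[, (new_col) := rowSums(.SD, na.rm = TRUE), .SDcols = 1:(i + 1)]
}
```

Both df_dplyr and dt should end up with the a1 and a2 columns shown in the desired output, while the original df is left unchanged.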

CodePudding user response:

To answer your question about the apply(df, MARGIN, FUN, ...) function: MARGIN = 1 applies FUN to each row, and MARGIN = 2 applies it to each column.

Any additional arguments you pass to apply() are forwarded to FUN.

So, in your case, apply(df, 1, sum, na.rm = TRUE) calculates the sum of every row, ignoring any NA values.
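
As a small illustration of the two margins, here is a sketch on a made-up matrix m (not part of the original data). It also compares the row-wise result with rowSums(), which should give the same totals here.

```r
# Hypothetical 2 x 3 matrix with one missing value
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1   NA    5
# [2,]    2    4    6

# MARGIN = 1: one sum per row; na.rm = TRUE is forwarded to sum()
row_totals <- apply(m, 1, sum, na.rm = TRUE)   # 6 12

# MARGIN = 2: one sum per column
col_totals <- apply(m, 2, sum, na.rm = TRUE)   # 3 4 11

# rowSums() is the vectorised equivalent of the row-wise call
same_as_rowsums <- isTRUE(all.equal(row_totals, rowSums(m, na.rm = TRUE)))
```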

Instead of using dplyr or data.table, you could do this in base R:

df["a1"] <- apply(df[1:2], 1, sum, na.rm = TRUE)
df["a2"] <- apply(df[1:3], 1, sum, na.rm = TRUE)
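
If there are more than a couple of such columns, the two assignments can be generalised without an explicit for loop. The following is one possible sketch using sapply(); the names n_cols and new_cols are invented for illustration, and it assumes the columns to sum are always the first k columns of df.

```r
df <- data.frame(A = c(22, 25, 29, 13, 22, 30),
                 B = c(12, 10, 6, 6, 8, 11),
                 C = c(NA, 15, 15, 18, 22, 13))

# How many leading columns go into each new column (a1: 2, a2: 3)
n_cols <- 2:3

# sapply() returns a matrix with one column of row sums per entry of n_cols
new_cols <- sapply(n_cols, function(k) rowSums(df[seq_len(k)], na.rm = TRUE))
colnames(new_cols) <- paste0("a", seq_along(n_cols))

# Attach the new columns to the original data frame
df <- cbind(df, new_cols)
```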