R use both sumCol and aggregate

Time:12-16

Is there a nice way to sum and aggregate the following dataframe:

Persons     item1  item2  item3  ....
1           0      1      3      ....
1           2      2      4      ....
2           1      2      4      ....
3           0      1      1      ....
1           1      1      1      ....
...         ...    ...    ...    ....

To:

Persons  Items
1        15
2        7
3        2
...      ...

So basically: sum all the items in each row, then add those row sums together for each person (when a person has more than one row).

CodePudding user response:

Here's one way:

library(dplyr)

Persons <- c(1, 1, 2, 3, 1)
item1 <- c(0, 2, 1, 0, 1)
item2 <- c(1, 2, 2, 1, 1)
item3 <- c(3, 4, 4, 1, 1)

data_1 <- data.frame(Persons, item1, item2, item3)

data_1$Items_raw <- rowSums(data_1[,2:4])
data_1 <- data_1 %>% group_by(Persons) %>% summarize(Items = sum(Items_raw))

CodePudding user response:

In base R you could use tapply (here df is the data frame from the question, with Persons in the first column):

tapply(rowSums(df[,-1]), df[,1], sum)

 1  2  3 
15  7  2 
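
Note that tapply returns a named vector rather than a two-column table. If the Persons/Items layout from the question is needed, a minimal sketch along these lines should do it. The names totals and result are only illustrative, and df is rebuilt here from the sample rows in the question:

```r
# Sample data from the question
df <- data.frame(
  Persons = c(1, 1, 2, 3, 1),
  item1 = c(0, 2, 1, 0, 1),
  item2 = c(1, 2, 2, 1, 1),
  item3 = c(3, 4, 4, 1, 1)
)

# Row sums over the item columns, then summed per person
totals <- tapply(rowSums(df[, -1]), df[, 1], sum)

# Turn the named vector into a Persons/Items data frame
# (assumes the person IDs are numeric, as in the sample)
result <- data.frame(
  Persons = as.numeric(names(totals)),
  Items = as.vector(totals)
)
```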

CodePudding user response:

You can use the dplyr package.

library(dplyr)

df <- data.frame(
  Persons = c(1, 1, 2, 3, 1),
  item1 = c(0, 2, 1, 0, 1),
  item2 = c(1, 2, 2, 1, 1),
  item3 = c(3, 4, 4, 1, 1)
)

df %>% 
  group_by(Persons) %>% 
  summarise(items = sum(item1, item2, item3))

And you'll obtain:

# A tibble: 3 x 2
  Persons items
    <dbl> <dbl>
1       1    15
2       2     7
3       3     2

CodePudding user response:

By combining the responses under my post (thanks, everyone!), I found a solution to my problem. I solved it the following way:

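library(plyr)  # provides ddply() and numcolwise()

# Collapse the item columns into one row sum per row
People = data.frame(People$Persons,
                    rowSums(People[, 2:522]))
colnames(People) = c("Persons", "Items")
# Sum the row sums within each person
People = ddply(People, "Persons", numcolwise(sum))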

It's not a beautiful solution (I wish I could do it in one line), but it works!
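
For what it's worth, the same steps can probably be collapsed into a single base R call with aggregate, without plyr. This is only a sketch on the three-item sample from the question. It assumes Persons is the first column and every other column is an item, so People[-1] stands in for the 2:522 range above:

```r
# Sample data from the question (the real data has many more item columns)
People <- data.frame(
  Persons = c(1, 1, 2, 3, 1),
  item1 = c(0, 2, 1, 0, 1),
  item2 = c(1, 2, 2, 1, 1),
  item3 = c(3, 4, 4, 1, 1)
)

# One call: row sums over all item columns, summed per person
result <- aggregate(
  data.frame(Items = rowSums(People[-1])),
  by = list(Persons = People$Persons),
  FUN = sum
)
```

The result is a data frame with one row per person and the columns Persons and Items, which matches the layout asked for in the question.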
