Home > Enterprise >  Combine multiple rows to one row per group
Combine multiple rows to one row per group

Time:11-07

I feel like my problem is an easy one to solve, however, I can't seem to figure it out.

I want to combine multiple rows that belong to the same group so that per group there is one row. This row has the sum over the rows for some variables and the mean for other variables. In the example, I only included variable treatment of which I need the sum across the rows of each group episode.

Df <- data.frame(country = c("A", "A", "A", "A", "A", "B","B", "B", "B"),
                 year = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"), 
                 time1 = c("1950", "1951", "1951", "1953", "1954", "1950", "1951", "1952", "1952"), 
                 time2 = c("1951", "1953", "1953", "1954", "1955", "1951", "1952", "1954", "1954"),
                 episode = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
                 status = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
                 treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10))

Df2 <- data.frame(country = c("A", "A", "A", "A", "B", "B", "B"),
                   time1 = c("1950", "1951", "1953", "1954", "1950", "1951", "1952"), 
                   time2 = c("1951", "1953", "1954", "1955", "1951", "1952", "1954"),
                   episode = c("1", "2", "3", "4", "1", "2", "3"),
                   status = c(0, 1, 0, 1, 1, 0, 1),
                   treatment = c(10, 20, 5, 0, 0, 30, 110))

Any ideas on how to solve this?

CodePudding user response:

Something like this:

library(dplyr)
Df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(episode) %>% 
  summarise(sumTreatment=sum(treatment, na.rm = TRUE))
  episode sumTreatment
    <int>        <int>
1       1           10
2       2           50
3       3          115
4       4            0

CodePudding user response:

How about this?

library(tidyverse)    
Df2 %>% filter(!is.na(treatment))%>%group_by(episode) %>% summarise(sumTreatment = sum(treatment))
  • Related