Problem with calculation of sum with usage of dplyr group_by

Time:09-17

I have a data frame named mydata that looks like this:

Date.created   team_member   TaskTime
2022/08          Karina         0.33
2022/08          Jelena         0.33
2022/08          Elina          0.67
2022/08          Jelena         0.67
2022/08          Karina         0.33
2022/07          Jelena         0.33
2022/07          Jelena         0.33
2022/07          Karina         0.67
2022/07          Elina          0.33
2022/07          Elina          0.67
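To make the problem reproducible, the table above can be rebuilt as a data frame. This is a sketch that assumes Date.created is stored as a character string such as "2022/08" and TaskTime as a numeric column. The question does not show the real column types of mydata.

```r
# Rebuild the sample data from the table above.
# Assumption: Date.created is character ("2022/08"), TaskTime is numeric.
mydata <- data.frame(
  Date.created = c(rep("2022/08", 5), rep("2022/07", 5)),
  team_member  = c("Karina", "Jelena", "Elina", "Jelena", "Karina",
                   "Jelena", "Jelena", "Karina", "Elina", "Elina"),
  TaskTime     = c(0.33, 0.33, 0.67, 0.67, 0.33,
                   0.33, 0.33, 0.67, 0.33, 0.67)
)

# Total time of all tasks, ignoring month and person
sum(mydata$TaskTime)  # 4.66
```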

I need to calculate the sum of TaskTime by month and by person, to get something like this:

Date.created   team_member   TaskTime
2022/08          Karina         0.66
2022/08          Jelena         1
2022/08          Elina          0.67
2022/07          Jelena         0.66
2022/07          Karina         0.67
2022/07          Elina          1

I tried this code:

library(dplyr)
mydata2 <- mydata %>% 
  group_by(mydata$Date.created, mydata$team_member) %>% 
  summarise(TaskTime=sum(mydata$TaskTime))

However, it gives the wrong sum. Instead of the sum for each group, every row gets the total TaskTime of all tasks in the data frame (4.66):

Date.created   team_member   TaskTime
2022/08          Karina         4.66
2022/08          Jelena         4.66
2022/08          Elina          4.66
2022/07          Jelena         4.66
2022/07          Karina         4.66
2022/07          Elina          4.66

Why does this happen, and what can be done to solve the issue?

Answer:

Remove the mydata$ references to fix the issue.

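What's happening is that mydata$... refers to the columns of the original data frame, not to the data at the current step of the pipeline. In summarise(), sum(mydata$TaskTime) therefore adds up the whole TaskTime column for every group, so the grouping is ignored. (In group_by(), mydata$Date.created still produces the right groups, but the grouping columns end up named mydata$Date.created and mydata$team_member.)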

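library(dplyr)

mydata %>% 
  group_by(Date.created, team_member) %>%                 # bare column names, no mydata$
  summarise(TaskTime = sum(TaskTime), .groups = "drop")   # sum within each group, return ungrouped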
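To see the difference directly, the sketch below computes both versions side by side on the question's data (rebuilt here as a data frame, assuming Date.created is a character column). It also uses the .data pronoun, which makes it explicit that a name refers to a column of the data in the pipeline, and .groups = "drop", which only returns an ungrouped result.

```r
library(dplyr)

# Sample data from the question (assumed column types)
mydata <- data.frame(
  Date.created = c(rep("2022/08", 5), rep("2022/07", 5)),
  team_member  = c("Karina", "Jelena", "Elina", "Jelena", "Karina",
                   "Jelena", "Jelena", "Karina", "Elina", "Elina"),
  TaskTime     = c(0.33, 0.33, 0.67, 0.67, 0.33,
                   0.33, 0.33, 0.67, 0.33, 0.67)
)

res <- mydata %>%
  group_by(Date.created, team_member) %>%
  summarise(
    # mydata$TaskTime is the full column of the original data frame,
    # so every group gets the same total
    wrong = sum(mydata$TaskTime),
    # A bare column name is looked up in the current group's rows
    right = sum(TaskTime),
    # The .data pronoun does the same thing, but explicitly
    explicit = sum(.data$TaskTime),
    .groups = "drop"
  )

print(res)
```

Every row of wrong is 4.66, while right and explicit hold the per-month, per-person totals from the expected output.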

Answer:

You have to remove mydata$ from your code. sum(mydata$TaskTime) is always the sum of all values in the TaskTime column of mydata, regardless of the grouping. Try this instead:

mydata %>% 
  group_by(Date.created, team_member) %>% 
  summarise(TaskTime = sum(TaskTime))
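If you would rather avoid dplyr, base R's aggregate() with a formula gives the same per-month, per-person sums. This is a swapped-in alternative, not what either answer proposed; the data frame is rebuilt from the question's table under the same assumed column types.

```r
# Sample data from the question (assumed column types)
mydata <- data.frame(
  Date.created = c(rep("2022/08", 5), rep("2022/07", 5)),
  team_member  = c("Karina", "Jelena", "Elina", "Jelena", "Karina",
                   "Jelena", "Jelena", "Karina", "Elina", "Elina"),
  TaskTime     = c(0.33, 0.33, 0.67, 0.67, 0.33,
                   0.33, 0.33, 0.67, 0.33, 0.67)
)

# Sum TaskTime for every combination of Date.created and team_member.
# Names in the formula are looked up in `data`, so no mydata$ is needed.
mydata2 <- aggregate(TaskTime ~ Date.created + team_member,
                     data = mydata, FUN = sum)
print(mydata2)
```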