I have a data frame named mydata and it looks like this:
Date.created team_member TaskTime
2022/08 Karina 0.33
2022/08 Jelena 0.33
2022/08 Elina 0.67
2022/08 Jelena 0.67
2022/08 Karina 0.33
2022/07 Jelena 0.33
2022/07 Jelena 0.33
2022/07 Karina 0.67
2022/07 Elina 0.33
2022/07 Elina 0.67
I need to calculate the sum of TaskTime by month and by person, to get something like:
2022/08 Karina 0.66
2022/08 Jelena 1
2022/08 Elina 0.67
2022/07 Jelena 0.66
2022/07 Karina 0.67
2022/07 Elina 1
I have tried this code:
library(dplyr)
mydata2 <- mydata %>%
group_by(mydata$Date.created, mydata$team_member) %>%
summarise(TaskTime=sum(mydata$TaskTime))
However, it gives me the wrong sum. Instead of the sum for each group, it returns the total TaskTime of all tasks in the data frame:
2022/08 Karina 4.66
2022/08 Jelena 4.66
2022/08 Elina 4.66
2022/07 Jelena 4.66
2022/07 Karina 4.66
2022/07 Elina 4.66
Why does this happen, and what can be done to fix it?
CodePudding user response:
Remove the mydata$ references to fix the issue.
What’s happening is that you are using the columns from the original data frame instead of the current data in the pipeline, so the grouping is not taken into account.
mydata %>%
  group_by(Date.created, team_member) %>%
  summarise(TaskTime = sum(TaskTime))
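For anyone who wants to try this, here is a minimal reproducible sketch that rebuilds the sample data from the question and applies the fix. The `.groups = "drop"` argument is an optional addition, not part of the original answer; it returns an ungrouped result and suppresses the regrouping message.

```r
library(dplyr)

# Rebuild the sample data shown in the question.
mydata <- data.frame(
  Date.created = rep(c("2022/08", "2022/07"), each = 5),
  team_member  = c("Karina", "Jelena", "Elina", "Jelena", "Karina",
                   "Jelena", "Jelena", "Karina", "Elina", "Elina"),
  TaskTime     = c(0.33, 0.33, 0.67, 0.67, 0.33,
                   0.33, 0.33, 0.67, 0.33, 0.67)
)

# Bare column names are evaluated within each group,
# so sum(TaskTime) is computed once per month/person pair.
mydata2 <- mydata %>%
  group_by(Date.created, team_member) %>%
  summarise(TaskTime = sum(TaskTime), .groups = "drop")

print(mydata2)
```

The result has one row per month and person. Rows come back sorted by the grouping columns, so the order differs from the listing in the question.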
CodePudding user response:
You have to remove mydata$ from your code. sum(mydata$TaskTime) will always be the sum of all values in mydata. So try this:
mydata %>%
  group_by(Date.created, team_member) %>%
  summarise(TaskTime = sum(TaskTime))
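To see the difference directly, the sketch below computes both versions side by side in a single summarise() call. The column names group_sum, whole_column_sum, rows_in_group and rows_in_mydata are made up for illustration, and the data is the sample from the question.

```r
library(dplyr)

# Sample data from the question.
mydata <- data.frame(
  Date.created = rep(c("2022/08", "2022/07"), each = 5),
  team_member  = c("Karina", "Jelena", "Elina", "Jelena", "Karina",
                   "Jelena", "Jelena", "Karina", "Elina", "Elina"),
  TaskTime     = c(0.33, 0.33, 0.67, 0.67, 0.33,
                   0.33, 0.33, 0.67, 0.33, 0.67)
)

comparison <- mydata %>%
  group_by(Date.created, team_member) %>%
  summarise(
    group_sum        = sum(TaskTime),           # only the rows of the current group
    whole_column_sum = sum(mydata$TaskTime),    # the full original column, every time
    rows_in_group    = length(TaskTime),        # group size
    rows_in_mydata   = length(mydata$TaskTime), # all rows of mydata
    .groups = "drop"
  )

print(comparison)
```

Because mydata$TaskTime looks up the original data frame rather than the grouped data, whole_column_sum repeats the grand total in every row, while group_sum changes per group.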