I hope everyone is doing well. I am having a bit of a brain fart trying to aggregate in R. Let's say I have this df:
student | subject |
---|---|
Amber | math |
Colin | math |
Bob | science |
Amber | math |
Amber | science |
I want to count how many times each student's subject is math and add that count to the data frame as a new column, so the result would look like this:
student | subject | total 'math' |
---|---|---|
Amber | math | 2 |
Colin | math | 1 |
Bob | science | 0 |
Amber | math | 2 |
Amber | science | 2 |
Is this possible? I tried aggregate(subject["math"] ~ student, data = df, length) just to get the first part done, but I get "Error in model.frame.default(formula = subject["math"] ~ : variable lengths differ (found for 'student')".
Thank you in advance!
CodePudding user response:
I tried a different approach. Its result differs from your desired output, but does it work for you?
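library(dplyr)  # provides %>%, group_by(), summarise() and n()
my_df <- data.frame("Student" = c("Amber", "Colin", "Bob", "Amber", "Amber"),
                    "Subject" = c("math", "math", "science", "math", "science"),
                    stringsAsFactors = FALSE)
# One row per Student/Subject pair, with the number of times that pair occurs
my_df <- my_df %>% group_by(Student, Subject) %>% summarise("Total" = n())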
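On the error in the question: inside the formula, `subject["math"]` indexes the `subject` column by the name "math". The column has no names, so this yields a single `NA`, while `student` has five values, which is why `aggregate()` reports that the variable lengths differ. A minimal sketch of a working `aggregate()` call, assuming a per-student summary table is enough (`is_math`, `flagged` and `math_counts` are hypothetical names introduced here for illustration):

```r
# Rebuild the example data from the question.
df <- data.frame(
  student = c("Amber", "Colin", "Bob", "Amber", "Amber"),
  subject = c("math", "math", "science", "math", "science")
)

# is_math is a hypothetical helper column: TRUE where the subject is math.
flagged <- transform(df, is_math = subject == "math")

# Summing a logical column counts its TRUE values, here once per student.
math_counts <- aggregate(is_math ~ student, data = flagged, FUN = sum)
print(math_counts)
```

This gives one row per student rather than the extra column asked for, so it only covers the "first part" mentioned in the question.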
CodePudding user response:
library(dplyr)
# count = number of rows for each student/subject pair
df_with_count <- df %>% group_by(student, subject) %>% mutate(count = n())
found here: https://www.tutorialspoint.com/how-to-add-a-new-column-in-an-r-data-frame-with-count-based-on-factor-column
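Note that grouping by both student and subject counts each student/subject pair, so Amber's science row would get 1 rather than 2. A possible variation, sketched under the assumption that the goal is the exact table from the question, groups by student only and sums a logical test (`total_math` and `df_with_math` are names chosen here, not from the original post):

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  student = c("Amber", "Colin", "Bob", "Amber", "Amber"),
  subject = c("math", "math", "science", "math", "science")
)

df_with_math <- df %>%
  group_by(student) %>%
  # subject == "math" is TRUE/FALSE per row; sum() counts the TRUEs per student
  mutate(total_math = sum(subject == "math")) %>%
  ungroup()

print(df_with_math)
```

Because `mutate()` keeps every row, the original row order is preserved and each of a student's rows carries the same math count.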
CodePudding user response:
I think that you want something like this:
library(magrittr)
library(dplyr)
df <- data.frame(
student = c("Amber", "Colin", "Bob", "Amber", "Amber"),
subject = c("math", "math", "science", "math", "science")
)
df2 <- df %>%
  group_by(student, subject) %>%
  mutate(`Total math` = n()) %>%
  filter(`Total math` > 0, subject == "math") %>%
  distinct()
merge(x = df, y = df2, by = "student", all.x = TRUE) %>%
  mutate(`Total math` = ifelse(!is.na(`Total math`), `Total math`, 0)) %>%
  rename(subject = "subject.x") %>%
  select(student, subject, `Total math`) %>%
  print()
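One thing to be aware of: `merge()` sorts on the key by default, so the rows above come back in a different order than in the question. If the original order matters and no extra packages are wanted, a base R sketch with `ave()` may be enough (the column name `total_math` is an assumption made here for illustration):

```r
# Rebuild the example data from the question.
df <- data.frame(
  student = c("Amber", "Colin", "Bob", "Amber", "Amber"),
  subject = c("math", "math", "science", "math", "science")
)

# ave() applies FUN within each student group and returns a vector aligned
# with the original rows, so no merge or re-sorting is needed.
df$total_math <- ave(as.integer(df$subject == "math"), df$student, FUN = sum)

print(df)
```

Here the logical test is converted to 0/1 first, so the per-student sum is the number of math rows for that student.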