How do I count the amount of a variable?-CodePudding

I have a data set with student names, their student ID, and the date they took an exam taken. It looks a bit like this:

Student ID	Student Name	Date Taken
01	Christian	2009-09-24
01	Christian	2015-06-16
01	Christian	2014-05-14
12	Laura	2017-01-04
12	Laura	2009-05-12
43	Alex	2014-05-07
43	Alex	2009-05-01
43	Alex	2009-08-24
43	Alex	2013-04-29
06	Sarah	2018-05-03
06	Sarah	2015-06-24

I want to create a variable that accounts for the cumulative number of exams taken by each student, without counting the first exam they have taken (based on the date they took it).

I tried doing this by:

count(Student ID, "01")
count(Student ID, "02")

etc.

But I feel that this is not the most efficient way to do it and it also doesn't give me a good summary of the data by student name. I want to use student ID vs student name, because some names repeat, yet their student ID is different. I'm unsure how to go about this code, or if I am in the right track.

I was thinking as well of making an extra column where each exam value is counted as 1, and first projects launched as 0. And then adding the numbers per ID. But I am also unsure on how to go about the code.

Any suggestions? Thank you a million beforehand!

CodePudding user response：

If you want to add a new column with the cumulative sum, then we can group by id and arrange by date, then create 0 for the first row and 1 for the other rows, then do cumsum.

library(tidyverse)

df %>%
  group_by(`Student ID`) %>%
  arrange(`Student ID`, `Date Taken`) %>%
  mutate(exams = ifelse(row_number()==1, 0, 1),
         exams = cumsum(exams))

Output

   Student.ID Student.Name Date.Taken exams
        <int> <chr>        <chr>      <dbl>
 1          1 Christian    2009-09-24     0
 2          1 Christian    2014-05-14     1
 3          1 Christian    2015-06-16     2
 4          6 Sarah        2015-06-24     0
 5          6 Sarah        2018-05-03     1
 6         12 Laura        2009-05-12     0
 7         12 Laura        2017-01-04     1
 8         43 Alex         2009-05-01     0
 9         43 Alex         2009-08-24     1
10         43 Alex         2013-04-29     2
11         43 Alex         2014-05-07     3

Data

df <- structure(list(`Student ID` = c(1L, 1L, 1L, 12L, 12L, 43L, 43L, 
43L, 43L, 6L, 6L), `Student Name` = c("Christian", "Christian", 
"Christian", "Laura", "Laura", "Alex", "Alex", "Alex", "Alex", 
"Sarah", "Sarah"), `Date Taken` = c("2009-09-24", "2015-06-16", 
"2014-05-14", "2017-01-04", "2009-05-12", "2014-05-07", "2009-05-01", 
"2009-08-24", "2013-04-29", "2018-05-03", "2015-06-24")), class = "data.frame", row.names = c(NA, 
-11L))

CodePudding user response：

If your dataframe as above is d, then you can use the dplyr package, and group by Student ID

library(dplyr)

d %>% 
  group_by(`Student ID`, `Student Name`) %>% 
  summarize(exams = n()-1)

Output:

  `Student ID` `Student Name` exams
         <int> <chr>          <dbl>
1            1 Christian          2
2            6 Sarah              1
3           12 Laura              1
4           43 Alex               3

If you want each row retained, and the count of exams, starting at 0, you can do this:

d %>% 
  arrange(`Date Taken`) %>% 
  group_by(`Student ID`) %>% 
  mutate(exams = row_number()-1)