Home > Software design >  Is there an R function that counts the number of previous number of dates in a data frame and based
Is there an R function that counts the number of previous number of dates in a data frame and based

Time:12-14

I want to count the number of previous absences each student has had before their most recent one and add those counts as a column in the data frame.

 Student ID       Absent Date       Subject        

    4567           08/30/2018          M
    4567           09/22/2019          M
    8345           09/01/2019          S
    8345           03/30/2019         PE         
    8345           07/18/2017          M
    5601           01/08/2019         SS

This is the desired output:

 Student ID       Absent Date       Subject       Previous Absence            

    4567           08/30/2018          M                 1
    4567           09/22/2019          M                 1
    8345           09/01/2019          S                 2
    8345           03/30/2019         PE                 2        
    8345           07/18/2017          M                 2
    5601           01/08/2019         SS                 0

I then want to calculate the number of previous absences each student had in math (M) and add those counts as a column in the data frame.

 Student ID       Absent Date       Subject       Previous Absence            

    4567           08/30/2018          M                 1
    4567           09/22/2019          M                 1
    8345           09/01/2019          S                 2
    8345           03/30/2019         PE                 2        
    8345           07/18/2017          M                 2
    5601           01/08/2019         SS                 0

The desired output:

 Student ID  Absent Date  Subject  Prior Absence  Prior M Absence              

    4567      08/30/2018       M           1            1
    4567      09/22/2019       M           1            1
    8345      09/01/2019       S           2            0
    8345      03/30/2019      PE           2            0        
    8345      07/18/2017       M           2            0
    5601      01/08/2019      SS           0            0

Thank you!

CodePudding user response:

This assumes that the data is already sorted by Absent_Date (at least within each Student_ID):

library(dplyr)
df %>%
  group_by(Student_ID) %>%
  mutate(
    n_prior_absence = n() - 1,
    n_prior_absence_math = sum(head(Subject, -1) == "M")
  )
# # A tibble: 6 × 5
# # Groups:   Student_ID [3]
#   Student_ID Absent_Date Subject n_prior_absence n_prior_absence_math
#        <int> <chr>       <chr>             <dbl>                <int>
# 1       4567 08/30/2018  M                     1                    1
# 2       4567 09/22/2019  M                     1                    1
# 3       8345 09/01/2019  S                     2                    0
# 4       8345 03/30/2019  PE                    2                    0
# 5       8345 07/18/2017  M                     2                    0
# 6       5601 01/08/2019  SS                    0                    0

Using this data:

df = read.table(text = 'Student_ID       Absent_Date       Subject        
4567           08/30/2018          M
4567           09/22/2019          M
8345           09/01/2019          S
8345           03/30/2019         PE         
8345           07/18/2017          M
5601           01/08/2019         SS', header = T)
  • Related