I have a very large data set of Subjects who underwent two Protocols. Here is an example df:
df <- data.frame (Subject = c('875','875','875','875','875','875','875','875',
'875','1392','1392','1392','1392','1392','1392',
'1392','1392','1392','1392','1392'),
StartDate = c('20160915','20160916','20160917','20160918',
'20160926','20160927','20160928','20160929',
'20160930','20160917','20160918','20160919',
'20160920','20160921','20160922','20161005',
'20161006','20161007','20161008','20161009'),
Protocol = c('Training','Training','Training','Training',
'Test','Test','Test','Test','Test','Training',
'Training','Training','Training','Training',
'Training','Test','Test','Test','Test','Test'),
Points = c('14','32','33','50','41','56','41','48','49','34',
'16','25','21','34','28','61','24','45','45','47'),
Attempts = c('24','45','37','69','56','59','53','67','53','55',
'34','36','26','48','44','64','36','62','53','55'))
I would like to take the StartDate of each Protocol and change it to a Day of Testing, or even just add in a column with Day of Testing.
There are irregular days for each Protocol, so a simple numbering is not working. And the actual df has thousands of rows, so doing it by hand would be a nightmare. I am looking to be able to compare specific days of a protocol between subjects (ie Day 3 of Training between subjects, etc.)
I would like to change the data frame to look like:
df2 <- data.frame (Subject = c('875','875','875','875','875','875','875','875',
'875','1392','1392','1392','1392','1392','1392',
'1392','1392','1392','1392','1392'),
StartDate = c('20160915','20160916','20160917','20160918',
'20160926','20160927','20160928','20160929',
'20160930','20160917','20160918','20160919',
'20160920','20160921','20160922','20161005',
'20161006','20161007','20161008','20161009'),
Day = c('1','2','3','4','1','2','3','4','5','1','2','3',
'4','5','6','1','2','3','4','5'),
Protocol = c('Training','Training','Training','Training',
'Test','Test','Test','Test','Test','Training',
'Training','Training','Training','Training',
'Training','Test','Test','Test','Test','Test'),
Points = c('14','32','33','50','41','56','41','48','49','34',
'16','25','21','34','28','61','24','45','45','47'),
Attempts = c('24','45','37','69','56','59','53','67','53','55',
'34','36','26','48','44','64','36','62','53','55'))
Is there a way to do this? Or does anyone have other ideas?
CodePudding user response:
What do you mean by "There are irregular days for each Protocol"?
Assuming you have your data ordered by StartDate
. You could do this
df %>%
group_by(Subject, Protocol) %>%
mutate(Day = row_number()) %>%
ungroup
# A tibble: 20 × 6
Subject StartDate Protocol Points Attempts Day
<fct> <fct> <fct> <fct> <fct> <int>
1 875 20160915 Training 14 24 1
2 875 20160916 Training 32 45 2
3 875 20160917 Training 33 37 3
4 875 20160918 Training 50 69 4
5 875 20160926 Test 41 56 1
6 875 20160927 Test 56 59 2
7 875 20160928 Test 41 53 3
8 875 20160929 Test 48 67 4
9 875 20160930 Test 49 53 5
10 1392 20160917 Training 34 55 1
11 1392 20160918 Training 16 34 2
12 1392 20160919 Training 25 36 3
13 1392 20160920 Training 21 26 4
14 1392 20160921 Training 34 48 5
15 1392 20160922 Training 28 44 6
16 1392 20161005 Test 61 64 1
17 1392 20161006 Test 24 36 2
18 1392 20161007 Test 45 62 3
19 1392 20161008 Test 45 53 4
20 1392 20161009 Test 47 55 5