I'm trying to reshape my table so that I can join it to another table. Here is a sample of what my initial table looks like:
df1 <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                  date = c('2021-11-01', '2021-11-01', '2021-11-02',
                           '2021-11-01', '2021-11-01', '2021-11-02',
                           '2021-11-01', '2021-11-01', '2021-11-02'),
                  event_name = c('a', 'a', 'c', 'a', 'b', 'b', 'a', 'b', 'c'),
                  Time_duration = c(1, 3, 5, 9, 2, 4, 1, 6, 8))
And this is an example of how I'd like the data to look after spreading it:
df2 <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                  date = c('2021-11-01', '2021-11-02',
                           '2021-11-01', '2021-11-02',
                           '2021-11-01', '2021-11-02'),
                  event_A_duration = c(4, 0, 9, 0, 1, 0),
                  event_B_duration = c(0, 0, 2, 4, 6, 0),
                  event_C_duration = c(0, 5, 0, 0, 0, 8))
In my final table, I need the data grouped by (ID, date); the unique identifier of the final table is the combination of ID and date. A given ID and date can have multiple events of the same type, and their durations should be summed, as in event_A_duration above.
I hope this makes sense. Should I concatenate my identifiers and then split them back apart afterwards, or is there a better way to do this with dplyr?
Cheers, appreciate any help.
CodePudding user response:
library(data.table)
# Sum Time_duration per (ID, date) and spread event_name into columns
dcast(data.table(df1),
      ID + date ~ event_name,
      value.var = 'Time_duration',
      fun.aggregate = sum)
Key: <ID, date>
      ID       date     a     b     c
   <num>     <char> <num> <num> <num>
1:     1 2021-11-01     4     0     0
2:     1 2021-11-02     0     0     5
3:     2 2021-11-01     9     2     0
4:     2 2021-11-02     0     4     0
5:     3 2021-11-01     1     6     0
6:     3 2021-11-02     0     0     8
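If you would rather stay in the tidyverse, here is a minimal sketch along the same lines, assuming dplyr and a reasonably recent tidyr (>= 1.1, where pivot_wider() gained values_fn, values_fill and names_sort); the toupper()/names_glue step is only there to get column names close to your df2:

library(dplyr)
library(tidyr)

df1 %>%
  mutate(event_name = toupper(event_name)) %>%   # 'a' -> 'A' so columns read event_A_...
  pivot_wider(id_cols     = c(ID, date),
              names_from  = event_name,
              values_from = Time_duration,
              names_glue  = "event_{event_name}_duration",
              names_sort  = TRUE,
              values_fn   = sum,     # sum durations when an (ID, date, event) repeats
              values_fill = 0) %>%   # combinations with no event become 0, not NA
  arrange(ID, date)

Either way you end up with one row per (ID, date), so there is no need to concatenate and re-split the identifiers; the result can be joined onto the other table directly on ID and date with merge() or dplyr::left_join().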