I have a data frame like the following:
PERSONID STRTTIME ENDTIME origin dest
1 1 1000 1015 H O
2 1 1510 1530 O H
3 2 1800 2030 H W
4 2 700 900 W H
5 3 1430 1445 O H
6 3 845 900 H O
7 1 1115 1130 H W
8 1 2330 2340 W H
9 1 1715 1850 H O
10 1 900 920 H W
11 1 2055 2145 O H
12 1 1530 1605 W H
13 1 1415 1440 W W
14 1 550 605 H O
15 1 805 815 O H
16 1 730 740 H O
17 1 700 715 O H
18 1 1500 1515 H O
19 1 1700 1715 O H
20 1 1015 1025 O O
21 1 830 845 H O
22 1 1050 1105 O H
23 2 1800 1818 H O
24 2 2135 2154 O H
25 1 800 820 H W
26 1 1100 1120 W O
27 1 1125 1130 O H
28 2 1200 1205 H O
29 2 1315 1330 O O
30 2 1405 1415 O H
31 1 800 945 O W
32 1 1800 2000 W H
33 2 900 1100 H W
34 2 1830 2030 W H
35 1 1725 1855 W H
36 1 1200 1210 W O
37 1 710 820 H W
38 1 1245 1255 O W
39 2 1625 1700 O H
40 2 1535 1554 W O
41 1 1653 1657 W O
42 1 809 813 O O
43 1 1718 1726 O H
44 1 824 828 O W
45 1 745 752 H O
46 2 738 850 O O
47 2 730 733 H O
48 2 858 949 O W
49 1 0 30 W H
50 1 1300 1325 H W
I want to cross-tabulate "origin" and "dest", but separately for each calendar hour (1am, 2am, ..., 11pm), based on STRTTIME and ENDTIME. How can I do this? STRTTIME and ENDTIME are in HHMM format (e.g., 1015 = 10:15am).
This is what I want, but by hour.
       H      O     W
H   5446 131981 53612
O 143058 130553 21667
W  45056  30682 11568
CodePudding user response:
Suppose your data is in df. This script computes the total time in hours for each origin/dest pair (if I understand correctly):
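library(dplyr)
library(tidyr)

df %>%
  # zero-pad HHMM to four digits (e.g. 700 -> "0700") so "%H%M" can parse it
  mutate(hours = as.numeric(difftime(
    strptime(sprintf("%04d", ENDTIME), "%H%M"),
    strptime(sprintf("%04d", STRTTIME), "%H%M"),
    units = "hours"))) %>%
  group_by(origin, dest) %>%
  # total duration in hours for each origin/dest pair
  summarise(hours = sum(hours), .groups = "drop") %>%
  # spread dest across columns to get the origin-by-dest layout
  pivot_wider(names_from = dest, values_from = hours)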
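The script above totals durations per origin/dest pair but does not split anything by hour. Below is a minimal base R sketch of a by-hour cross-tabulation of trip counts. It rests on assumptions the question does not confirm: a trip is counted once in every clock hour it overlaps, a trip ending exactly on the hour (e.g. 900) does not count toward that hour, and no trip crosses midnight. The four-row df is just the first rows of the question's data, and the names start_min, end_min, first_hour, last_hour, hours, long and tab are made up for illustration.

```r
# First four rows of the question's data, used as a small sample
df <- data.frame(
  PERSONID = c(1, 1, 2, 2),
  STRTTIME = c(1000, 1510, 1800, 700),
  ENDTIME  = c(1015, 1530, 2030, 900),
  origin   = c("H", "O", "H", "W"),
  dest     = c("O", "H", "W", "H")
)

# Convert HHMM to minutes since midnight
start_min <- (df$STRTTIME %/% 100) * 60 + df$STRTTIME %% 100
end_min   <- (df$ENDTIME  %/% 100) * 60 + df$ENDTIME  %% 100

# First and last clock hour each trip overlaps
# (assumption: a trip ending at exactly HH:00 does not count toward hour HH)
first_hour <- start_min %/% 60
last_hour  <- pmax(first_hour, (end_min - 1) %/% 60)

# One row per (trip, hour) combination
hours <- Map(seq, first_hour, last_hour)
long <- data.frame(
  origin = rep(df$origin, lengths(hours)),
  dest   = rep(df$dest,   lengths(hours)),
  hour   = unlist(hours)
)

# Fix the factor levels so every hourly table has the same 3 x 3 shape
places <- c("H", "O", "W")
tab <- table(
  origin = factor(long$origin, levels = places),
  dest   = factor(long$dest,   levels = places),
  hour   = factor(long$hour,   levels = 0:23)
)

# Cross-tab for a single hour, e.g. 8am
tab[, , "8"]
```

tab is a 3 x 3 x 24 array, so tab[, , "18"] gives the origin-by-dest table for 6pm, and apply(tab, 3, sum) gives the number of active trips per hour. If a trip should only count in the hour it starts, replace last_hour with first_hour.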