I currently have a dataframe that has two columns: arrest date and number of arrests. The date column has almost every single day from 2006-2020; instead of having the number of arrests per day, I'd like to have the number of arrests per month, per year.
The dataframe is going to be converted into an xts object for a time series analysis so I need a resulting date column that has the year and month.
Below is the first 6 months of data from the dataset:
structure(list(ARREST_DATE = structure(c(13149, 13150, 13151,
13152, 13153, 13154, 13155, 13156, 13157, 13158, 13159, 13160,
13161, 13162, 13163, 13164, 13165, 13166, 13167, 13168, 13169,
13170, 13171, 13172, 13173, 13174, 13175, 13176, 13177, 13178,
13179, 13180, 13181, 13182, 13183, 13184, 13185, 13186, 13187,
13188, 13189, 13190, 13191, 13192, 13193, 13194, 13195, 13196,
13197, 13198, 13199, 13200, 13201, 13202, 13203, 13204, 13205,
13206, 13207, 13208, 13209, 13210, 13211, 13212, 13213, 13214,
13215, 13216, 13217, 13218, 13219, 13220, 13221, 13222, 13223,
13224, 13225, 13226, 13227, 13228, 13229, 13230, 13231, 13232,
13233, 13234, 13235, 13236, 13237, 13238, 13239, 13240, 13241,
13242, 13243, 13244, 13245, 13246, 13247, 13248, 13249, 13250,
13251, 13252, 13253, 13254, 13255, 13256, 13257, 13258, 13259,
13260, 13261, 13262, 13263, 13264, 13265, 13266, 13267, 13268,
13269, 13270, 13271, 13272, 13273, 13274, 13275, 13276, 13277,
13278, 13279, 13280, 13281, 13282, 13283, 13284, 13285, 13286,
13287, 13288, 13289, 13290, 13291, 13292, 13293, 13294, 13295,
13296, 13297, 13298, 13299, 13300, 13301, 13302, 13303, 13304,
13305, 13306, 13307, 13308, 13309, 13310, 13311, 13312, 13313,
13314, 13315, 13316, 13317, 13318, 13319, 13320, 13321, 13322,
13323, 13324, 13325, 13326, 13327, 13328, 13329), class = "Date"),
num_of_arrests = c(550L, 617L, 895L, 1224L, 1379L, 1246L,
893L, 635L, 889L, 1316L, 1223L, 1264L, 1258L, 852L, 478L,
710L, 1131L, 1190L, 1309L, 1085L, 910L, 704L, 852L, 1278L,
1322L, 1250L, 1128L, 967L, 686L, 812L, 998L, 1350L, 1356L,
1292L, 1006L, 568L, 867L, 1296L, 1428L, 1327L, 1182L, 821L,
233L, 618L, 915L, 1370L, 1391L, 1237L, 992L, 649L, 888L,
1167L, 1369L, 1126L, 1071L, 888L, 615L, 831L, 1019L, 1364L,
1109L, 1239L, 962L, 720L, 930L, 1233L, 1413L, 1350L, 1258L,
1034L, 629L, 954L, 1181L, 1421L, 1332L, 974L, 924L, 680L,
958L, 1232L, 1389L, 1289L, 1189L, 931L, 672L, 824L, 1188L,
1332L, 1194L, 1005L, 1011L, 653L, 822L, 1252L, 1421L, 1316L,
1231L, 902L, 740L, 811L, 1184L, 1362L, 1401L, 1144L, 860L,
383L, 775L, 1143L, 1296L, 1271L, 1056L, 729L, 593L, 836L,
1264L, 1341L, 1298L, 1127L, 771L, 548L, 908L, 1290L, 1398L,
1297L, 1127L, 878L, 663L, 928L, 1258L, 1389L, 1300L, 1135L,
937L, 600L, 851L, 1173L, 1366L, 1211L, 958L, 912L, 602L,
843L, 1274L, 1368L, 1332L, 1068L, 823L, 589L, 482L, 1076L,
1217L, 1194L, 1020L, 822L, 628L, 895L, 1225L, 1116L, 1264L,
1254L, 829L, 747L, 911L, 1241L, 1291L, 1267L, 1182L, 924L,
438L, 826L, 1228L, 1361L, 1255L, 1095L, 763L, 594L, 860L,
1056L, 1157L, 1073L, 898L)), row.names = c(NA, 181L), class = "data.frame")
CodePudding user response:
To get the number of arrests per month, you could do as follows: extract the month and year by using the lubridate functions month()
and year()
, group by both of them (year could be omitted in your example, since there is only year 2006) and summarize()
the sum()
.
As requested, to get a column with year and month, paste()
them together, ungroup()
, deselect the helper columns and relocate()
yearmonth to the front.
Code
library(dplyr)
library(lubridate)
result <- data %>% mutate(year = year(ARREST_DATE), month = month(ARREST_DATE)) %>%
group_by(year, month) %>% summarise(arrests_per_month = sum(num_of_arrests)) %>%
mutate(yearmonth = paste(year, month, sep = "-")) %>% ungroup() %>%
select(-c(year, month)) %>% relocate(yearmonth)
Output
> result
# A tibble: 6 x 2
yearmonth arrests_per_month
<chr> <int>
1 2006-1 31051
2 2006-2 28872
3 2006-3 33910
4 2006-4 30541
5 2006-5 32253
6 2006-6 30414