I'm having the particular case in R where two categories are displayed in a yearly panel data but do not correspond to the level I am using, this is, there are two cities that belong to the same state (A) but are on their individual city levels (a and b) and the rest of the data is on State level (Dep).
year | Dep | Homicides |
---|---|---|
2000 | C | 20 |
2000 | a | 5 |
2000 | b | 3 |
2000 | D | 17 |
2000 | E | 18 |
2001 | C | 21 |
2001 | a | 6 |
2001 | b | 4 |
2001 | D | 18 |
2001 | E | 18 |
So I want to form a new category by sum using Dep == "a" and Dep =="b", that is, to sum my two cities that belongs to Dep A (also this is not exist, so it needs to be created) by each year, where the final output looks like:
year | Dep | Homicides |
---|---|---|
2000 | C | 20 |
2000 | A | 8 |
2000 | D | 17 |
2000 | E | 18 |
2001 | C | 21 |
2001 | A | 10 |
2001 | D | 18 |
2001 | E | 18 |
Thank you in advance!
CodePudding user response:
We could recode the 'Dep' levels to 'A' and use as grouping variable to get the sum
library(forcats)
library(data.table)
dt1[, .(Homicides = sum(Homicides)),
.(year, Dep = fct_other(Dep, drop = c("a", "b"), other_level = "A"))]
-output
year Dep Homicides
<int> <fctr> <int>
1: 2000 C 20
2: 2000 A 8
3: 2000 D 17
4: 2000 E 18
5: 2001 C 21
6: 2001 A 10
7: 2001 D 18
8: 2001 E 18
Or could use fct_collapse
as well
dt1[, .(Homicides = sum(Homicides)), .(year,
Dep = fct_collapse(Dep, A = c("a", "b")))]
-output
year Dep Homicides
<int> <fctr> <int>
1: 2000 C 20
2: 2000 A 8
3: 2000 D 17
4: 2000 E 18
5: 2001 C 21
6: 2001 A 10
7: 2001 D 18
8: 2001 E 18
data
dt1 <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2001L,
2001L, 2001L, 2001L, 2001L), Dep = c("C", "a", "b", "D", "E",
"C", "a", "b", "D", "E"), Homicides = c(20L, 5L, 3L, 17L, 18L,
21L, 6L, 4L, 18L, 18L)), class = c("data.table", "data.frame"
), row.names = c(NA, -10L))