I have a table looks like this.
Year District week_number
1 2020 11 12
2 2020 2 12
3 2020 2 12
4 2020 2 12
5 2020 7 12
6 2020 2 12
7 2020 12 12
8 2020 4 12
9 2020 10 12
10 2020 2 12
11 2020 22 12
12 2020 6 12
13 2020 3 12
14 2020 9 12
15 2020 3 12
I want to sum number of district by week_number, so i use formula below. A new total column is created as well.
crime4 = crime3 %>%
group_by(District) %>%
mutate(Total = sum(week_number, na.rm=TRUE)) %>%
ungroup()
However, the result returns with duplicates in District.
Year District week_number Total
<chr> <int> <dbl> <dbl>
1 2020 11 12 607260
2 2020 2 12 436255
3 2020 2 12 436255
4 2020 2 12 436255
5 2020 7 12 605288
6 2020 2 12 436255
7 2020 12 12 483924
8 2020 4 12 628875
9 2020 10 12 467589
10 2020 2 12 436255
Any better method i can get the Total by not having duplicates in District? It should be a simple solution but i just can't crack it
CodePudding user response:
If you want the number of distinct district by week (currently you have kind of the opposite!) you can do:
library(tidyverse)
crime3 %>%
group_by(week_number) %>%
summarize(n_districts = length(unique(District)))
CodePudding user response:
Do either of these approaches give you your desired outcome?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
crime3 <- read.table(text = " Year District week_number
1 2020 11 12
2 2020 2 12
3 2020 2 12
4 2020 2 12
5 2020 7 12
6 2020 2 12
7 2020 12 12
8 2020 4 12
9 2020 10 12
10 2020 2 12
11 2020 22 12
12 2020 6 12
13 2020 3 12
14 2020 9 12
15 2020 3 12",
header = TRUE)
# One potential option:
crime3 %>%
group_by(District) %>%
summarise(Total = sum(week_number, na.rm=TRUE))
#> # A tibble: 10 × 2
#> District Total
#> <int> <int>
#> 1 2 60
#> 2 3 24
#> 3 4 12
#> 4 6 12
#> 5 7 12
#> 6 9 12
#> 7 10 12
#> 8 11 12
#> 9 12 12
#> 10 22 12
# Or perhaps:
crime3 %>%
group_by(District) %>%
mutate(Total = sum(week_number, na.rm=TRUE)) %>%
ungroup() %>%
distinct()
#> # A tibble: 10 × 4
#> Year District week_number Total
#> <int> <int> <int> <int>
#> 1 2020 11 12 12
#> 2 2020 2 12 60
#> 3 2020 7 12 12
#> 4 2020 12 12 12
#> 5 2020 4 12 12
#> 6 2020 10 12 12
#> 7 2020 22 12 12
#> 8 2020 6 12 12
#> 9 2020 3 12 24
#> 10 2020 9 12 12
Created on 2022-08-25 by the reprex package (v2.0.1)