Say I have a dataframe df
df = data.frame(
a = c(1978, 1978, 1978, 1978, 1978, 1978,
1979, 1979, 1979, 1980, 1980),
b = c("A", "A", "A", "B", "B", "C", "A", "B", "C", "A", "C")
)
And I want the number of times each combination of a
and b
exist in the dataframe. I can do this,
test = df %>%
group_by(a, b) %>%
summarize(counts = n())
which gives
a b counts
1 1978 A 3
2 1978 A 3
3 1978 A 3
4 1978 B 2
5 1978 B 2
6 1978 C 1
7 1979 A 1
8 1979 B 1
9 1979 C 1
10 1980 A 1
11 1980 C 1
However, I may want to include the zeros (i.e. the cases where there is no combination, such as a = 1980
and b = B
. I can obviously do this with some ugly loops and if-statements, but I'm sure there's a dplyr
solution I'm just not thinking of.
Any help much appreciated!
CodePudding user response:
Found the answer right after posting - I don't even need dplyr
, table()
works just fine:
> data.frame(table(df))
a b Freq
1 1978 A 3
2 1979 A 1
3 1980 A 1
4 1978 B 2
5 1979 B 1
6 1980 B 0
7 1978 C 1
8 1979 C 1
9 1980 C 1
And if you have a dataframe with more columns you don't want to include like
df = data.frame(
a = c(1978, 1978, 1978, 1978, 1978, 1978,
1979, 1979, 1979, 1980, 1980),
b = c("A", "A", "A", "B", "B", "C", "A", "B", "C", "A", "C"),
c = c("x", "x", "y", "x", "a", "x", "yc", "a", "c", "b", "z")
)
Then you can still use the same solution, just specify which columns:
> data.frame(table(df[,c("a", "b")]))
a b Freq
1 1978 A 3
2 1979 A 1
3 1980 A 1
4 1978 B 2
5 1979 B 1
6 1980 B 0
7 1978 C 1
8 1979 C 1
9 1980 C 1
CodePudding user response:
An dplyr
option with complete
from tidyr
:
df = data.frame(
a = c(1978, 1978, 1978, 1978, 1978, 1978,
1979, 1979, 1979, 1980, 1980),
b = c("A", "A", "A", "B", "B", "C", "A", "B", "C", "A", "C")
)
library(dplyr)
library(tidyr)
df %>%
group_by(a,b) %>%
summarise(counts = n()) %>%
ungroup() %>%
complete(a, b, fill = list(counts = 0))
#> `summarise()` has grouped output by 'a'. You can override using the `.groups`
#> argument.
#> # A tibble: 9 × 3
#> a b counts
#> <dbl> <chr> <int>
#> 1 1978 A 3
#> 2 1978 B 2
#> 3 1978 C 1
#> 4 1979 A 1
#> 5 1979 B 1
#> 6 1979 C 1
#> 7 1980 A 1
#> 8 1980 B 0
#> 9 1980 C 1
Created on 2022-07-08 by the reprex package (v2.0.1)
CodePudding user response:
With a data.table
approach:
library(data.table)
res <- setDT(df)[, counts := .N, .(a,b)][
CJ(a = a, b = b, unique = T), on=.(a, b)]
setnafill(unique(res), fill = 0, cols = "counts")[]
#> a b counts
#> 1: 1978 A 3
#> 2: 1978 B 2
#> 3: 1978 C 1
#> 4: 1979 A 1
#> 5: 1979 B 1
#> 6: 1979 C 1
#> 7: 1980 A 1
#> 8: 1980 B 0
#> 9: 1980 C 1