I have a dataset which looks like this:
english math science history art geography
<fct> <fct> <fct> <fct> <fct> <fct>
1 1 1 0 1 1 0
2 0 0 0 1 0 1
3 1 0 1 0 0 1
4 0 1 0 1 1 0
5 1 1 0 0 0 0
6 1 1 1 0 1 1
7 1 1 0 0 1 1
8 1 1 0 0 0 1
9 0 0 0 1 0 0
10 1 0 1 1 1 0
11 1 0 0 1 1 0
I am trying to count how often two variables both have the value 1 across the whole dataframe. For example, math and english are both 1 in 5 rows.
I can count the instances for a single pair using this code, and can repeat it for all the subjects:
sum(df$english==1 & df$math==1)
However, I am trying to create a graph which looks like this. Is this possible to do in R? I have tried using ggplot but am not sure how to create it.
The code for the dataframe is this:
structure(list(english = structure(c(2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
math = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L), .Label = c("0", "1"), class = "factor"), science = structure(c(1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), history = structure(c(2L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
art = structure(c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
2L), .Label = c("0", "1"), class = "factor"), geography = structure(c(1L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("0",
"1"), class = "factor")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
One option to achieve your desired result would be the widyr package, which makes it easy to compute the counts via widyr::pairwise_count and returns the result in a tidy data format that can easily be plotted via ggplot2:
- Add an identifier variable for the observations
- Convert your dataframe to long or tidy format using e.g.
tidyr::pivot_longer
- Filter your data and compute the counts
- Plot
library(widyr)
library(dplyr)
library(tidyr)
library(ggplot2)
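# df is the data frame from the question
dd <- df %>%
  mutate(id = row_number()) %>%      # identifier for each observation
  pivot_longer(-id) %>%              # long format: id, name, value
  filter(value == 1) %>%             # keep only the subjects that are present
  pairwise_count(name, id)           # rows in which each pair of subjects co-occurs
# Layers must be chained with `+`
ggplot(dd, aes(item1, item2)) +
  geom_point(aes(size = n), color = "steelblue") +
  geom_text(aes(label = n), show.legend = FALSE) +
  scale_size_area(max_size = 10) +
  guides(size = "none")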
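If you want to double-check the counts without widyr, here is a minimal base R sketch. It assumes the 0/1 columns are stored as factors, as in the question, and for brevity it rebuilds only three of the columns by hand. Taking crossprod() of the numeric 0/1 matrix gives the co-occurrence count for every pair of subjects at once.

```r
# Three of the question's columns, rebuilt by hand for illustration (0/1 as factors)
df <- data.frame(
  english = factor(c(1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1)),
  math    = factor(c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0)),
  science = factor(c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0))
)

# Convert the factor labels "0"/"1" to numeric 0/1.
# as.numeric() on the factor alone would return the level codes 1/2 instead.
m <- sapply(df, function(x) as.numeric(as.character(x)))

# t(m) %*% m: entry [i, j] is the number of rows where columns i and j are both 1
co <- crossprod(m)

co["english", "math"]  # 5, the same as sum(df$english == 1 & df$math == 1)
```

The diagonal of co holds each subject's own total, so you would probably drop it (or set it to NA) before comparing against the pairwise_count result, which only contains pairs of different subjects.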