Is there a way to plot the instances that two variables appear in R?

Time:10-31

I have a dataset which looks like this:

   english math  science history art   geography
   <fct>   <fct> <fct>   <fct>   <fct> <fct>    
 1 1       1     0       1       1     0        
 2 0       0     0       1       0     1        
 3 1       0     1       0       0     1        
 4 0       1     0       1       1     0        
 5 1       1     0       0       0     0        
 6 1       1     1       0       1     1        
 7 1       1     0       0       1     1        
 8 1       1     0       0       0     1        
 9 0       0     0       1       0     0        
10 1       0     1       1       1     0        
11 1       0     0       1       1     0 

I am trying to count, across the whole data frame, how often two variables both have the value 1. For example, math and english are both 1 in 5 rows.

I can count the instances for a single pair with this code, and I can repeat it for every pair of subjects:

sum(df$english==1 & df$math==1)

However, I would like to create a graph of these counts which looks like this graph. Is this possible in R? I have tried using ggplot but am not sure how to create it.

The code for the data frame is:

structure(list(english = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 
2L, 2L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"), 
    math = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
    1L), .Label = c("0", "1"), class = "factor"), science = structure(c(1L, 
    1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L), .Label = c("0", 
    "1"), class = "factor"), history = structure(c(2L, 2L, 1L, 
    2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"), 
    art = structure(c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 
    2L), .Label = c("0", "1"), class = "factor"), geography = structure(c(1L, 
    2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("0", 
    "1"), class = "factor")), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

One option is the widyr package: widyr::pairwise_count computes the pairwise counts and returns them in a tidy format that is easy to plot with ggplot2:

  1. Add an identifier variable for the observations
  2. Convert your dataframe to long or tidy format using e.g. tidyr::pivot_longer
  3. Filter your data and compute the counts
  4. Plot
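library(widyr)
library(dplyr)
library(tidyr)
library(ggplot2)

# df is the data frame from the question
dd <- df %>% 
  mutate(id = row_number()) %>%   # 1. identifier for each observation
  pivot_longer(-id) %>%           # 2. long format: one row per id and subject
  filter(value == 1) %>%          # 3. keep only the subjects that are 1 ...
  pairwise_count(name, id)        #    ... and count how often each pair shares an id

# 4. One point per pair of subjects, sized and labelled by the count n
ggplot(dd, aes(item1, item2)) +
  geom_point(aes(size = n), color = "steelblue") +
  geom_text(aes(label = n), show.legend = FALSE) +
  scale_size_area(max_size = 10) +
  guides(size = "none")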
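
If you would rather not depend on widyr, the same counts can probably be cross-checked in base R. This is only a sketch, not part of the approach above: it rebuilds the example data so that it runs on its own, and the names m, co and long are made up for illustration. It relies on the fact that for a 0/1 matrix, crossprod() (that is, t(m) %*% m) counts for each pair of columns the rows where both are 1.

```r
# Example data rebuilt from the question so the snippet runs on its own.
df <- data.frame(
  english   = c(1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1),
  math      = c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0),
  science   = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
  history   = c(1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1),
  art       = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1),
  geography = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0)
)
df[] <- lapply(df, factor)  # factor columns with levels "0"/"1", as in the question

# Go through as.character() first: as.integer() on a factor alone would
# return the level codes (1/2) rather than the values (0/1).
m <- sapply(df, function(x) as.integer(as.character(x)))

# Cell [i, j] is the number of rows where subject i and subject j are both 1.
co <- crossprod(m)

# Long format, one row per ordered pair, without the diagonal
# (a subject paired with itself).
long <- as.data.frame(as.table(co), stringsAsFactors = FALSE)
names(long) <- c("item1", "item2", "n")
long <- long[long$item1 != long$item2, ]

co["english", "math"]  # 5, matching sum(df$english == 1 & df$math == 1)
```

The columns of long are named item1, item2 and n so that it could be passed to the same ggplot call as dd. Note that it lists every pair of subjects, including any pair with a count of 0.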
