How to make a table function on a dataframe based on a series of subsets - and then visualise specif

Time:10-21

I have several datasets: df is the main data frame, and g, h, i and j are used to subset it (imagine all of them as very large datasets).

df = data.frame(x = seq(1,20,2),
y = c('a','a','b','c','a','a','b','c','a','a'),
z = c('d','e','e','d','f','e','e','d','e','f') )

g = data.frame(xx = c(2,3,4,5,7,8,9) )

h = data.frame(xx = c(3,5,7,8,9) )

i = data.frame(xx = c(2,3,6,8) )

j = data.frame(xx = c(1,3,6) )

I want to build a set of frequency tables for the y column of df, one per data frame g, h, i and j, where the xx column of that data frame is used to subset df.

I then want to build the same set of frequency tables for the z column of df, again using the xx of each data frame to subset df.

Next :

I would like to visualise the frequencies of each value of one variable to study its development:

For example, for variable y, the count of the value a going from g to j is 2, 2, 1, 2. I would like to visualise this development for each value of variable y in a simple way.

CodePudding user response:

We could place the datasets in a list (dplyr::lst returns a named list), loop over the list with map, subset the main dataset by matching its 'x' column against 'xx' with an inner_join, and get the frequency counts with count

library(dplyr)
library(purrr)
map(lst(g, h, i,j), 
   ~ inner_join(df, .x, by = c("x" = "xx")) %>%      
       count(y, name = 'Count'))

-output

$g
  y Count
1 a     2
2 b     1
3 c     1

$h
  y Count
1 a     2
2 b     1
3 c     1

$i
  y Count
1 a     1

$j
  y Count
1 a     2

Or in base R (the \(dat) lambda and the |> pipe need R 4.1 or later)

lapply(list(g = g, h = h, i = i, j = j),
  \(dat) subset(df, x %in% dat$xx, select = y ) |>
      table())
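The question also asks for the same tables on the z column. A minimal base R sketch of one way to cover both columns at once is below. The data is copied from the question; `subsets`, `freq_by_subset` and `freq_tables` are hypothetical names introduced only for this illustration.

```r
# Data copied from the question
df <- data.frame(x = seq(1, 20, 2),
                 y = c('a','a','b','c','a','a','b','c','a','a'),
                 z = c('d','e','e','d','f','e','e','d','e','f'))

# Named list of the subsetting data frames ('subsets' is an illustrative name)
subsets <- list(g = data.frame(xx = c(2, 3, 4, 5, 7, 8, 9)),
                h = data.frame(xx = c(3, 5, 7, 8, 9)),
                i = data.frame(xx = c(2, 3, 6, 8)),
                j = data.frame(xx = c(1, 3, 6)))

# Hypothetical helper: one frequency table per subset for a given column of df
freq_by_subset <- function(col) {
  lapply(subsets, function(dat) table(df[[col]][df$x %in% dat$xx]))
}

# Nested list: freq_tables$y$g, freq_tables$z$j, and so on
freq_tables <- lapply(c(y = "y", z = "z"), freq_by_subset)

freq_tables$z$g
```

Note that `table()` on a character column only reports the values present in each subset, so `freq_tables$y$i` has a single entry for a.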

If we need to visualize, either combine the results into a single dataset and draw a bar plot with geom_col/geom_bar, or use barplot in base R

library(ggplot2)
map_dfr(lst(g, h, i,j), 
   ~ inner_join(df, .x, by = c("x" = "xx")) %>%      
       count(y, name = 'Count'), .id = 'grp') %>% 
  ggplot(aes(x = grp, y = Count, fill = y)) +
    geom_col(position = "dodge")
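For the base R route, one possible sketch is to collect the counts into a matrix with one row per value of y and one column per subset, fixing the factor levels so that values absent from a subset show up as 0 rather than being dropped. The names `subsets`, `y_levels` and `counts` are assumptions made for this example, and the `matplot` line chart is an alternative to the bar plot suggested above, not part of the original answer.

```r
# Data copied from the question
df <- data.frame(x = seq(1, 20, 2),
                 y = c('a','a','b','c','a','a','b','c','a','a'),
                 z = c('d','e','e','d','f','e','e','d','e','f'))
subsets <- list(g = data.frame(xx = c(2, 3, 4, 5, 7, 8, 9)),
                h = data.frame(xx = c(3, 5, 7, 8, 9)),
                i = data.frame(xx = c(2, 3, 6, 8)),
                j = data.frame(xx = c(1, 3, 6)))

# Fix the levels up front so every subset reports every value of y
y_levels <- sort(unique(df$y))

# Rows are the values of y, columns are the subsets g, h, i, j
counts <- sapply(subsets, function(dat) {
  table(factor(df$y[df$x %in% dat$xx], levels = y_levels))
})
counts

# Grouped bar plot: one cluster per subset, one bar per value of y
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "subset", ylab = "Count")

# Alternative line chart: one line per value of y across the subsets
matplot(t(counts), type = "b", pch = 1, lty = 1,
        col = seq_along(y_levels), xaxt = "n",
        xlab = "subset", ylab = "Count")
axis(1, at = seq_len(ncol(counts)), labels = colnames(counts))
legend("topright", legend = rownames(counts),
       col = seq_along(y_levels), lty = 1, pch = 1)
```

The row for a in `counts` reproduces the 2, 2, 1, 2 sequence from the question, and the same code should work for z by swapping `df$y` for `df$z`.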