Count observations by group in a list of dataframes

Time:12-21

Problem

I have a list of data frames. All of the data frames have the same column names, but different numbers of rows. One column, called pred, is a factor with the following four levels:

  1. Apple
  2. Cherry
  3. Orange
  4. Pear

I want to count how many rows fall under each level ('Apple', 'Cherry', etc.).

If I single out one dataframe (dataframe1), and perform the counts using:

count(dataframe1, pred)

I get the desired output:

pred                     n 
<fctr>                 <int>
Apple                   25          
Orange                  11          
Pear                    11          
Cherry                  12  

This is how I would like the output for multiple data frames that are contained within a list. How can this be achieved? I have tried various options using the dplyr package, but I tend to get this error:

'Error in UseMethod("count") : no applicable method for 'count' applied to an object of class "list"'

CodePudding user response:

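If all of the data frames have the same columns, you can stack them with bind_rows(), using .id to record which data frame each row came from, and then count by that id and pred: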

library(dplyr)


data1 <- data.frame(pred = sample(c("Apple", "Cherry", "Orange", "Pear"), 100, replace = TRUE))
data2 <- data.frame(pred = sample(c("Apple", "Cherry", "Orange", "Pear"), 100, replace = TRUE))
data3 <- data.frame(pred = sample(c("Apple", "Cherry", "Orange", "Pear"), 100, replace = TRUE))


list_of_dataframes <- list(data1,data2,data3)


bind_rows(list_of_dataframes,.id = "data") %>% 
  count(data,pred)
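
If one table per data frame is preferred (closer to the output shown in the question), a minimal sketch is to apply count() to each list element. The error in the question arises because count() is called on the list itself rather than on the data frames inside it. The toy data below is hypothetical and only stands in for the real list:

```r
library(dplyr)

# Hypothetical toy data standing in for the real list of data frames
list_of_dataframes <- list(
  data.frame(pred = c("Apple", "Apple", "Pear")),
  data.frame(pred = c("Cherry", "Orange", "Orange", "Orange"))
)

# count() has no method for a list, so call it on each data frame in turn
counts <- lapply(list_of_dataframes, function(df) count(df, pred))

# Each element is a data frame with the columns pred and n
counts[[1]]
```

The result is a list with the same length as the input, so counts[[2]] holds the counts for the second data frame, and so on.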

CodePudding user response:

With lapply you can call table on each data frame's pred column and coerce the result with as.data.frame (the \(x) shorthand requires R 4.1 or later).

lapply(dat, \(x) as.data.frame(table(x$pred, dnn='pred')))
# [[1]]
#     pred Freq
# 1  Apple    5
# 2 Cherry    2
# 3 Orange    3
# 
# [[2]]
#     pred Freq
# 1  Apple    5
# 2 Cherry    4
# 3 Orange    1
# 4   Pear    5
# 
# [[3]]
#     pred Freq
# 1  Apple    3
# 2 Cherry    5
# 3 Orange    7
# 4   Pear    3

Data:

set.seed(42)
dat <- lapply(c(10, 15, 18), \(x) 
              data.frame(
                pred=sample(c('Apple', 'Orange', 'Pear', 'Cherry'), x, replace=TRUE),
                x=runif(x))
              )
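
In the sample data above pred is a character column, so a level that does not occur in a given data frame is simply missing from its table. If pred is stored as a factor with all four levels, as the question describes, table() should report absent levels with a count of 0. A small sketch with hypothetical toy data (fruit_levels is a name introduced here for illustration):

```r
# Assumed level set, taken from the four fruits listed in the question
fruit_levels <- c("Apple", "Cherry", "Orange", "Pear")

# Hypothetical toy data with pred stored as a factor
dat <- list(
  data.frame(pred = factor(c("Apple", "Apple", "Orange"), levels = fruit_levels)),
  data.frame(pred = factor(c("Pear", "Cherry"), levels = fruit_levels))
)

# table() tabulates every factor level, including those with no rows
counts <- lapply(dat, function(x) as.data.frame(table(pred = x$pred)))

counts[[1]]
```

Every element of counts then has four rows, which makes the per-data-frame tables easier to compare side by side.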