Getting length of factors inside a list of data frames [R]

Time:05-13

I'm trying to use lapply to apply a function that adds a column with the size of each factor level (group) to several data frames stored in a list.

Here's my example data:

> head(m.list)
$df.1
         Date Years
56 1967-01-17  55  
10 1981-07-27  40  
34 1973-09-30  48  
98 1944-03-17  78  
27 1986-07-17  35  

$df.2
         Date Years
56 1967-01-17  55  
10 1981-07-27  40  
34 1973-09-30  48  
98 1944-03-17  78  
27 1986-07-17  35  

I've managed to create groups using breaks:

year_cut <- function(m.list, col)
      {cut(m.list[,col],
       breaks=c(10,20, 30, 40, 50, 60, 100),
       right = FALSE,
       labels = c("A","B","C","D","E","F"))}

m.list = lapply(m.list, function(x)
                cbind(x, "Group" = year_cut(m.list = x,
                      col ="Years")))
>head(m.list)    
$df.1
         Date Years Group
56 1967-01-17  55   E
10 1981-07-27  40   D
34 1973-09-30  48   D
98 1944-03-17  78   F
27 1986-07-17  35   C

$df.2
         Date Years Group
56 1967-01-17  55   E
10 1981-07-27  40   D
34 1973-09-30  48   D
98 1944-03-17  78   F
27 1986-07-17  35   C
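
As a side check on the grouping step, here is a minimal base R sketch of how cut assigns labels when right = FALSE. It reuses the breaks and labels from year_cut and the five sample Years values; the variable names (years, breaks, labs, groups, outside) are only for illustration.

```r
# Same breaks and labels as year_cut(); the ages are the five sample values.
years  <- c(55L, 40L, 48L, 78L, 35L)
breaks <- c(10, 20, 30, 40, 50, 60, 100)
labs   <- c("A", "B", "C", "D", "E", "F")

# right = FALSE makes every interval closed on the left: [10,20), [20,30), ...
groups <- cut(years, breaks = breaks, right = FALSE, labels = labs)
print(data.frame(Years = years, Group = groups))

# Values outside [10, 100) fall in no interval and become NA.
outside <- cut(c(5, 100), breaks = breaks, right = FALSE, labels = labs)
print(outside)
```

With these breaks, 40 belongs to [40, 50) and gets "D", while 35 belongs to [30, 40) and gets "C".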

Now I'm trying to get the size of each group, but I have failed to do so.

I've tried two different approaches, both unsuccessfully:

cut_summary <- function(m.list, col)
{ summarize(
  group_by(m.list,!!as.name(col)),
  length(col)) }
    
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
 col ="Group"))) 
    
Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 436, 7
    
cut_summary <- function(m.list, col)
{ group_by(m.list,!!as.name(col)) %>% length(col)}
    
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
      col ="Group")))

Error in length(., col) :
2 arguments passed to 'length' which requires 1

Ideally, I should get:

>head(m.list)    
$df.1
         Date Years Group Total
56 1967-01-17  55   E      22
10 1981-07-27  40   D      32
34 1973-09-30  48   D      32
98 1944-03-17  78   F      4
27 1986-07-17  35   B      20

$df.2
         Date Years Group Total
56 2005-01-17  17   A      22
10 1981-07-27  40   C      19
34 1973-09-30  48   E      3
98 1944-03-17  78   F      50
27 1986-07-17  35   B      4

Any help is most welcome. Thanks!

CodePudding user response:

We can create both columns in one pass with mutate and add_count: loop over the list with purrr::map (or lapply from base R), use mutate to create the 'Group' column by applying year_cut to the 'Years' column, then use add_count to add a per-group count column.

library(dplyr)
library(purrr)
map(m.list,  ~ .x %>%
               mutate(Group = year_cut(., col ="Years")) %>%
               add_count(Group, name = "Total"))

-output

$df.1
        Date Years Group Total
1 1967-01-17    55     E     1
2 1981-07-27    40     D     2
3 1973-09-30    48     D     2
4 1944-03-17    78     F     1
5 1986-07-17    35     C     1

$df.2
        Date Years Group Total
1 1967-01-17    55     E     1
2 1981-07-27    40     D     2
3 1973-09-30    48     D     2
4 1944-03-17    78     F     1
5 1986-07-17    35     C     1
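
For readers who would rather avoid dplyr and purrr, the following is a rough base R sketch of the same idea, assuming the sample data from this post. The helper name add_group_total is hypothetical, and the data frame is rebuilt with data.frame() rather than structure() for brevity.

```r
# Sample data from the post, rebuilt with data.frame() for brevity.
df <- data.frame(
  Date  = c("1967-01-17", "1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17"),
  Years = c(55L, 40L, 48L, 78L, 35L),
  row.names = c("56", "10", "34", "98", "27")
)
m.list <- list(df.1 = df, df.2 = df)

# Grouping function as defined in the question.
year_cut <- function(m.list, col) {
  cut(m.list[, col],
      breaks = c(10, 20, 30, 40, 50, 60, 100),
      right = FALSE,
      labels = c("A", "B", "C", "D", "E", "F"))
}

# add_group_total() is a hypothetical helper name, not from the post.
add_group_total <- function(x) {
  x$Group <- year_cut(x, col = "Years")
  counts  <- table(x$Group)                             # rows per group
  x$Total <- as.integer(counts[as.character(x$Group)])  # look up each row's group
  x
}

result <- lapply(m.list, add_group_total)
print(result$df.1)
```

Unlike the dplyr version, this keeps the original row names and returns plain data frames rather than tibbles.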

The OP's function calls length on the string col, which always returns 1. It should be length(!!as.name(col)) or, more simply, n(). Also, summarise returns only the grouping columns and the summarised column, with one row per group, which is why cbind complains about differing numbers of rows. Based on the expected output, the OP wants to keep the full dataset and add a new column to it. In that case, use mutate:

cut_summary <- function(m.list, col)
  { mutate(group_by(m.list,!!as.name(col)), Total = n())}

and then apply it to m.list once the 'Group' column has been added:

m.list <- lapply(m.list, function(x)
                 cbind(x, "Group" = year_cut(m.list = x, col ="Years")))
lapply(m.list, function(x) cut_summary(x, col = "Group"))
$df.1
# A tibble: 5 × 4
# Groups:   Group [4]
  Date       Years Group Total
  <chr>      <int> <fct> <int>
1 1967-01-17    55 E         1
2 1981-07-27    40 D         2
3 1973-09-30    48 D         2
4 1944-03-17    78 F         1
5 1986-07-17    35 C         1

$df.2
# A tibble: 5 × 4
# Groups:   Group [4]
  Date       Years Group Total
  <chr>      <int> <fct> <int>
1 1967-01-17    55 E         1
2 1981-07-27    40 D         2
3 1973-09-30    48 D         2
4 1944-03-17    78 F         1
5 1986-07-17    35 C         1
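
To make the two error causes discussed above concrete, here is a small base R sketch. It is an illustration only: the data frame df, the per_group summary and the msg variable are made up for this example, and table() stands in for the grouped summary.

```r
col <- "Group"
print(length(col))         # length of the string itself, not of the column: 1

df <- data.frame(Years = c(55L, 40L, 48L, 78L, 35L),
                 Group = factor(c("E", "D", "D", "F", "C")))
print(length(df[[col]]))   # length of the column that col names: 5

# A per-group summary has one row per group (4 here), not one per input row (5).
per_group <- as.data.frame(table(Group = df$Group))
print(nrow(per_group))

# Binding it onto the original rows fails in the same way as in the question.
msg <- tryCatch(
  {
    cbind(df, per_group)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

A per-row result, which is what mutate with n() or add_count produces, has the same number of rows as the input, so the mismatch does not arise.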

data

m.list <- list(df.1 = structure(list(Date = c("1967-01-17", "1981-07-27", 
"1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L, 40L, 
48L, 78L, 35L)), class = "data.frame", row.names = c("56", "10", 
"34", "98", "27")), df.2 = structure(list(Date = c("1967-01-17", 
"1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L, 
40L, 48L, 78L, 35L)), class = "data.frame", row.names = c("56", 
"10", "34", "98", "27")))