I'm trying to use lapply to apply a function that adds a column with the number of rows in each factor level (group) to several data frames inside a list.
Here's my example data:
> head(m.list)
$df.1
Date Years
56 1967-01-17 55
10 1981-07-27 40
34 1973-09-30 48
98 1944-03-17 78
27 1986-07-17 35
$df.2
Date Years
56 1967-01-17 55
10 1981-07-27 40
34 1973-09-30 48
98 1944-03-17 78
27 1986-07-17 35
I've managed to create groups using breaks:
year_cut <- function(m.list, col) {
  cut(m.list[, col],
      breaks = c(10, 20, 30, 40, 50, 60, 100),
      right = FALSE,
      labels = c("A", "B", "C", "D", "E", "F"))
}
m.list <- lapply(m.list, function(x)
  cbind(x, "Group" = year_cut(m.list = x, col = "Years")))
>head(m.list)
$df.1
Date Years Group
56 1967-01-17 55 E
10 1981-07-27 40 D
34 1973-09-30 48 D
98 1944-03-17 78 F
27 1986-07-17 35 C
$df.2
Date Years Group
56 1967-01-17 55 E
10 1981-07-27 40 D
34 1973-09-30 48 D
98 1944-03-17 78 F
27 1986-07-17 35 C
Now I'm trying to get the number of rows in each group, but I haven't managed to.
I've tried two different approaches, both unsuccessfully:
cut_summary <- function(m.list, col)
{ summarize(
group_by(m.list,!!as.name(col)),
length(col)) }
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
col ="Group")))
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 436, 7
cut_summary <- function(m.list, col)
{ group_by(m.list,!!as.name(col)) %>% length(col)}
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
col ="Group")))
Error in length(., col) :
2 arguments passed to 'length' which requires 1
Ideally, I should get:
>head(m.list)
$df.1
Date Years Group Total
56 1967-01-17 55 E 22
10 1981-07-27 40 D 32
34 1973-09-30 48 D 32
98 1944-03-17 78 F 4
27 1986-07-17 35 B 20
$df.2
Date Years Group Total
56 2005-01-17 17 A 22
10 1981-07-27 40 C 19
34 1973-09-30 48 E 3
98 1944-03-17 78 F 50
27 1986-07-17 35 B 4
Any help is most welcome. Thanks!
CodePudding user response:
We can create both columns with mutate and add_count: loop over the list with purrr::map (or lapply from base R), use mutate to create the 'Group' column by applying year_cut to the 'Years' column, and then use add_count to add a count column.
library(dplyr)
library(purrr)
map(m.list, ~ .x %>%
  mutate(Group = year_cut(., col = "Years")) %>%
  add_count(Group, name = "Total"))
Output:
$df.1
Date Years Group Total
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
$df.2
Date Years Group Total
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
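For the base R route mentioned above, here is a minimal sketch that avoids dplyr and purrr entirely. It assumes the same breaks and labels as year_cut in the question; add_group_total is a hypothetical helper name, and the per-group count is looked up from table() rather than computed with add_count.

```r
# Same breaks and labels as year_cut() in the question
year_cut <- function(df, col) {
  cut(df[, col],
      breaks = c(10, 20, 30, 40, 50, 60, 100),
      right = FALSE,
      labels = c("A", "B", "C", "D", "E", "F"))
}

# Small example data matching the 'data' section of this answer
df <- data.frame(
  Date  = c("1967-01-17", "1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17"),
  Years = c(55L, 40L, 48L, 78L, 35L)
)
m.list <- list(df.1 = df, df.2 = df)

# Hypothetical helper: add Group, then look up each row's group size
add_group_total <- function(df) {
  df$Group <- year_cut(df, "Years")
  counts <- table(df$Group)  # number of rows per factor level
  df$Total <- as.integer(counts[as.character(df$Group)])
  df
}

res <- lapply(m.list, add_group_total)
res$df.1
```

Rows whose Years value falls outside the breaks get NA for both Group and Total in this sketch.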
The OP's function applies length to a string: col is the character vector "Group", so length(col) is always 1. It should be length(!!as.name(col)) or, more simply, n(). Also, summarise returns only the grouping columns and the summarised column, with one row per group, which is why cbind complains about differing numbers of rows. Based on the expected output, the OP wants to keep the full dataset and add a new column to it. In that case, use mutate:
cut_summary <- function(m.list, col) {
  mutate(group_by(m.list, !!as.name(col)), Total = n())
}
and then call it on the already modified m.list:
m.list <- lapply(m.list, function(x)
cbind(x, "Group" = year_cut(m.list = x, col ="Years")))
lapply(m.list, function(x) cut_summary(x, col = "Group"))
$df.1
# A tibble: 5 × 4
# Groups: Group [4]
Date Years Group Total
<chr> <int> <fct> <int>
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
$df.2
# A tibble: 5 × 4
# Groups: Group [4]
Date Years Group Total
<chr> <int> <fct> <int>
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
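To make the difference between the failed attempts and the mutate version concrete, the following sketch compares them on a five-row data frame. The small df here is made up for illustration (it mirrors the Group values above), and the trailing ungroup() is an optional addition, not part of the OP's code, for when the "# Groups" attribute is not wanted downstream.

```r
library(dplyr)

# Illustrative data: same Years and Group values as df.1 above
df <- data.frame(Years = c(55L, 40L, 48L, 78L, 35L),
                 Group = factor(c("E", "D", "D", "F", "C")))
col <- "Group"

# length() on the column *name* measures the character vector, not the group
length(col)

# summarise() collapses to one row per group, so cbind() with the
# original data frame fails with "differing number of rows"
per_group <- df %>%
  group_by(!!as.name(col)) %>%
  summarise(Total = n())
nrow(per_group)

# mutate() keeps every row and repeats the group size on each of them;
# ungroup() (optional) drops the grouping afterwards
per_row <- df %>%
  group_by(!!as.name(col)) %>%
  mutate(Total = n()) %>%
  ungroup()
nrow(per_row)
per_row$Total
```

The second attempt in the question fails for a different reason: piping into length(col) passes the data as the first argument, so length receives two arguments.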
Data:
m.list <- list(df.1 = structure(list(Date = c("1967-01-17", "1981-07-27",
"1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L, 40L,
48L, 78L, 35L)), class = "data.frame", row.names = c("56", "10",
"34", "98", "27")), df.2 = structure(list(Date = c("1967-01-17",
"1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L,
40L, 48L, 78L, 35L)), class = "data.frame", row.names = c("56",
"10", "34", "98", "27")))