Convert a data frame to a list with a function from the apply family

As in the title, I need to convert a data frame:

data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12), Month = rep(month.abb, 4), Expenses = sample(50e3:100e3, 48))

Create a list year_y in which each element (one per year) is a data frame with the expenses for each month. Then, using year_y, create a list that holds, for each year, the month with the highest expenses. Here is what the final result should look like:

$`2016`
[1] "Jul"

$`2017`
[1] "Nov"

$`2018`
[1] "May"

$`2019`
[1] "May"

The catch is that I need to use a function from the apply family in both steps.

CodePudding user response：

In base R, we can use tapply

as.list(tapply(data1$Expenses, data1$Year, function(x) month.abb[which.max(x)]))
#> $`2016`
#> [1] "Jul"
#> 
#> $`2017`
#> [1] "Mar"
#>
#> $`2018`
#> [1] "Sep"
#> 
#> $`2019`
#> [1] "Dec"
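
Note that `month.abb[which.max(x)]` relies on each year's rows being in calendar order (Jan to Dec), which holds for `data1` as built in the question. A minimal sketch that drops that assumption by applying over row indices and looking the month up in the data frame itself; the `set.seed(42)` call and the names `idx` and `res` are additions for illustration only:

```r
set.seed(42)  # assumption: fixed seed so the random expenses are reproducible
data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12),
                    Month = rep(month.abb, 4),
                    Expenses = sample(50e3:100e3, 48))

# Group the row indices by year, then pick the month on the row
# with the largest expense inside each group
res <- tapply(seq_len(nrow(data1)), data1$Year, function(idx) {
  data1$Month[idx][which.max(data1$Expenses[idx])]
})
res <- as.list(res)
str(res)
```

Because the lookup goes through `data1$Month`, this should still work if the rows are shuffled or a month is missing.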

CodePudding user response：

We group by 'Year', slice the row where the 'Expenses' is the max and then split the 'Month' by 'Year' column

library(dplyr)
data1 %>%
    group_by(Year) %>% 
    slice_max(n = 1, order_by = Expenses) %>%
    {split(.$Month, .$Year)}

Another option is deframe() from tibble, which turns a two-column data frame into a named vector:

library(tibble)
data1 %>%
   group_by(Year) %>% 
   slice_max(n = 1, order_by = Expenses) %>%
   ungroup %>% 
   select(Year, Month) %>% 
   deframe() %>%
   as.list()

-output

$`2016`
[1] "Nov"

$`2017`
[1] "Dec"

$`2018`
[1] "Dec"

$`2019`
[1] "Mar"

Or with base R: subset the rows where 'Expenses' equals the per-year max, then split 'Month' by 'Year'

with(subset(data1, Expenses == ave(Expenses, Year, FUN = max)), 
      split(Month, Year))

-output

$`2016`
[1] "Nov"

$`2017`
[1] "Dec"

$`2018`
[1] "Dec"

$`2019`
[1] "Mar"
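
To see why the `ave()` version works: `ave(x, g, FUN = max)` returns a vector as long as `x`, with each group's maximum repeated on every row of that group, so comparing it to `x` flags the rows holding the maximum. A small sketch on made-up numbers (the vectors `year`, `month` and `expenses` below are illustrative, not the question's data):

```r
year     <- c(2016, 2016, 2016, 2017, 2017, 2017)
month    <- rep(c("Jan", "Feb", "Mar"), 2)
expenses <- c(10, 30, 20, 60, 50, 40)

# Per-group maximum, repeated for every row of the group
group_max <- ave(expenses, year, FUN = max)
group_max
# 30 30 30 60 60 60

# Keep only the rows that hold their group's maximum, then split by year
is_max <- expenses == group_max
res <- split(month[is_max], year[is_max])
str(res)
```

One consequence of the equality test: if two months tie for the maximum within a year, both are returned for that year.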

CodePudding user response：

Using base R: use the split() function to divide the original data frame by year, then use lapply() with which.max() to find the month with the highest expenses in each year.

data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12), Month = rep(month.abb, 4), Expenses = sample(50e3:100e3, 48))

lapply(split(data1, ~Year), function(mon) {
   mon$Month[which.max(mon$Expenses)]
})
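
The same idea can be written as the two explicit steps the question asks for, keeping the intermediate list `year_y`. This is a sketch under a few assumptions: the `set.seed(1)` call and the name `max_month` are illustrative additions, and the grouping vector is passed directly to `split()` instead of the formula `~Year`, which as far as I know needs R 4.1.0 or later.

```r
set.seed(1)  # assumption: fixed seed so the random expenses are reproducible
data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12),
                    Month = rep(month.abb, 4),
                    Expenses = sample(50e3:100e3, 48))

# Step 1: one data frame per year, holding the month and its expenses
year_y <- split(data1[c("Month", "Expenses")], data1$Year)

# Step 2: for each year's data frame, pick the month with the largest expense
max_month <- lapply(year_y, function(df) df$Month[which.max(df$Expenses)])
str(max_month)
```

`split()` is not itself an apply-family function, so if both steps strictly have to use that family, step 1 could be swapped for an `lapply()` over the unique years that subsets `data1` for each one.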

CodePudding user response：

Here is one more tidyverse approach, which makes use of the name argument of dplyr::pull().

library(dplyr)

data1 %>% 
  group_by(Year) %>% 
  filter(max(Expenses) == Expenses) %>% 
  pull(var = Month, name = Year) %>% 
  as.list()

#> $`2016`
#> [1] "Feb"
#> 
#> $`2017`
#> [1] "Apr"
#> 
#> $`2018`
#> [1] "Mar"
#> 
#> $`2019`
#> [1] "Dec"

Created on 2022-03-26 by the reprex package (v0.3.0)

CodePudding user response：

Here is one more solution, using purrr::map_chr():

library(purrr)
library(dplyr)

data1 %>% 
  group_by(Year) %>% 
  arrange(desc(Expenses), .by_group = TRUE) %>% 
  slice(1) %>% 
  group_split() %>% 
  setNames(unique(data1$Year)) %>% 
  map_chr(., 2) %>% 
  as.list()

$`2016`
[1] "Apr"

$`2017`
[1] "Jan"

$`2018`
[1] "Mar"

$`2019`
[1] "Nov"