Convert data frame to a list, with function from apply family

Time:03-27

As the title says, I need to convert a data frame:

data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12), Month = rep(month.abb, 4), Expenses = sample(50e3:100e3, 48))

Create a list year_y in which each element (one per year) contains a data frame with that year's expenses for each month. Then, using year_y, create a list containing, for each year, the month with the biggest expenses. The final result should look like this:

$`2016`
[1] "Jul"

$`2017`
[1] "Nov"

$`2018`
[1] "May"

$`2019`
[1] "May"

The catch is that I need to use functions from the apply family in both steps.
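A minimal base-R sketch of the two steps exactly as described, assuming the question's data1 (re-created in the block): one lapply() builds the per-year list year_y, and a second lapply() picks the top month from each element. The names years and max_month are illustrative only; since sample() draws random values, the months differ from run to run.

```r
# Data as in the question (random, so results vary between runs)
data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12),
                    Month = rep(month.abb, 4),
                    Expenses = sample(50e3:100e3, 48))

# Step 1: a named list with one Month/Expenses data frame per year
years <- unique(data1$Year)
year_y <- lapply(setNames(years, years), function(y) {
  data1[data1$Year == y, c("Month", "Expenses")]
})

# Step 2: for each year, the month with the largest expenses
max_month <- lapply(year_y, function(d) d$Month[which.max(d$Expenses)])
max_month
```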

Answer:

In base R, we can use tapply():

as.list(tapply(data1$Expenses, data1$Year, function(x) month.abb[which.max(x)]))
#> $`2016`
#> [1] "Jul"
#> 
#> $`2017`
#> [1] "Mar"
#>
#> $`2018`
#> [1] "Sep"
#> 
#> $`2019`
#> [1] "Dec"
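tapply() applies the function to Expenses split by Year and returns a named array, which as.list() turns into the named list asked for. Indexing month.abb by which.max() assumes each year's rows run Jan to Dec, which holds for data1 as built. A sketch that reads the month from the Month column instead, so row order does not matter, applies the function to each group's row indices; the toy data frame is hypothetical and deliberately shuffled.

```r
# Hypothetical toy data with rows deliberately NOT in calendar order
toy <- data.frame(
  Year     = c(2016, 2016, 2016, 2017, 2017, 2017),
  Month    = c("Mar", "Jan", "Feb", "Feb", "Mar", "Jan"),
  Expenses = c(20, 10, 30, 1, 7, 5)
)

# Positional lookup via month.abb gives the wrong months on shuffled rows
naive <- as.list(tapply(toy$Expenses, toy$Year,
                        function(x) month.abb[which.max(x)]))

# Group the row indices by Year and look the month up in the Month column
res <- as.list(tapply(seq_len(nrow(toy)), toy$Year, function(i) {
  toy$Month[i][which.max(toy$Expenses[i])]
}))
res
```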

Answer:

We group by 'Year', slice the row where 'Expenses' is at its maximum, and then split the 'Month' column by the 'Year' column:

library(dplyr)
data1 %>%
    group_by(Year) %>% 
    slice_max(n = 1, order_by = Expenses) %>%
    {split(.$Month, .$Year)}
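slice_max() keeps every tied row by default (with_ties = TRUE), so a year with two equally large months would yield a character vector of length 2 for that year. With the question's data1 this cannot happen, because sample() draws without replacement by default, but where ties are possible, with_ties = FALSE keeps one row per group. A sketch on a hypothetical toy data frame with a tie in 2016:

```r
library(dplyr)

# Hypothetical toy data: Jan and Mar tie for the 2016 maximum
toy <- data.frame(
  Year     = c(2016, 2016, 2016, 2017, 2017),
  Month    = c("Jan", "Feb", "Mar", "Jan", "Feb"),
  Expenses = c(30, 10, 30, 5, 7)
)

# Default with_ties = TRUE: 2016 keeps both tied months
tied <- toy %>%
  group_by(Year) %>%
  slice_max(n = 1, order_by = Expenses) %>%
  {split(.$Month, .$Year)}

# with_ties = FALSE: exactly one month per year
single <- toy %>%
  group_by(Year) %>%
  slice_max(n = 1, order_by = Expenses, with_ties = FALSE) %>%
  {split(.$Month, .$Year)}
single
```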

Another option is tibble::deframe():

library(tibble)
data1 %>%
   group_by(Year) %>% 
   slice_max(n = 1, order_by = Expenses) %>%
   ungroup %>% 
   select(Year, Month) %>% 
   deframe() %>%
   as.list
$`2016`
[1] "Nov"

$`2017`
[1] "Dec"

$`2018`
[1] "Dec"

$`2019`
[1] "Mar"
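deframe() expects a two-column data frame and returns a named vector, using the first column as names and the second as values; that is why the pipeline ungroups and keeps only Year and Month before calling it. A small illustration on a hypothetical two-row tibble:

```r
library(tibble)

# Hypothetical per-year results, one row per year
tops <- tibble(Year = c(2016, 2017), Month = c("Nov", "Dec"))

v <- deframe(tops)  # character vector named "2016", "2017"
res <- as.list(v)   # named list, one element per year
res
```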

Or, in base R, subset the rows where 'Expenses' equals its maximum within each year, and split:

with(subset(data1, Expenses == ave(Expenses, Year, FUN = max)), 
      split(Month, Year))

Output:

$`2016`
[1] "Nov"

$`2017`
[1] "Dec"

$`2018`
[1] "Dec"

$`2019`
[1] "Mar"
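The key step is ave(Expenses, Year, FUN = max), which returns a vector as long as Expenses with every element replaced by its year's maximum, so Expenses == ave(...) flags each year's top row. A sketch on a hypothetical toy data frame:

```r
toy <- data.frame(
  Year     = c(2016, 2016, 2016, 2017, 2017),
  Month    = c("Jan", "Feb", "Mar", "Jan", "Feb"),
  Expenses = c(10, 30, 20, 7, 5)
)

# One group maximum per row: 30 30 30 7 7
grp_max <- ave(toy$Expenses, toy$Year, FUN = max)

# Keep the rows that equal their group maximum, then split Month by Year
res <- with(subset(toy, Expenses == ave(Expenses, Year, FUN = max)),
            split(Month, Year))
res
```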

Answer:

Using base R: split() divides the original data frame by year, then which.max() picks the month with the highest expenses in each piece:

data1 <- data.frame(Year = rep(c(2016, 2017, 2018, 2019), each = 12), Month = rep(month.abb, 4), Expenses = sample(50e3:100e3, 48))

lapply(split(data1, ~Year), function(mon) {
   mon$Month[which.max(mon$Expenses)]
})
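The formula form split(data1, ~Year) is a fairly recent addition (R 4.1.0, I believe); split(data1, data1$Year) also works on older versions. If a named character vector is handier than a list, vapply() with FUN.VALUE = character(1) also enforces exactly one string per year and errors otherwise, and as.list() converts back when a list is required. A sketch on a hypothetical toy data frame:

```r
toy <- data.frame(
  Year     = c(2016, 2016, 2017, 2017),
  Month    = c("Jan", "Feb", "Jan", "Feb"),
  Expenses = c(10, 30, 7, 5),
  stringsAsFactors = FALSE  # keep Month as character on R < 4.0 too
)

by_year <- split(toy, toy$Year)  # same pieces as split(toy, ~Year)

# vapply() checks that every year yields exactly one character string
v <- vapply(by_year, function(d) d$Month[which.max(d$Expenses)], character(1))
res <- as.list(v)
res
```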

Answer:

Here is one more tidyverse approach, which makes use of the name argument of dplyr::pull():

library(dplyr)

data1 %>% 
  group_by(Year) %>% 
  filter(max(Expenses) == Expenses) %>% 
  pull(var = Month, name = Year) %>% 
  as.list()

#> $`2016`
#> [1] "Feb"
#> 
#> $`2017`
#> [1] "Apr"
#> 
#> $`2018`
#> [1] "Mar"
#> 
#> $`2019`
#> [1] "Dec"

Created on 2022-03-26 by the reprex package (v0.3.0)
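pull(var = Month, name = Year) returns Month as a vector named by Year (the name argument needs dplyr 1.0.0 or later, as far as I know), and as.list() turns each name into a list element. Like the base-R == ave() approach, filter(max(Expenses) == Expenses) keeps tied rows, which would show up as repeated names. A sketch on a hypothetical toy tibble with a tie in 2016:

```r
library(dplyr)

# Hypothetical toy data: Jan and Feb tie for the 2016 maximum
toy <- tibble(
  Year     = c(2016, 2016, 2017, 2017),
  Month    = c("Jan", "Feb", "Jan", "Feb"),
  Expenses = c(30, 30, 7, 5)
)

v <- toy %>%
  group_by(Year) %>%
  filter(max(Expenses) == Expenses) %>%
  pull(var = Month, name = Year)

# The tie yields two elements named "2016"
names(v)
as.list(v)
```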

Answer:

Here is one more solution, using purrr::map_chr():

library(purrr)
library(dplyr)

data1 %>% 
  group_by(Year) %>% 
  arrange(desc(Expenses), .by_group = TRUE) %>% 
  slice(1) %>% 
  group_split() %>% 
  setNames(unique(data1$Year)) %>% 
  map_chr(., 2) %>% 
  as.list()
$`2016`
[1] "Apr"

$`2017`
[1] "Jan"

$`2018`
[1] "Mar"

$`2019`
[1] "Nov"
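The pipeline above names the pieces with setNames(unique(data1$Year)), which relies on group_split() returning the years in the same order as unique(data1$Year) lists them (true here, since data1 is already sorted by Year), and it picks Month by position with map_chr(., 2). A shorter purrr sketch lets split() supply the names and looks Month up by name; the toy data frame is hypothetical:

```r
library(purrr)

toy <- data.frame(
  Year     = c(2016, 2016, 2017, 2017),
  Month    = c("Jan", "Feb", "Jan", "Feb"),
  Expenses = c(10, 30, 7, 5)
)

# split() names each piece by Year and map() keeps those names,
# so no setNames() or column position is needed
res <- map(split(toy, toy$Year), function(d) d$Month[which.max(d$Expenses)])
res
```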