Home > database >  Using dplyr in a for loop for multiple values
Using dplyr in a for loop for multiple values

Time:06-09

I have a dataframe (df) like this. I have 82 SKUs started from M1 to M82.

SKU date sales
M1  2-jan 4
M2  2-jan 5
M1  3-jan 8
M82 3-jan 1
...
M82 31-dec 9

i want to filter each SKU seperate and then group_by(date) and summarise(sales_perday = sum(sales) Something like this

for(i in SKU){
SKU_M[i] <- df %>% filter(SKU == SKU_M[i]) %>% group_by(date) 
%>% summarise(sales_perday = sum(sales))

Expected output are 82 dataframes with each SKU in 1 dataframe.

I did this below for 1 SKU but i want it for all 82 in an easy way.

M50 <- df %>% filter(SKU == 'M50') %>% group_by(date) %>% summarise(sales_perday = sum(sales))

CodePudding user response:

You probably want to group by multiple columns:

library(tidyverse)

data <- tribble(
  ~SKU, ~date, ~sales,
  "M1", "2-jan",4,
  "M2", "2-jan",5,
  "M1", "3-jan",8
)

# the cioncise way
data %>%
  group_by(SKU, date) %>%
  summarise(sales_perday = sum(sales))
#> `summarise()` has grouped output by 'SKU'. You can override using the `.groups`
#> argument.
#> # A tibble: 3 × 3
#> # Groups:   SKU [2]
#>   SKU   date  sales_perday
#>   <chr> <chr>        <dbl>
#> 1 M1    2-jan            4
#> 2 M1    3-jan            8
#> 3 M2    2-jan            5

# if one really want to have multiple data frames
data %>%
  group_by(SKU, date) %>%
  summarise(sales_perday = sum(sales)) %>%
  nest(-SKU) %>%
  pull(data)
#> Warning: All elements of `...` must be named.
#> Did you want `data = -SKU`?
#> `summarise()` has grouped output by 'SKU'. You can override using the `.groups`
#> argument.
#> [[1]]
#> # A tibble: 2 × 2
#>   date  sales_perday
#>   <chr>        <dbl>
#> 1 2-jan            4
#> 2 3-jan            8
#> 
#> [[2]]
#> # A tibble: 1 × 2
#>   date  sales_perday
#>   <chr>        <dbl>
#> 1 2-jan            5

Created on 2022-06-08 by the reprex package (v2.0.0)

CodePudding user response:

Another option with split:

df <- df |> 
  group_by(date) |> 
  summarise(sales_perday = sum(sales))

split(df, df$SKU)

CodePudding user response:

If you really do want separate data frames, then after grouping by SKU and date, and then summarizing, use group_split() to partition by SKU.

library(tidyverse)
df <- tribble(
  ~SKU, ~date, ~sales,
  "M1", "2-jan",4,
  "M2", "2-jan",5,
  "M1", "3-jan",8
)

df |> 
  group_by(SKU, date) |> 
  summarise(sales_perday = sum(sales)) |> 
  group_split()
#> `summarise()` has grouped output by 'SKU'. You can override using the `.groups`
#> argument.
#> <list_of<
#>   tbl_df<
#>     SKU         : character
#>     date        : character
#>     sales_perday: double
#>   >
#> >[2]>
#> [[1]]
#> # A tibble: 2 × 3
#>   SKU   date  sales_perday
#>   <chr> <chr>        <dbl>
#> 1 M1    2-jan            4
#> 2 M1    3-jan            8
#> 
#> [[2]]
#> # A tibble: 1 × 3
#>   SKU   date  sales_perday
#>   <chr> <chr>        <dbl>
#> 1 M2    2-jan            5
  • Related