Home > other >  R: Pivot_Wider/spread by obtaining average sorted by year
R: Pivot_Wider/spread by obtaining average sorted by year

Time:01-15

I've the following dataset

Pet Shop Year Item Price
A 2021 dog 300
A 2021 dog 250
A 2021 fish 20
A 2020 turtle 50
A 2020 dog 250
A 2020 cat 280
A 2019 rabbit 180
A 2019 cat 165
A 2019 cat 270
B 2021 dog 350
B 2021 fish 80
B 2021 fish 70
B 2020 cat 220
B 2020 turtle 90
B 2020 turtle 80
B 2020 fish 55
B 2019 fish 75
C 2021 dog 280
C 2020 cat 260
C 2020 cat 270
C 2019 fish 65
C 2019 cat 270

The code for the data is as follows

Pet_Shop = c(rep("A",9), rep("B",8), rep("C",5))
Item = c("Dog","Dog","Fish","Turtle","Dog","Cat","Rabbit","Cat","Cat","Dog","Fish","Fish","Cat","Turtle","Turtle","Fish","Fish","Dog","Cat","Cat","Fish","Cat")
Price = c(300,250,20,50,250,280,180,165,270,350,80,70,220,90,80,55,75,280,260,270,65,270)
Data = data.frame(Pet_Shop, Item, Price)

Does anyone here know how I can use pivot_wider or spread (or any other method) to achieve the following table? It groups the Shop by year and takes the average of the similar item of the same shop for the year. I've issues incorporating the year.

Pet Shop Year dog fish turtle cat rabbit
A 2021 Average(300,250) = 275 20 NA NA NA
A 2020 250 NA 50 280 NA
A 2019 NA NA NA 217.5 NA
B 2021 350 75 NA NA NA
B 2020 NA 55 85 220 NA
B 2019 NA 75 NA NA NA
C 2021 280 NA NA NA NA
C 2020 NA NA NA 265 NA
C 2019 NA 60 NA 270 NA

CodePudding user response:

In pivot_wider you may pass a function (values_fn) to be applied to each combination of Pet_Shop and Year.

result <- tidyr::pivot_wider(Data, names_from = Item, 
                             values_from = Price, values_fn = mean)
result

#  Pet_Shop  Year   dog  fish turtle   cat rabbit
#  <chr>    <int> <dbl> <dbl>  <dbl> <dbl>  <dbl>
#1 A         2021   275    20     NA   NA      NA
#2 A         2020   250    NA     50  280      NA
#3 A         2019    NA    NA     NA  218.    180
#4 B         2021   350    75     NA   NA      NA
#5 B         2020    NA    55     85  220      NA
#6 B         2019    NA    75     NA   NA      NA
#7 C         2021   280    NA     NA   NA      NA
#8 C         2020    NA    NA     NA  265      NA
#9 C         2019    NA    65     NA  270      NA

The same can also be done with data.table dcast -

library(data.table)

dcast(setDT(Data), Pet_Shop   Year ~ Item, 
      value.var = "Price", fun.aggregate = mean)
  •  Tags:  
  • Related