Find and return column with max value from a list of vectors-CodePudding

I'm trying to calculate favorite brand per productcategory per customer.

1. Start and what I want to achieve

I start with a dataframe (mydata) with purchase orders, build up like this:

customer	ordernumber	productcategory	brand	amount
ABC	123456	H11_plumbing	Nitrofill	6
ABC	123457	H11_plumbing	Antileak	2
DEF	123458	H11_plumbing	Nitrofill	1
DEF	123459	H11_plumbing	Antileak	5
ABC	123460	H12_electric	Shock	10
ABC	123461	H12_electric	Lightning	5
DEF	123462	H12_electric	Shock	4
DEF	123463	H12_electric	Lightning	8

What I want to achieve is list per customer the favorite brand per productcategory.

customer	H11_plumbing_favorite_brand	H12_electric_favorite_brand
ABC	Nitrofill	Shock
DEF	Antileak	Lightning

For customer ABC Nitrofill (amount = 6) and Shock (amount = 10) are favorites

2. What I am doing now

What I now did was create a list of vectors for each productcategory and change the shape from long to wide using lapply to use data.table::dcast

df_list <- split(mydata, as.factor(mydata$productcategory)) # create list of vectors

library(data.table)
df_list_2 <- lapply(df_list,function(x) x <- data.table::dcast(setDT(x), customer ~ brand, sum, value.var = c("amount"))) # change shape from long to wide

3. Where I get stuck is finding and returning the column with the favorite brand

This is where I get stuck. I have been able to do this for a data.frame (vector) rather than a list of vectors by using this code:

mydata_t <- mydata[mydata$productcategory=="H12_electric",]
mydata_overview<- data.table::dcast(setDT(mydata_t), customer ~ brand, sum, value.var = c("amount"))
rm(mydata_t)
mydata_overview$favorite_brand <- apply(mydata_overview[,-c(1)],1,function(x) which(x==max(x)))

However, if I try to use this code on the list of vectors (df_list) then it doesn't work.

df_list_3 <- lapply(df_list,function(x) x$favorite_brand<- apply(x[,-c(1)],1,function(y) which(y==max(y))))
rm(df_list_t)

Any suggestions?

CodePudding user response：

Here is a solution using data.table:

> library(data.table)
> setDT(dat)
> dcast(dat[ , .(favourite=brand[which.max(amount)]), by=.(customer, productcategory) ], customer ~ productcategory)

   customer H11_plumbing H12_electric
1:      ABC    Nitrofill        Shock
2:      DEF     Antileak    Lightning

To break it down:

We can use data.table to find the brand corresponding to the maximum amount for each combination of customer and category as follows:

> dat[ , .(favourite=brand[which.max(amount)]), by=.(customer, productcategory) ]
   customer productcategory favourite
1:      ABC    H11_plumbing Nitrofill
2:      DEF    H11_plumbing  Antileak
3:      ABC    H12_electric     Shock
4:      DEF    H12_electric Lightning

Then use dcast to reshape this table with customer as the column and productcategory along the rows.

CodePudding user response：

Here is one approach. Create a favorite column that includes the brand where you have the maximum amount (for each customer and productcategory combination). Then, use dcast to put into wide form, using the new favorite category as your value. The unique is added to prevent duplicates, which would force the value to be the number of entries when aggregating.

library(data.table)

setDT(df)

df[, favorite := brand[which.max(amount)], by = .(customer, productcategory)][
  , dcast(unique(.SD, by = c("customer", "productcategory")), 
          customer ~ productcategory, 
          value.var = "favorite")
]

Output

   customer H11_plumbing H12_electric
1:      ABC    Nitrofill        Shock
2:      DEF     Antileak    Lightning

Data

df <- structure(list(customer = c("ABC", "ABC", "DEF", "DEF", "ABC", 
"ABC", "DEF", "DEF"), ordernumber = 123456:123463, productcategory = c("H11_plumbing", 
"H11_plumbing", "H11_plumbing", "H11_plumbing", "H12_electric", 
"H12_electric", "H12_electric", "H12_electric"), brand = c("Nitrofill", 
"Antileak", "Nitrofill", "Antileak", "Shock", "Lightning", "Shock", 
"Lightning"), amount = c(6L, 2L, 1L, 5L, 10L, 5L, 4L, 8L)), class = "data.frame", row.names = c(NA, 
-8L))