Find top 3 producers who have produced movies with highest average ROI (Return on invesment)
Description: (I am being provided with a table)
import pandas as pd
import numpy as np
table = pd.DataFrame({'Movie_title':['Hot Tub Time Machine 2','The Princess Diaries 2: Royal Engagement','Whiplash','Kahaani','마린보이'],'Producers':[['Andrew Panay','Jason Blum'],['Whitney Houston', 'Mario Iscovich', 'Michel Litvak'],['David Lancaster', 'Michel Litvak', 'Jason Blum', 'Helen Estabrook'],['Sujoy Ghosh'],[]],'Directors':[['Steve Pink'],['Garry Marshall'],['Damien Chazelle'],['Sujoy Ghosh'],['Jong-seok Yoon']],'ROI':[-12.038207142857143,137.8735875,296.72727272727275,1233.3333333333333,-76.14607902735563]})
this is what the table dataframe looks like table DataFrame
I thought of applying '.groupby()' method on 'Producers' column and then using '.mean()' method on ROI column
table.groupby('Producers')[['Movie Title','ROI','Directors']].mean('ROI')
But its throwing an error (Please refer the following image)
Please Refer the image below to see the error. I dont know how to add jupyter notebook code outputs and pandas dataframe Please help me out with this problem statement. I provided the images of the code blocks from jupyter notebooks.
CodePudding user response:
You can use the pandas explode function on the Producers
column – for each element of the list in every row, you'll get a new row (with the data from the index and other columns duplicated).
table.explode("Producers")
Movie_title Producers Directors ROI
0 Hot Tub Time Machine 2 Andrew Panay [Steve Pink] -12.038207
0 Hot Tub Time Machine 2 Jason Blum [Steve Pink] -12.038207
1 The Princess Diaries 2: Royal Engagement Whitney Houston [Garry Marshall] 137.873588
1 The Princess Diaries 2: Royal Engagement Mario Iscovich [Garry Marshall] 137.873588
1 The Princess Diaries 2: Royal Engagement Michel Litvak [Garry Marshall] 137.873588
2 Whiplash David Lancaster [Damien Chazelle] 296.727273
2 Whiplash Michel Litvak [Damien Chazelle] 296.727273
2 Whiplash Jason Blum [Damien Chazelle] 296.727273
2 Whiplash Helen Estabrook [Damien Chazelle] 296.727273
3 Kahaani Sujoy Ghosh [Sujoy Ghosh] 1233.333333
4 마린보이 NaN [Jong-seok Yoon] -76.146079
Then you can apply your mean groupby (and combine it with explode for a one line solution):
table.explode("Producers").groupby("Producers").mean("ROI")
Final result:
ROI
Producers
Andrew Panay -12.038207
David Lancaster 296.727273
Helen Estabrook 296.727273
Jason Blum 142.344533
Mario Iscovich 137.873588
Michel Litvak 217.300430
Sujoy Ghosh 1233.333333
Whitney Houston 137.873588