Populate dataframe columns based on formula and count using columns from another dataframe-CodePudding

I have a dataframe of writers like this:

writer_id	titles
n1	t1,t2
n2	t3,t4

And I have another dataframe for the titles with genre, ratings and votes like this:

title_id	genres	votes	rating
t1	Drama,Action	100	7.0
t2	Action,Thriller	1000	8.0
t3	Crime ,Romance	200	6.0
t4	Drama,Romance	300	5.0

Now in the new data frame, I want to have a row for each writer with columns for each genre with the count value and another column (let's call it popularity) that will apply a formula using votes and rating. So it would look like this:

writer_id	drama	action	thriller	romance	crime	popularity
n1	1	2	1	0	0	2.2
n2	0	1	0	2	1	4.2

How should I go about doing this? I have the columns with genres already created.

CodePudding user response：

First part is DataFrame.explode by comma separated values in both DataFrames and merge, for counts use crosstab:

df11 = (df1.assign(title_id= df1['titles'].str.split(','))
           .explode('title_id')
           .drop('titles', 1))
df22 = (df2.assign(genres= df2['genres'].str.split(','))
           .explode('genres'))

df = df11.merge(df22, on='title_id') 

print (df)
  writer_id title_id    genres  votes  rating
0        n1       t1     Drama    100     7.0
1        n1       t1    Action    100     7.0
2        n1       t2    Action   1000     8.0
3        n1       t2  Thriller   1000     8.0
4        n2       t3     Crime    200     6.0
5        n2       t3   Romance    200     6.0
6        n2       t4     Drama    300     5.0
7        n2       t4   Romance    300     5.0

#count weight mean from df after `merge`
df4 = (df.assign(w = lambda x: x['votes'].mul(x['rating']))
         .groupby('writer_id')
         .sum()
         .assign(w = lambda x: x['w'].div(x['votes'])))

df3 = pd.crosstab(df['writer_id'], df['genres']).assign(popularity = df4['w'])
print (df3)
genres     Action  Crime  Drama  Romance  Thriller  popularity
writer_id                                                     
n1              2      0      1        0         1    7.909091
n2              0      1      1        2         0    5.400000