Home > OS >  Populate dataframe columns based on formula and count using columns from another dataframe
Populate dataframe columns based on formula and count using columns from another dataframe

Time:04-07

I have a dataframe of writers like this:

writer_id titles
n1 t1,t2
n2 t3,t4

And I have another dataframe for the titles with genre, ratings and votes like this:

title_id genres votes rating
t1 Drama,Action 100 7.0
t2 Action,Thriller 1000 8.0
t3 Crime ,Romance 200 6.0
t4 Drama,Romance 300 5.0

Now in the new data frame, I want to have a row for each writer with columns for each genre with the count value and another column (let's call it popularity) that will apply a formula using votes and rating. So it would look like this:

writer_id drama action thriller romance crime popularity
n1 1 2 1 0 0 2.2
n2 0 1 0 2 1 4.2

How should I go about doing this? I have the columns with genres already created.

CodePudding user response:

First part is DataFrame.explode by comma separated values in both DataFrames and merge, for counts use crosstab:

df11 = (df1.assign(title_id= df1['titles'].str.split(','))
           .explode('title_id')
           .drop('titles', 1))
df22 = (df2.assign(genres= df2['genres'].str.split(','))
           .explode('genres'))

df = df11.merge(df22, on='title_id') 

print (df)
  writer_id title_id    genres  votes  rating
0        n1       t1     Drama    100     7.0
1        n1       t1    Action    100     7.0
2        n1       t2    Action   1000     8.0
3        n1       t2  Thriller   1000     8.0
4        n2       t3     Crime    200     6.0
5        n2       t3   Romance    200     6.0
6        n2       t4     Drama    300     5.0
7        n2       t4   Romance    300     5.0

#count weight mean from df after `merge`
df4 = (df.assign(w = lambda x: x['votes'].mul(x['rating']))
         .groupby('writer_id')
         .sum()
         .assign(w = lambda x: x['w'].div(x['votes'])))

df3 = pd.crosstab(df['writer_id'], df['genres']).assign(popularity = df4['w'])
print (df3)
genres     Action  Crime  Drama  Romance  Thriller  popularity
writer_id                                                     
n1              2      0      1        0         1    7.909091
n2              0      1      1        2         0    5.400000
  • Related