f = pd.DataFrame({'Movie': ['name1','name2','name3']
'genre': [['comedy', 'action'];['comedy','scifi'];
['thriller','action']]
'distributor': ['disney', 'disney','parmount'})
#what if the genre has multiple values in it now name is both part of genre[0] and genre[1] if i use groupby
res = f[f['distributor'] == 'disney'].groupby(['genre'])
desired output
only want movie launched by disney
distributor genre count of movies
disney action 1
disney comedy 2
disney scifi 1
CodePudding user response:
Easy and clear to do with datar
that reimagines pandas' APIs:
>>> import pandas as pd
>>> df = pd.DataFrame({'Movie': ['name1','name2','name3'],
... 'genre': [['comedy', 'action'], ['comedy','scifi'],
... ['thriller','action']],
... 'distributor': ['disney', 'disney','parmount']})
>>> df
Movie genre distributor
0 name1 [comedy, action] disney
1 name2 [comedy, scifi] disney
2 name3 [thriller, action] paramount
>>>
>>> from datar.all import f, filter, unchop, count
[2022-03-31 11:47:44][datar][WARNING] Builtin name "filter" has been overriden by datar.
>>> (
... df
... >> filter(f.distributor == "disney")
... >> unchop(f.genre)
... >> count(f.distributor, f.genre)
... )
distributor genre n
<object> <object> <int64>
0 disney comedy 2
1 disney action 1
2 disney scifi 1
[TibbleGrouped: distributor (n=1)]
CodePudding user response:
Explode your list then count values:
out = df.loc[df['distributor'] == 'disney', 'genre'].explode().value_counts()
print(out)
# Output
comedy 2
action 1
scifi 1
Name: genre, dtype: int64
Update
out = (df.explode('genre').query("distributor == 'disney'")
.value_counts(['distributor', 'genre'], sort=False)
.rename('count').reset_index())
print(out)
# Output
distributor genre count
0 disney action 1
1 disney comedy 2
2 disney scifi 1
Update 2
It seems your genre
column does not contain a list but a strings. Try to convert this column as a list with ast.literal_eval
before using the code above:
import ast
df['genre'] = df['genre'].apply(ast.literal_eval)
# OR
df['genre'] = pd.eval(df['genre'])
# Execute now df.explode(...)...