import pandas as pd
import numpy as np
df = pd.DataFrame({'City': ['Pune', 'Mumbai', 'Pune', 'Mumbai', 'Pune'],
'Name': ['John', 'Boby', 'John', 'Boby', 'Nicky'],
'Competition': ['Chess,Drawing,Chess', 'Table Tennis,Table Tennis,Chess,Carrom', 'Chess,Carrom', 'Table Tennis,Chess,Chess,Chess', 'Carrom'] })
City Name Competition
0 Pune John Chess,Drawing,Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom
2 Pune John Chess,Carrom
3 Mumbai Boby Table Tennis,Chess,Chess,Chess
4 Pune Nicky Carrom
Required output
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom,Chess,Carrom Carrom,Chess
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
if equal number of words then add both words.Otherwise mmost frequent word
CodePudding user response:
First split values in columns with DataFrame.explode
, so possible get Series.mode
with join for all top values:
f = lambda x: ','.join(x.mode())
df['Most Frequent'] = (df.assign(Competition = df['Competition'].str.split(','))
.explode('Competition')
.groupby(level=0)['Competition']
.agg(f))
print (df)
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Carrom,Chess
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
CodePudding user response:
Use statistics.multimode
in apply
:
import pandas as pd
from statistics import multimode
df = pd.DataFrame({'City': ['Pune', 'Mumbai', 'Pune', 'Mumbai', 'Pune'],
'Name': ['John', 'Boby', 'John', 'Boby', 'Nicky'],
'Competition': ['Chess,Drawing,Chess', 'Table Tennis,Table Tennis,Chess,Carrom', 'Chess,Carrom',
'Table Tennis,Chess,Chess,Chess', 'Carrom']})
df["Most Frequent"] = df["Competition"].apply(lambda x: ",".join(multimode(x.split(","))[:2]))
print(df)
Output
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Chess,Carrom
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
CodePudding user response:
Here is a simple yet comprehensive solution using Counter
.
from collections import Counter
def keywithmaxval(d):
itemMaxValue = max(d.values())
return ','.join([k for k, v in d.items() if v == itemMaxValue])
df["Most Frequent"] = df['Competition'].str.split(',').apply(Counter).apply(keywithmaxval)
Output :
This gives us :
df
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Chess,Carrom
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom