I ran into a problem when I tried to combine two columns before generating a unique list.
CSV file:
country,half,uniqueTournament
Brazil,1st half,Serie A
England,1st half,Championship
Argentina,2nd half,Primera Liga
Brazil,1st half,Serie A
My attempt:
import pandas as pd
csv_file = '@@@@@@@@@@@@@'
df = pd.read_csv(csv_file)
df.loc[(df['half'] == '1st half'), 'country' ' - ' 'uniqueTournament'].unique()
Expected outcome:
Brazil - Serie A
England - Championship
Answer:
You could create a new column, filter for "1st half", then use groupby with agg(list):
df['new'] = df['country'] + ' - ' + df['uniqueTournament']
out = df[df['half']=='1st half'].drop_duplicates(subset=['half','new']).groupby('half')['new'].agg(list).iloc[0]
Or you could filter, then use groupby with unique:
out = df[df['half']=='1st half'].groupby('half')['new'].unique().iloc[0].tolist()
Output:
['Brazil - Serie A', 'England - Championship']
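Since only one value of half is selected, the groupby step may not be needed at all. Here is a minimal sketch of a shorter variant that filters, concatenates the two columns, and calls unique directly on the resulting Series. It is not taken from the answer above; the sample rows are inlined with io.StringIO in place of the CSV path, and the names csv_text, first_half and labels are only illustrative.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch runs without a file.
csv_text = """country,half,uniqueTournament
Brazil,1st half,Serie A
England,1st half,Championship
Argentina,2nd half,Primera Liga
Brazil,1st half,Serie A
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the "1st half" rows.
first_half = df[df['half'] == '1st half']

# Join the two columns element-wise, then drop repeated labels.
# Series.unique() keeps values in order of first appearance.
labels = (first_half['country'] + ' - ' + first_half['uniqueTournament']).unique().tolist()

print(labels)
```

This avoids adding a helper column to df, which may be preferable if the combined label is only needed once.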