How to use df.loc to combine two columns and return unique values

I ran into a problem when I tried to combine two columns before generating a unique list.

CSV file:

country,half,uniqueTournament
Brazil,1st half,Serie A
England,1st half,Championship
Argentina,2nd half,Primera Liga
Brazil,1st half,Serie A

My attempt:

import pandas as pd

csv_file = '@@@@@@@@@@@@@'
df = pd.read_csv(csv_file)

df.loc[(df['half'] == '1st half'), 'country' + ' - ' + 'uniqueTournament'].unique()

Expected outcome:

Brazil - Serie A
England - Championship

CodePudding user response:

You could create a new column, filter for "1st half", then use groupby with agg(list):

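# Build the combined label column
df['new'] = df['country'] + ' - ' + df['uniqueTournament']
# Keep first-half rows, drop repeated labels, and collect them into a list
out = df[df['half']=='1st half'].drop_duplicates(subset=['half','new']).groupby('half')['new'].agg(list).iloc[0]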

Alternatively, reuse the 'new' column, filter, then use groupby with unique:

out = df[df['half']=='1st half'].groupby('half')['new'].unique().iloc[0].tolist()

Output:

['Brazil - Serie A', 'England - Championship']
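
If you would rather stay close to the df.loc call from the question, a minimal sketch could look like the one below. The attempt in the question concatenates three string literals into a single column label ('country - uniqueTournament'), which does not exist in the frame; the idea here is to select both columns with df.loc and concatenate the resulting Series instead. The sample data is inlined with io.StringIO so the snippet runs without a file, and the names csv_text, first_half and combined are only illustrative.

```python
import io
import pandas as pd

# Sample data copied from the question, inlined so the sketch runs without a file.
csv_text = """country,half,uniqueTournament
Brazil,1st half,Serie A
England,1st half,Championship
Argentina,2nd half,Primera Liga
Brazil,1st half,Serie A
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select the first-half rows and only the two columns to combine.
first_half = df.loc[df['half'] == '1st half', ['country', 'uniqueTournament']]

# Concatenate the two Series element-wise, then keep each label once
# (unique() preserves the order of first appearance).
combined = (first_half['country'] + ' - ' + first_half['uniqueTournament']).unique().tolist()
print(combined)
```

This skips the helper column and the groupby, which seems sufficient when only one value of half is of interest.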