I have two dataframes.
The first one is called music:
name | Date | Edition | Song_ID | Singer_ID |
---|---|---|---|---|
LA | 01.05.2009 | 1 | 1 | 1 |
Second | 13.07.2009 | 1 | 2 | 2 |
Mexico | 13.07.2009 | 1 | 3 | 1 |
Let's go | 13.09.2009 | 1 | 4 | 3 |
Hello | 18.09.2009 | 1 | 5 | (4,5) |
Don't give up | 12.02.2010 | 2 | 6 | (5,6) |
ZIC ZAC | 18.03.2010 | 2 | 7 | 7 |
Blablabla | 14.04.2010 | 2 | 8 | 2 |
Oh la la | 14.05.2011 | 3 | 9 | 4 |
Food First | 14.05.2011 | 3 | 10 | 5 |
La Vie est.. | 17.06.2011 | 3 | 11 | 8 |
Jajajajajaja | 13.07.2011 | 3 | 12 | 9 |
The second dataframe is called singer:
Singer | nationality | Singer_ID |
---|---|---|
JT Watson | USA | 1 |
Rafinha | Brazil | 2 |
Juan Casa | Spain | 3 |
Kidi | USA | 4 |
Dede | USA | 5 |
Briana | USA | 6 |
Jay Ado | UK | 7 |
Dani | Australia | 8 |
Mike Rich | USA | 9 |
I would like to know which Edition has the most singers from the USA involved, but the information is spread across two different dataframes.
What I have done so far is:
singer['nationality'].value_counts()['USA']
But this only shows that 5 singers are from the USA. Both dataframes share a column called Singer_ID.
CodePudding user response:
You need to merge the two dataframes on the shared key, Singer_ID. See https://pandas.pydata.org/docs/reference/api/pandas.merge.html
merged = singer.merge(music,on="Singer_ID")
merged['nationality'].value_counts()['USA']
editions = merged.groupby("Edition")
# or print(merged.groupby(["Edition", "nationality"])["nationality"].count())
max_value = 0
best_edition = 0
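# Loop over the editions and keep the one with the most USA rows
for edition, df in editions:
    # Comparing and summing avoids a KeyError when an edition has no USA singers
    nbr_usa = (df["nationality"] == "USA").sum()
    if nbr_usa > max_value:
        best_edition = edition
        max_value = nbr_usa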
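
One thing to watch for: some rows in music hold more than one singer, e.g. (4,5), and those values will not match any single Singer_ID in singer during a plain merge. Below is a minimal sketch of one way to handle that, assuming the multi-singer entries are stored as Python tuples (if they are strings in your data, they would need parsing first). The names music_long, usa_per_edition and best_editions are made up for this example, and the Date column is left out because it is not needed here.

```python
import pandas as pd

# Sample data copied from the question. Assumption: collaborations are
# stored as Python tuples, e.g. (4, 5), in the Singer_ID column.
music = pd.DataFrame({
    "name": ["LA", "Second", "Mexico", "Let's go", "Hello", "Don't give up",
             "ZIC ZAC", "Blablabla", "Oh la la", "Food First",
             "La Vie est..", "Jajajajajaja"],
    "Edition": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Song_ID": list(range(1, 13)),
    "Singer_ID": [1, 2, 1, 3, (4, 5), (5, 6), 7, 2, 4, 5, 8, 9],
})
singer = pd.DataFrame({
    "Singer": ["JT Watson", "Rafinha", "Juan Casa", "Kidi", "Dede",
               "Briana", "Jay Ado", "Dani", "Mike Rich"],
    "nationality": ["USA", "Brazil", "Spain", "USA", "USA",
                    "USA", "UK", "Australia", "USA"],
    "Singer_ID": list(range(1, 10)),
})

# Wrap every Singer_ID in a list so explode() gives one row per (song, singer).
music_long = music.assign(
    Singer_ID=music["Singer_ID"].map(
        lambda v: list(v) if isinstance(v, tuple) else [v]
    )
).explode("Singer_ID")
music_long["Singer_ID"] = music_long["Singer_ID"].astype(int)

# Attach the nationality of each singer to each (song, singer) row.
merged = music_long.merge(singer, on="Singer_ID", how="left")

# Count distinct USA singers per edition.
usa = merged[merged["nationality"] == "USA"]
usa_per_edition = usa.groupby("Edition")["Singer_ID"].nunique()

# Keep every edition that reaches the maximum, so ties are not hidden.
best_editions = usa_per_edition[
    usa_per_edition == usa_per_edition.max()
].index.tolist()

print(usa_per_edition)
print(best_editions)
```

With the sample data this counts each singer once per edition, so editions 1 and 3 tie with three USA singers each. If you would rather count appearances (a singer with two songs in the same edition counts twice), replacing `["Singer_ID"].nunique()` with `.size()` should do it.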