How to separate entries, and count the occurrences-CodePudding

I'm trying to count which country most celebrities come from. However the csv that I'm working with has multiple countries for a single celeb. e.g. "France, US" for someone with a double nationality.

To count the above, I can use .count() for the entries in the "nationality" column. But, I want to count France, US and any other country separately.

I cannot figure out a way to separate all the entries in column and then, count the occurrences.

I want to be able to reorder my dataframe with these counts, so I want to count this inside the structure

data.groupby(by="nationality").count()

This returns some faulty counts of

"France, US" 1

CodePudding user response：

Assuming this type of data:

data = pd.DataFrame({'nationality': ['France','France, US', 'US', 'France']})

  nationality
0      France
1  France, US
2          US
3      France

You need to split and explode, then use value_counts to get the sorted counts per country:

out = (data['nationality']
       .str.split(', ')
       .explode()
       .value_counts()
      )

Output:

France    3
US        2
Name: nationality, dtype: int64