I have a DataFrame with customer and country data. I want to count the countries so I can find the top 5 countries, and then use that as a filter elsewhere.
this gives me the counts
countries = collections.Counter(responses_2021['country'].dropna())
and this gives me the top 5
countries_top5 = countries.most_common(5)
that yields this
[('US', 144), ('CA', 37), ('GB', 15), ('FR', 15), ('AU', 12)]
now I need to transform it into a simpler structure so I can do my filter (here I'm just typing it manually because that's the only way I could move forward lol)
options = ['US', 'CA', 'GB', 'FR', 'AU']
rslt_df = df[df['country'].isin(options)]
So, to get from this
[('US', 144), ('CA', 37), ('GB', 15), ('FR', 15), ('AU', 12)]
to this
['US', 'CA', 'GB', 'FR', 'AU']
I started by trying to remove the counts
countries_top5_names = np.delete(countries_top5, 1, 1)
but that yields
[['US'], ['CA'], ['GB'], ['FR'], ['AU']]
so now I'm trying to flatten that, but I don't know how.
Is there a better way?
SOLUTION (thanks to @dan04 below)
countries_top5_names = [x[0] for x in countries_top5]
rslt_df = df[df['country'].isin(countries_top5_names)]
CodePudding user response:
Just take element [0] of each tuple.
>>> data = [('US', 144), ('CA', 37), ('GB', 15), ('FR', 15), ('AU', 12)]
>>> countries = [x[0] for x in data]
>>> countries
['US', 'CA', 'GB', 'FR', 'AU']
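For completeness, here is a minimal sketch of the whole path from raw values to the list of names, using only the standard library. The `raw_countries` list is made-up sample data standing in for `responses_2021['country'].dropna()`, and the names `counts`, `top2` and `top2_names` are just illustrative.

```python
from collections import Counter

# Hypothetical sample data standing in for responses_2021['country'].dropna()
raw_countries = ["US", "CA", "US", "GB", "US", "CA", "FR"]

counts = Counter(raw_countries)

# most_common(n) returns a list of (value, count) tuples, highest count first
top2 = counts.most_common(2)

# Keep only the first element of each tuple (the country code)
top2_names = [name for name, _ in top2]

print(top2)        # [('US', 3), ('CA', 2)]
print(top2_names)  # ['US', 'CA']
```

The resulting plain list is what you would pass to `isin(...)` in the question's filter.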
CodePudding user response:
You can try a more general method: zip(*data) transposes the list of tuples, so you get the names and the counts as two separate tuples.
data = [('US', 144), ('CA', 37), ('GB', 15), ('FR', 15), ('AU', 12)]
groups = list(zip(*data))
print(groups[0])
print(groups[1])
Output:
('US', 'CA', 'GB', 'FR', 'AU')
(144, 37, 15, 15, 12)
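If you want both columns under their own names, the transposed result can be unpacked directly. This is only a sketch on the same sample data; `names`, `counts` and `options` are illustrative variable names.

```python
data = [('US', 144), ('CA', 37), ('GB', 15), ('FR', 15), ('AU', 12)]

# zip(*data) turns the rows into columns; unpack the two columns by name
names, counts = zip(*data)

# zip produces tuples, so convert to a list to match the format asked for
options = list(names)

print(options)  # ['US', 'CA', 'GB', 'FR', 'AU']
print(counts)   # (144, 37, 15, 15, 12)
```

Note that the two-name unpacking raises a ValueError when `data` is empty, so the list comprehension from the other answer is the safer choice if the top-N list might have no entries.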