New to Pandas/Python (student). I have what should be a simple problem but every approach I try fails.
The dataset has a "country" column and an "indicator" column, and each country appears more than once. The indicator column tells us, among other info, who is pro-vaccine ("Vac_plan" and "Vac_done") and who is not. I simply want a total for each country counting the pro-vaccine rows for that country, e.g.,
Ethiopia 7
Nigeria 5
My latest failed attempts are below:
vaccines_by_country=df.groupby('country')['indicator'=='Vac_plan|Vac_done'].count()
and...
df.groupby(['country']).str.contains('Vac_plan|Vac_done').count()
TIA for your merciful help.
CodePudding user response:
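You're quite close with your second attempt; you just need to reverse the order of operations. A GroupBy object has no .str accessor, so the string match has to run on the 'indicator' column itself. First find the strings, then group that boolean result by country and sum it (each True counts as 1):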
df['indicator'].str.contains('Vac_plan|Vac_done').groupby(df['country']).sum()
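To see the one-liner work end to end, here is a minimal sketch on made-up data. The rows in df are hypothetical (the question does not show the real values), and na=False is an added assumption so that missing indicators count as non-matches. The isin variant is an alternative for exact label matches, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real dataset's values are not shown in the question.
df = pd.DataFrame({
    "country": ["Ethiopia", "Ethiopia", "Nigeria", "Nigeria", "Ethiopia", "Nigeria"],
    "indicator": ["Vac_plan", "Vac_done", "Vac_plan", "Other", "Other", None],
})

# Boolean mask: True where the indicator contains either pro-vaccine label.
# na=False (an assumption here) treats missing indicators as non-matches.
is_pro_vaccine = df["indicator"].str.contains("Vac_plan|Vac_done", na=False)

# Summing booleans counts the True values within each country.
vaccines_by_country = is_pro_vaccine.groupby(df["country"]).sum()

# Alternative for exact matches: filter the rows first, then count per country.
exact = df[df["indicator"].isin(["Vac_plan", "Vac_done"])].groupby("country").size()

print(vaccines_by_country)
```

On this sample both approaches give Ethiopia 2 and Nigeria 1. One difference to keep in mind: the filter-then-count variant drops any country with zero matching rows, while the mask-then-sum version keeps it with a count of 0.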