I have a data frame that contains survey responses by country.
country=['Country A','Country A','Country A','Country B','Country B','Country B']
responses=['Agree','Neutral','Disagree','Agree','Neutral','Disagree']
num_respondents=[10,50,30,58,24,23]
example_df = pd.DataFrame({"Country": country, "Response": responses, "Count": num_respondents})
For each country, I want to compute the fraction (#Agree-#Disagree)/(Total Respondents). Is there a clean way to do this using groupby or another pandas function?
CodePudding user response:
Maybe it helps:
example_df.groupby('Country').apply(lambda x: (sum(x['Count'][x['Response'] == 'Agree'])
- sum(x['Count'][x['Response'] == 'Disagree']))
/sum(x['Count']))
CodePudding user response:
You could create a custom function and include your logic there:
import pandas as pd
def custom_agg(grp: pd.DataFrame) -> float:
"""Calculate the difference of agreement and disagreements in a group of responses.
Parameters
----------
grp : pd.DataFrame
A pandas DataFrame containing at least two columns: 'Response' and 'Count'.
Returns
-------
float
The diference between 'Agree' and 'Disagree' responses,
relative to the total number of responses,
calculated as: (total_agree - total_disagree) / total_count
Examples
--------
>>> country = ["Country A", "Country A", "Country A", "Country B",
... "Country B", "Country B"]
>>> responses = ["Agree", "Neutral", "Disagree", "Agree", "Neutral",
... "Disagree"]
>>> num_respondents = [10, 50, 30, 58, 24, 23]
>>> example_df = pd.DataFrame({"Country": country, "Response": responses,
... "Count": num_respondents})
>>> example_df.groupby("Country").apply(lambda grp: custom_agg(grp))
"""
total_agree = grp[grp["Response"] == "Agree"]["Count"].sum()
total_disagree = grp[grp["Response"] == "Disagree"]["Count"].sum()
total_count = grp["Count"].sum()
return (total_agree - total_disagree) / total_count
example_df.groupby("Country").apply(lambda grp: custom_agg(grp))
# Returns:
#
# Country
# Country A -0.222222
# Country B 0.333333
# dtype: float64
Defining a custom function is especially useful when you have to define complex logic to use inside group by/aggregate
scenarios.