Home > Blockchain >  Groupby custom function based on other column values
Groupby custom function based on other column values

Time:01-11

I have a data frame that contains survey responses by country.

country=['Country A','Country A','Country A','Country B','Country B','Country B']
responses=['Agree','Neutral','Disagree','Agree','Neutral','Disagree']
num_respondents=[10,50,30,58,24,23]
example_df = pd.DataFrame({"Country": country, "Response": responses, "Count": num_respondents})

For each country, I want to compute the fraction (#Agree-#Disagree)/(Total Respondents). Is there a clean way to do this using groupby or another pandas function?

CodePudding user response:

Maybe it helps:

example_df.groupby('Country').apply(lambda x: (sum(x['Count'][x['Response'] == 'Agree']) 
                                            - sum(x['Count'][x['Response'] == 'Disagree'])) 
                                              /sum(x['Count']))

CodePudding user response:

You could create a custom function and include your logic there:

import pandas as pd

def custom_agg(grp: pd.DataFrame) -> float:
    """Calculate the difference of agreement and disagreements in a group of responses.

    Parameters
    ----------
    grp : pd.DataFrame
        A pandas DataFrame containing at least two columns: 'Response' and 'Count'.

    Returns
    -------
    float
        The diference between 'Agree' and 'Disagree' responses,
        relative to the total number of responses,
        calculated as: (total_agree - total_disagree) / total_count

    Examples
    --------
    >>> country = ["Country A", "Country A", "Country A", "Country B",
    ...            "Country B", "Country B"]
    >>> responses = ["Agree", "Neutral", "Disagree", "Agree", "Neutral",
    ...             "Disagree"]
    >>> num_respondents = [10, 50, 30, 58, 24, 23]
    >>> example_df = pd.DataFrame({"Country": country, "Response": responses,
    ...                            "Count": num_respondents})
    >>> example_df.groupby("Country").apply(lambda grp: custom_agg(grp))
    """
    total_agree = grp[grp["Response"] == "Agree"]["Count"].sum()
    total_disagree = grp[grp["Response"] == "Disagree"]["Count"].sum()
    total_count = grp["Count"].sum()
    return (total_agree - total_disagree) / total_count


example_df.groupby("Country").apply(lambda grp: custom_agg(grp))
# Returns:
#
# Country
# Country A   -0.222222
# Country B    0.333333
# dtype: float64

Defining a custom function is especially useful when you have to define complex logic to use inside group by/aggregate scenarios.

  • Related