Group by keywords found in one column, then show summary statistics for another column (Python)

Time:10-05

I am trying to find a quick way to group records by a keyword/string found in one column and display summary statistics (.describe(), .mean(), etc.) for another column.

Below is a snippet of the code I was trying. I'm trying to avoid splitting/subsetting the data into a bunch of different dataframes.

My dataframe looks like the following:

    PROMOTED_PRODUCT__CREATIVE              ROAS
0   Simple Green 1 Gal. Concentrated    0.027573
1   Simple Green 1 Gal. Concentrated    0.082969
2   Simple Green 1 Gal. Concentrated    0.056278
3   Simple Green 1 Gal. Concentrated    0.037286
4   Simple Green 1 Gal. Concentrated    0.355843


df_pi['PROMOTED_PRODUCT__CREATIVE'].str.contains('1 gal', re.IGNORECASE).groupby(df_pi['PROMOTED_PRODUCT__CREATIVE'])['ROAS'].mean()

When I run this code, it raises a KeyError:

KeyError: 'Column not found: ROAS'

I understand that this is because the result of .str.contains() is not a dataframe, so there are no column headers to select from. Should I do this in steps instead of a single line?
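Doing it in steps does work. The one-liner above fails because `str.contains` returns a boolean Series rather than a DataFrame, so grouping it and then asking for `['ROAS']` has no `ROAS` column to find. A minimal stepwise sketch, assuming the five sample rows shown above are the data and using `case=False` in place of a regex flag:

```python
import pandas as pd

# Rows copied from the sample shown in the question
df_pi = pd.DataFrame({
    "PROMOTED_PRODUCT__CREATIVE": ["Simple Green 1 Gal. Concentrated"] * 5,
    "ROAS": [0.027573, 0.082969, 0.056278, 0.037286, 0.355843],
})

# Step 1: boolean Series, True where the product name contains "1 gal".
# case=False makes the match case-insensitive; na=False treats missing names as no match.
mask = df_pi["PROMOTED_PRODUCT__CREATIVE"].str.contains("1 gal", case=False, na=False)

# Step 2: use the mask to pick the ROAS values of the matching rows only
roas_1gal = df_pi.loc[mask, "ROAS"]

# Step 3: summary statistics (count, mean, std, min, quartiles, max) for those rows
summary = roas_1gal.describe()
print(summary)
```

Because the mask only selects rows, no extra dataframes are created; swapping `.describe()` for `.mean()` or `.median()` gives a single statistic instead.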

Any help would be greatly appreciated! Thank you so much in advance.

Answer:

If I understand you correctly, you can create a boolean mask and filter the dataframe with it before calling .groupby:

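import re

# True for rows whose product name contains "1 gal" (any case);
# na=False treats missing product names as non-matching
mask = df["PROMOTED_PRODUCT__CREATIVE"].str.contains(
    "1 gal", flags=re.IGNORECASE, na=False
)

# Keep only the matching rows, then average ROAS per product name
x = df[mask].groupby("PROMOTED_PRODUCT__CREATIVE")["ROAS"].mean()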

print(x)

Prints:

PROMOTED_PRODUCT__CREATIVE
Simple Green 1 Gal. Concentrated    0.11199
Name: ROAS, dtype: float64
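The question's broader goal is to summarize ROAS per keyword for several keywords at once, still without building a separate dataframe for each. A minimal sketch, assuming a hypothetical keyword list and made-up rows and ROAS values (only the first product name comes from the question):

```python
import pandas as pd

# Hypothetical data: only the first product name is taken from the question
df = pd.DataFrame({
    "PROMOTED_PRODUCT__CREATIVE": [
        "Simple Green 1 Gal. Concentrated",
        "Example Cleaner 1 Gal. Refill",    # hypothetical
        "Example Cleaner 32 oz. Spray",     # hypothetical
        "Example Degreaser 32 oz. Spray",   # hypothetical
    ],
    "ROAS": [0.10, 0.20, 0.30, 0.50],       # hypothetical values
})

# Hypothetical keywords to group by; a row may match more than one
keywords = ["1 gal", "32 oz", "spray"]

col = df["PROMOTED_PRODUCT__CREATIVE"]

# For each keyword, select ROAS of the matching rows (literal, case-insensitive
# match) and describe it; .T puts one keyword per row, one statistic per column
summary = pd.DataFrame({
    kw: df.loc[col.str.contains(kw, case=False, na=False, regex=False), "ROAS"].describe()
    for kw in keywords
}).T

print(summary)
```

An alternative is `col.str.extract(...)` with an alternation pattern, which yields a single column to pass to `groupby`, but it keeps only the first match per row; the dictionary approach above lets a row count toward every keyword it contains.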