I am trying to find a quick way to group records by a keyword/string found in one column and display summary statistics (.describe(), .mean(), etc.) for another column.
Below is a snippet of the code I tried. I would like to avoid splitting/subsetting the data into a bunch of different dataframes.
My dataframe looks like the following:
   PROMOTED_PRODUCT__CREATIVE            ROAS
0  Simple Green 1 Gal. Concentrated  0.027573
1  Simple Green 1 Gal. Concentrated  0.082969
2  Simple Green 1 Gal. Concentrated  0.056278
3  Simple Green 1 Gal. Concentrated  0.037286
4  Simple Green 1 Gal. Concentrated  0.355843
df_pi['PROMOTED_PRODUCT__CREATIVE'].str.contains('1 gal', re.IGNORECASE).groupby(df_pi['PROMOTED_PRODUCT__CREATIVE'])['ROAS'].mean()
When I run the code above, it raises a KeyError:
KeyError: 'Column not found: ROAS'
I understand that this is because the object being grouped is not a dataframe, so it has no column headers. Should I do this in steps instead of a single line?
Any help would be greatly appreciated! Thank you so much in advance.
CodePudding user response:
If I understand you correctly, you can create a mask and filter the dataframe by this mask before doing .groupby:
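import re

# df is the dataframe from the question (named df_pi there)
mask = df["PROMOTED_PRODUCT__CREATIVE"].str.contains(
    "1 gal", flags=re.IGNORECASE
)
# Keep only the matching rows, then group them and average ROAS
x = df[mask].groupby("PROMOTED_PRODUCT__CREATIVE")["ROAS"].mean()
print(x)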
Prints:
PROMOTED_PRODUCT__CREATIVE
Simple Green 1 Gal. Concentrated    0.11199
Name: ROAS, dtype: float64
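The one-liner in the question fails because .str.contains returns a boolean Series, so the groupby that follows operates on that Series and there is no ROAS column left to select. If the goal is summary statistics per keyword without building separate dataframes, one possible approach is to pull the keyword out with .str.extract and group by it. The sketch below is only an illustration: the "32 oz" rows, the keyword pattern, and the names keyword and summary are made up and are not part of the original data.

```python
import re

import pandas as pd

# Sample data: the first five rows come from the question;
# the two "32 oz." rows are hypothetical, added only to show a second group.
df = pd.DataFrame(
    {
        "PROMOTED_PRODUCT__CREATIVE": [
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 32 oz. Spray",
            "Simple Green 32 oz. Spray",
        ],
        "ROAS": [0.027573, 0.082969, 0.056278, 0.037286, 0.355843, 0.10, 0.20],
    }
)

# Extract the first matching keyword from each product name (case-insensitive).
# Rows that match none of the keywords get NaN and are dropped by groupby.
keyword = (
    df["PROMOTED_PRODUCT__CREATIVE"]
    .str.extract(r"(1 gal|32 oz)", flags=re.IGNORECASE, expand=False)
    .str.lower()
    .rename("keyword")
)

# Group the original dataframe by the extracted keyword and summarize ROAS.
summary = df.groupby(keyword)["ROAS"].describe()
print(summary)
```

Because the grouping key is a Series aligned to df by index, the dataframe itself is never split; swapping .describe() for .mean() or .agg([...]) should give the other statistics in the same way.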