Home > OS >  Exclude a column from calculated value
Exclude a column from calculated value

Time:11-22

I'm new to the library and am trying to figure out how to add columns to a pivot table with the mean and standard deviation of the row data for the last three months of transaction data.

Here's the code that sets up the pivot table:

previousThreeMonths = [prev_month_for_analysis, prev_month2_for_analysis, prev_month3_for_analysis]
dfPreviousThreeMonths = df[df['Month'].isin(previousThreeMonths)]

ptHistoricalConsumption = dfPreviousThreeMonths.pivot_table(dfPreviousThreeMonths,
                                                            index=['Customer Part #'],
                                                            columns=['Month'],
                                                            aggfunc={'Qty Shp':np.sum}
                                                            )

ptHistoricalConsumption['Mean'] = ptHistoricalConsumption.mean(numeric_only=True, axis=1)
ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.std(numeric_only=True, axis=1)
ptHistoricalConsumption

The resulting pivot table looks like this: pivot table

The problem is that the standard deviation column is including the Mean in its calculations, whereas I just want it to use the raw data for the previous three months. For example, the Std Dev of part number 2225 should be 11.269, not 9.2.

I'm sure there's a better way to do this and I'm just missing something.

CodePudding user response:

One way would be to remove the Mean column temporarily before call .std():

ptHistoricalConsumption['Std Dev'] = ptHistoricalConsumption.drop('Mean', axis=1).std(numeric_only=True, axis=1)

That wouldn't remove it from the permanently, it would just remove it from the copy fed to .std().

  • Related