Display Mean of Crosstab

Time:09-13

I have a dataframe 'df'. I want to display all rows of the 'Percentage of 1/Yes' column that are greater than the average of the column. My code displays whether it is True/False, but I want to display the actual value for the True rows and not display the False rows.

My code:

import pandas as pd
from sklearn.model_selection import train_test_split

trainData, validData = train_test_split(df, test_size=0.4, random_state=1)

# Response rate for RFM categories
# RFM: Combine R, F, M categories into one category
trainData['RFM'] = trainData['Mcode'].astype(str) + trainData['Rcode'].astype(str) + trainData['Fcode'].astype(str)

rfm_crosstab = pd.crosstab(index = [trainData['RFM']], columns = trainData['Florence'], margins = True)
rfm_crosstab['Percentage of 1/Yes'] = 100 * (rfm_crosstab[1] / rfm_crosstab['All'])

# Display rows with percentage greater than mean
rfm_crosstab['Percentage of 1/Yes'] > rfm_crosstab['Percentage of 1/Yes'].mean()

Output:

RFM
111    False
121     True
131    False
141    False
211    False
212     True
221     True
222     True
231    False
232    False
241    False
242    False
311    False
312    False
313     True
321     True
322     True
323     True
331    False
332    False
333    False
341     True
342    False
343    False
411     True
412    False
413    False
421    False
422     True
423     True
431    False
432    False
433    False
441    False
442    False
443    False
511     True
512    False
513     True
521     True
522    False
523     True
531    False
532    False
533     True
541    False
542    False
543    False
All    False
Name: Percentage of 1/Yes, dtype: bool

Data: 'df'

Seq#    ID# Gender  M   R   F   FirstPurch  ChildBks    YouthBks    CookBks ... ItalCook    ItalAtlas   ItalArt Florence    Related Purchase    Mcode   Rcode   Fcode   Yes_Florence    No_Florence
0   1   25  1   297 14  2   22  0   1   1   ... 0   0   0   0   0   5   4   2   0   1
1   2   29  0   128 8   2   10  0   0   0   ... 0   0   0   0   0   4   3   2   0   1
2   3   46  1   138 22  7   56  2   1   2   ... 1   0   0   0   2   4   4   3   0   1
3   4   47  1   228 2   1   2   0   0   0   ... 0   0   0   0   0   5   1   1   0   1
4   5   51  1   257 10  1   10  0   0   0   ... 0   0   0   0   0   5   3   1   0   1

Answer:

You are almost there. The True/False Series your code produces is a boolean mask, so you can use it to index the column and keep only the rows where it is True:

output = rfm_crosstab['Percentage of 1/Yes'] > rfm_crosstab['Percentage of 1/Yes'].mean()
rfm_crosstab['Percentage of 1/Yes'][output]
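
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same boolean-mask idea. The `toy` DataFrame and the helper name `rows_above_mean` are made up for illustration and are not part of the question. One assumption differs from the snippet above: the sketch drops the 'All' margin row before taking the mean, since that row is a total rather than an RFM group. If you want the mean to include it, skip the `drop` call.

```python
import pandas as pd

# Hypothetical toy data standing in for trainData; the values are made up.
toy = pd.DataFrame({
    "RFM": ["111", "111", "121", "121", "131", "131", "131", "131", "141", "141"],
    "Florence": [0, 0, 1, 0, 1, 0, 0, 0, 1, 1],
})


def rows_above_mean(data, column="Percentage of 1/Yes"):
    """Return the percentage values that exceed the column mean."""
    ct = pd.crosstab(index=data["RFM"], columns=data["Florence"], margins=True)
    ct[column] = 100 * ct[1] / ct["All"]

    # Assumption: exclude the 'All' margin row so it does not affect the mean.
    groups = ct.drop(index="All")

    # The comparison yields a boolean Series; .loc keeps only the True rows.
    mask = groups[column] > groups[column].mean()
    return groups.loc[mask, column]


above_mean = rows_above_mean(toy)
print(above_mean)
```

Using `.loc[mask, column]` selects rows and column in one step, which avoids the chained `[...][...]` lookup, though both return the same values here.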