I want to extract the percentage of positive reviews per brand from a dataset that includes reviews for thousands of products. I used the code below and got a table with the percentages of both positive and non-positive reviews. How can I get only the percentage of positive reviews by brand? I only want the "True" results in positive_review. Thanks!
df_reviews_ok.groupby("brand")["positive_review"].value_counts(normalize=True).mul(100).round(2)
brand                   positive_review
Belkin                  False             70.00
                        True              30.00
Bowers & Wilkins        False             67.65
                        True              32.35
Corsair                 False             75.22
                        True              24.78
Definitive Technology   False             68.29
                        True              31.71
Dell                    False             60.87
                        True              39.13
DreamWave               False            100.00
House of Marley         False            100.00
JBL                     False             58.43
                        True              41.57
Kicker                  True              66.67
                        False             33.33
Lenovo                  False             76.92
                        True              23.08
Logitech                False             75.75
                        True              24.25
MEE audio               False             53.80
                        True              46.20
Microsoft               False             67.86
                        True              32.14
Midland                 False             72.09
                        True              27.91
Motorola                False             72.92
                        True              27.08
Netgear                 False             72.30
                        True              27.70
Pny                     False             68.78
                        True              31.22
Power Acoustik          False            100.00
SVS                     False            100.00
Samsung                 False             61.94
                        True              38.06
Sanus                   False             75.93
                        True              24.07
Sdi Technologies, Inc.  False             55.63
                        True              44.37
Siriusxm                False             73.33
                        True              26.67
Sling Media             False             67.16
                        True              32.84
Sony                    False             55.40
                        True              44.60
Toshiba                 False             56.52
                        True              43.48
Ultimate Ears           False             70.21
                        True              29.79
Verizon Wireless        False             75.86
                        True              24.14
WD                      False             58.33
                        True              41.67
Yamaha                  False             61.15
                        True              38.85
Name: positive_review, dtype: float64
CodePudding user response:
Using the following toy DataFrame as an example:
import pandas as pd

df = pd.DataFrame({
    'brand': list('AAAABBBB'),
    'positive': [True, True, False, False, True, True, True, False]
})
If you would like the share of positive reviews for each brand relative to the total number of reviews per brand, take the mean of the boolean column (True counts as 1 and False as 0):
df.groupby('brand')['positive'].mean()
The result is as expected, expressed as a fraction between 0 and 1:
brand
A 0.50
B 0.75
Name: positive, dtype: float64
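Since the question asks for percentages rounded to two decimals, the same idea can be sketched as follows. This reuses the toy DataFrame from this answer; the variable name pct_positive is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': list('AAAABBBB'),
    'positive': [True, True, False, False, True, True, True, False]
})

# The mean of a boolean column is the fraction of True values,
# so scaling by 100 turns it into a percentage per brand.
pct_positive = df.groupby('brand')['positive'].mean().mul(100).round(2)
print(pct_positive)
```

A brand with no positive reviews simply gets 0 here, because the mean of an all-False group is 0.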
CodePudding user response:
You can unstack the output and select the True column:
(df.groupby('brand')
   ['positive_review'].value_counts(normalize=True)
   .mul(100).round(2)
   .unstack(fill_value=0)
   [True]
)
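A minimal sketch of what fill_value=0 does here, assuming a hypothetical toy frame in which brand C has no positive reviews at all (similar to DreamWave in the question):

```python
import pandas as pd

# Hypothetical toy data: brand C has only non-positive reviews.
df = pd.DataFrame({
    'brand': list('AAAABBBBCC'),
    'positive_review': [True, True, False, False,
                        True, True, True, False,
                        False, False],
})

pct_true = (
    df.groupby('brand')['positive_review']
      .value_counts(normalize=True)
      .mul(100).round(2)
      .unstack(fill_value=0)  # brands without a True row get 0 instead of NaN
      [True]                  # keep only the column for positive reviews
)
print(pct_true)
```

Without fill_value, brand C would show NaN in the True column rather than 0.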
CodePudding user response:
How about calling .reset_index() on the result and then filtering for the rows where positive_review is True?
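A rough sketch of that idea, assuming a toy frame with the question's column names. The column name 'percentage' is made up for illustration, and passing name= is intended to avoid a clash between the value column and the index level of the same name, which may occur in some pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': list('AAAABBBB'),
    'positive_review': [True, True, False, False, True, True, True, False]
})

pct = (
    df.groupby('brand')['positive_review']
      .value_counts(normalize=True)
      .mul(100).round(2)
      .reset_index(name='percentage')  # index levels become regular columns
)

# Keep only the rows that describe positive reviews.
positive_only = pct[pct['positive_review']]
print(positive_only)
```

Note that brands with no positive reviews have no True row, so they drop out of this result instead of appearing with 0.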