I have this data frame with about 200 rows, and I need to combine the duplicate writers columns, and then find the mean value of their viewership. How can I accomplish this? Below is a sample of the data frame.
Viewership Writers
0 11.20 Ricky Gervais
1 11.20 Stephen Merchant
2 11.20 Greg Daniels
3 8.70 Greg Daniels
4 10.30 Mindy Kaling
.. ... ...
192 3.25 Halsted Sullivan
193 3.25 Warren Lieberstein
194 3.51 Niki Schwartz-Wright
195 4.56 Brent Forrester
196 5.69 Greg Daniels
[197 rows x 2 columns]
My solution was:
mean = df2.groupby(['Writers']).mean()
print(mean)
However, it still lists all the writers with duplicates, and does not combine their viewership into a mean value. The result I get is:
Viewership
Writers
Brent Forrester 7.560000
Gabe Miller 4.165000
Gene Stupnitsky 8.618333
Gene Stupnitsky 10.200000
Greg Daniels 9.200000
Halsted Sullivan 7.503333
Justin Spitzer 7.670000
Lee Eisenberg 7.867143
Lee Eisenberg 10.120000
Michael Schur 9.040000
Mindy Kaling 9.420000
Paul Lieberstein 7.395000
Stephen Merchant 9.070000
Stephen Merchant 11.200000
Warren Lieberstein 5.280000
Aaron Shure 7.173333
Allison Silverman 4.746667
Amelie Gillette 5.655000
Anthony Q. Farrell 8.315000
B. J. Novak 7.718182
Brent Forrester 7.348889
Brent Forrester 7.670000
Caroline Williams 8.840000
Steve Carell 7.945000
Steve Hely 6.073333
Tim McAuliffe 3.440000
Warren Lieberstein 7.503333
I did my best to preserve the indentations I get in my results. As you can see, some writers have a whitespace at the beginning of their name. I'm sure this may be causing my issue?
CodePudding user response:
Try:
df2['Writers'] = df2['Writers'].str.strip()
mean = df2.groupby(['Writers']).mean()
print(mean)
This should remove any whitespace issues before grouping