The table shows the count and percentage of cycle rides, grouped by membership type (casual, member).
What I have done in R:
big_frame %>%
  group_by(member_casual) %>%
  summarise(count = length(ride_id),
            '%' = round((length(ride_id) / nrow(big_frame)) * 100, digits = 2))
This is the best I've come up with in pandas, but I feel like there should be a better way:
member_casual_count = (
    big_frame
    .filter(['member_casual'])
    .value_counts(normalize=True).mul(100).round(2)
    .reset_index(name='percentage')
)
member_casual_count['count'] = (
    big_frame
    .filter(['member_casual'])
    .value_counts()
    .tolist()
)
member_casual_count
Thank you in advance
CodePudding user response:
In R, you should be doing something like this:
big_frame %>%
  count(member_casual) %>%
  mutate(perc = n / sum(n))
In Python, you can achieve the same like this:
(
    big_frame
    .groupby("member_casual")
    .size()
    .to_frame('ct')
    .assign(n=lambda df: df.ct / df.ct.sum())
)
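Note that both snippets above give a proportion between 0 and 1. If you want the percentage on a 0 to 100 scale, rounded to two decimals as in the question, a minimal sketch could look like the following. The small big_frame here is made-up sample data standing in for the real one, and the output column names (count, percentage) are only a suggestion:

```python
import pandas as pd

# Hypothetical sample data; the real big_frame comes from the question.
big_frame = pd.DataFrame({
    "ride_id": ["a1", "a2", "a3", "a4", "a5"],
    "member_casual": ["member", "casual", "member", "member", "casual"],
})

summary = (
    big_frame
    .groupby("member_casual")
    # Named aggregation: one row per group, with the number of rides in it.
    .agg(count=("ride_id", "size"))
    # Share of all rides, scaled to 0-100 and rounded to two decimals.
    .assign(percentage=lambda df: (df["count"] / df["count"].sum() * 100).round(2))
    .reset_index()
)
print(summary)
```

This keeps the count and the percentage in a single chain, so there is no need to run value_counts() twice and rely on both results coming back in the same order.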