Compute mean of a groupby in pandas

I have a very large dataset of tweets. I want to compute the mean number of tweets per hour published by each user. I was able to group the tweets per hour per user, but how can I now compute the mean per hour?

I can't post all the code since the dataset has been heavily preprocessed. The dataset has the columns user_id and created_at, where created_at is the timestamp of the published tweet, so I sorted by created_at and then grouped down to the hour:

grouped_df = tweets_df.sort_values(["created_at"]).groupby([
    tweets_df['user_id'],
    tweets_df['created_at'].dt.year, 
    tweets_df['created_at'].dt.month,
    tweets_df['created_at'].dt.day,
    tweets_df['created_at'].dt.hour])

I can count the tweets per hour per user using

tweet_per_hour = grouped_df["created_at"].count()

print(tweet_per_hour)

What I obtain using this code is

user_id     created_at  created_at  created_at  created_at
678033      2012        3           11          2             1
                                                14            1
                                                17            1
                                                18            1
                        4           13          4             1
                                                             ..
3164941860  2020        4           30          7             6
                                                9             2
                        5           1           1             2
                                                9             6
                                    2           6             1
Name: created_at, Length: 3829888, dtype: int64

where the last column is the count of tweets per hour. The row

678033      2012        3           11          2             1

indicates that user 678033 made just 1 tweet on 2012-03-11 between 2 o'clock and 3 o'clock.

I need to take all the tweets-per-hour counts for each user and compute their mean for that user. So I want output like this, for example:

user_id     average_tweets_per_hour
678033      4
665353      10

How can I do it?

CodePudding user response：

I'm not sure what your columns are named anymore, but it would be something like this:

tweet_per_hour.groupby(level="user_id").agg(avgTweetsPerHour="mean")

As was commented above, I can't test this without enough information to reproduce it, but the .agg() goes beautifully with .groupby()
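Since the original data isn't available, here is a minimal reproducible sketch of the same idea on made-up data. The sample rows and the names tweets_per_hour and average_tweets_per_hour are assumptions for illustration only. It replaces the four year/month/day/hour keys with a single key built by dt.floor("h"), which avoids four index levels all named created_at, and it averages only over hours in which the user tweeted at least once.

```python
import pandas as pd

# Hypothetical sample data standing in for the real, preprocessed tweets_df.
tweets_df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2],
    "created_at": pd.to_datetime([
        "2020-04-30 07:05", "2020-04-30 07:40", "2020-04-30 07:59",
        "2020-04-30 09:10",
        "2020-05-01 01:15", "2020-05-01 01:45", "2020-05-01 03:00",
    ]),
})

# Count tweets per user per clock hour.
# dt.floor("h") truncates each timestamp to the start of its hour.
tweets_per_hour = (
    tweets_df
    .groupby(["user_id", tweets_df["created_at"].dt.floor("h")])
    .size()
    .rename("tweets")
)

# Average the hourly counts for each user (active hours only).
average_tweets_per_hour = (
    tweets_per_hour
    .groupby(level="user_id")
    .mean()
    .rename("average_tweets_per_hour")
)

print(average_tweets_per_hour)
```

Hours with no tweets never appear in the grouped counts, so they don't pull the mean down. If idle hours should count as zeros, the counts would need to be reindexed over the full time range first.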

CodePudding user response：

You may be able to do something as simple as

df.groupby('A').mean()

But as the other responses have noted, it's difficult to know exactly what to suggest without something reproducible.
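To make that one-liner concrete, here is a hedged sketch on a flat table of hourly counts. The table hourly_counts, its column names and its values are all hypothetical, chosen to mirror the output format asked for in the question. Selecting the count column before calling .mean() keeps the other columns out of the average.

```python
import pandas as pd

# Hypothetical flat table: one row per user per hour in which they tweeted.
hourly_counts = pd.DataFrame({
    "user_id": [678033, 678033, 678033, 665353, 665353],
    "hour": pd.to_datetime([
        "2012-03-11 02:00", "2012-03-11 14:00", "2012-03-11 17:00",
        "2020-04-30 07:00", "2020-04-30 09:00",
    ]),
    "tweets": [2, 6, 4, 12, 8],
})

# Mean of the hourly counts per user; sort=False keeps first-seen user order.
result = (
    hourly_counts
    .groupby("user_id", sort=False)["tweets"]
    .mean()
    .rename("average_tweets_per_hour")
    .reset_index()
)

print(result)
```

The result is a two-column frame with user_id and average_tweets_per_hour, which matches the layout the question asks for.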