groupby to display time per day per person pandas

Time:03-06

I'm trying to group this dataframe by id (the person column) and timestamp; my third column is the time difference between consecutive entries for the same id. I can display the total sum per id across all days, but I can't get it to display the sum per day per id.

import datetime
import pandas as pd
timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0), # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0), # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0), # person 1
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0), # person 3
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0), # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0), # person 1
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0), # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0)  # person 2
]
df1 = pd.DataFrame({'person': [1, 2, 1, 3, 2, 1, 3, 2], 'timestamp': timestamps}) 
# Time since the same person's previous entry (NaT for each person's first entry)
df1['new'] = df1.groupby('person')['timestamp'].diff()

# Total time per person across all days
df1.groupby('person')['new'].sum()

This just gives me the total, not per day. How do I combine them per day?

CodePudding user response:

You can include the date part of the "timestamp" column among your groupby keys, like this:

>>> df1.groupby(["person", df1.timestamp.dt.date])["new"].sum()
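For reference, here is a self-contained sketch of that one-liner run against the question's data. The variable name per_day is only illustrative, and the 'new' column is computed with groupby(...).diff(), which is assumed to match what the question intended. Note that each gap is attributed to the date of the later entry, so a gap that spans several days lands entirely on one date.

```python
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10),
    datetime.datetime(2018, 1, 1, 10),
    datetime.datetime(2018, 1, 1, 11),
    datetime.datetime(2018, 1, 2, 11),
    datetime.datetime(2018, 1, 1, 10),
    datetime.datetime(2018, 1, 2, 11),
    datetime.datetime(2018, 1, 4, 10),
    datetime.datetime(2018, 1, 5, 12),
]
df1 = pd.DataFrame({"person": [1, 2, 1, 3, 2, 1, 3, 2], "timestamp": timestamps})

# Time since the same person's previous entry (NaT for each person's first entry).
df1["new"] = df1.groupby("person")["timestamp"].diff()

# One group per (person, calendar date) pair; the result is a Series
# indexed by person and date.
per_day = df1.groupby(["person", df1["timestamp"].dt.date])["new"].sum()
print(per_day)
```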

Also, if you prefer, you could create a new column with the date from the timestamp and then group by that column:

>>> df1["date"] = df1["timestamp"].dt.date
>>> df1.groupby(["person", "date"])["new"].sum()

Optionally, you can call .reset_index() at the end to turn the group keys (person and date) from index levels back into ordinary columns.
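A minimal sketch of the .reset_index() variant follows. The data is a hypothetical, smaller version of the question's frame, and the name flat is illustrative. Filling each person's first gap with a zero Timedelta is an assumption of this sketch, not something the question asked for.

```python
import pandas as pd

# Hypothetical miniature of the question's data: two people, two days.
df = pd.DataFrame({
    "person": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2018-01-01 10:00", "2018-01-01 11:00", "2018-01-02 11:00",
        "2018-01-01 10:00", "2018-01-01 10:30",
    ]),
})

# Gap to the same person's previous entry. A person's first entry has no
# predecessor, so it is counted as zero here (an assumption of this sketch).
df["new"] = df.groupby("person")["timestamp"].diff().fillna(pd.Timedelta(0))
df["date"] = df["timestamp"].dt.date

# reset_index() moves "person" and "date" out of the index into plain columns.
flat = df.groupby(["person", "date"])["new"].sum().reset_index()
print(flat)
```

The result is an ordinary three-column frame (person, date, new), which is usually easier to merge or export than a Series with a two-level index. Passing as_index=False to groupby should give the same layout.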
