Home > Back-end >  how to find the difference of dataframe which is given on start date and end date
how to find the difference of dataframe which is given on start date and end date

Time:12-27

I have to find the difference in data provided at 00:00:00 and 23:59:59 per day for seven days. How to find the difference in the data frame, which is given on the start date and end date?

Sample Data

Date                         Data
2018-12-01 00:00:00          2
2018-12-01 12:00:00          5
2018-12-01 23:59:59          10
2018-12-02 00:00:00          12
2018-12-02 12:00:00          15
2018-12-02 23:59:59          22

Expected Output

Date                        Data
2018-12-01                   8
2018-12-02                   10              

CodePudding user response:

Example

data = {
    'Date': ['2018-12-01 00:00:00', '2018-12-01 12:00:00', '2018-12-01 23:59:59',
             '2018-12-02 00:00:00', '2018-12-02 12:00:00', '2018-12-02 23:59:59'],
    'Data': [2, 5, 10, 12, 15, 22]
}
df = pd.DataFrame(data)

Code

df['Date'] = pd.to_datetime(df['Date'])
out = (df.resample('D', on='Date')['Data']
       .agg(lambda x: x.iloc[-1] - x.iloc[0]).reset_index())

out

    Date        Data
0   2018-12-01  8
1   2018-12-02  10

Update

more efficient way

you can get same result following code:

g = df.resample('D', on='Date')['Data']
out = g.last().sub(g.first()).reset_index()

CodePudding user response:

You can use groupby and iterate over with min-max range.

import pandas as pd

df = pd.DataFrame({
    'Date': ['2018-12-01 00:00:00', '2018-12-01 12:00:00', '2018-12-01 23:59:59', 
             '2018-12-02 00:00:00', '2018-12-02 12:00:00', '2018-12-02 23:59:59'],
    'Data': [2, 5, 10, 12, 15, 22]
})

df['Date'] = pd.to_datetime(df['Date'])

df['Date_Only'] = df['Date'].dt.date
result = df.groupby('Date_Only').apply(lambda x: x['Data'].max() - x['Data'].min())

print(result)
  • Related