I have to find the difference in data provided at 00:00:00 and 23:59:59 per day for seven days. How to find the difference in the data frame, which is given on the start date and end date?
Sample Data
Date Data
2018-12-01 00:00:00 2
2018-12-01 12:00:00 5
2018-12-01 23:59:59 10
2018-12-02 00:00:00 12
2018-12-02 12:00:00 15
2018-12-02 23:59:59 22
Expected Output
Date Data
2018-12-01 8
2018-12-02 10
CodePudding user response:
Example
data = {
'Date': ['2018-12-01 00:00:00', '2018-12-01 12:00:00', '2018-12-01 23:59:59',
'2018-12-02 00:00:00', '2018-12-02 12:00:00', '2018-12-02 23:59:59'],
'Data': [2, 5, 10, 12, 15, 22]
}
df = pd.DataFrame(data)
Code
df['Date'] = pd.to_datetime(df['Date'])
out = (df.resample('D', on='Date')['Data']
.agg(lambda x: x.iloc[-1] - x.iloc[0]).reset_index())
out
Date Data
0 2018-12-01 8
1 2018-12-02 10
Update
more efficient way
you can get same result following code:
g = df.resample('D', on='Date')['Data']
out = g.last().sub(g.first()).reset_index()
CodePudding user response:
You can use groupby and iterate over with min-max range.
import pandas as pd
df = pd.DataFrame({
'Date': ['2018-12-01 00:00:00', '2018-12-01 12:00:00', '2018-12-01 23:59:59',
'2018-12-02 00:00:00', '2018-12-02 12:00:00', '2018-12-02 23:59:59'],
'Data': [2, 5, 10, 12, 15, 22]
})
df['Date'] = pd.to_datetime(df['Date'])
df['Date_Only'] = df['Date'].dt.date
result = df.groupby('Date_Only').apply(lambda x: x['Data'].max() - x['Data'].min())
print(result)