I have a CSV file that looks like this:
Time | OpenIA |
---|---|
2022-07-15 10:00:23 | 1 |
2022-07-15 10:01:11 | 3 |
2022-07-15 10:01:11 | 2 |
2022-07-15 10:01:11 | 1 |
2022-07-15 10:01:11 | 3 |
2022-07-15 10:01:11 | 1 |
2022-07-15 10:01:33 | 1 |
2022-07-15 10:01:33 | 2 |
For each group of rows that share the same timestamp, I'm trying to subtract the last value from the first value, so that the result looks something like this:
Time | OpenIA |
---|---|
2022-07-15 10:00:23 | 0 |
2022-07-15 10:01:11 | 2 |
2022-07-15 10:01:33 | -1 |
To do this, I use the following code:
df = pd.read_csv(DF, usecols=['Time', 'OpenIA'])
df['Time'] = pd.to_datetime(df['Time'])
df['Time'] = df['Time'].dt.ceil("S", 0)
b = df.drop_duplicates(subset=['Time'], keep='last') - df.drop_duplicates(subset=['Time'], keep='first')
But instead of the expected result I get:
Time | OpenIA |
---|---|
0 days | 0.0 |
0 days | 0.0 |
0 days | 0.0 |
CodePudding user response:
You can use groupby.first/last:
g = df.groupby('Time', sort=False)
out = (g.first()-g.last()).reset_index()
output:
Time OpenIA
0 2022-07-15 10:00:23 0
1 2022-07-15 10:01:11 2
2 2022-07-15 10:01:33 -1
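For anyone who wants to reproduce this without the original file, here is a minimal self-contained sketch. It assumes the sample rows from the question, inlined as a string (the name csv_text is only for illustration), and a plain comma-separated layout. Grouping by Time sidesteps the index-alignment step that happens when two DataFrames are subtracted directly, because first() and last() both come back indexed by the same Time values.

```python
import io

import pandas as pd

# Sample rows copied from the question, inlined so no file is needed.
# csv_text is an illustrative name, not part of the original code.
csv_text = """Time,OpenIA
2022-07-15 10:00:23,1
2022-07-15 10:01:11,3
2022-07-15 10:01:11,2
2022-07-15 10:01:11,1
2022-07-15 10:01:11,3
2022-07-15 10:01:11,1
2022-07-15 10:01:33,1
2022-07-15 10:01:33,2
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Time"])

# sort=False keeps the groups in order of first appearance.
g = df.groupby("Time", sort=False)

# first() and last() share the same Time index, so the subtraction
# lines up group by group.
out = (g.first() - g.last()).reset_index()
print(out)
```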
CodePudding user response:
Try this (first value minus last value per timestamp, using positional access):
df.groupby('Time').agg(diff=('OpenIA', lambda x: x.iloc[0] - x.iloc[-1]))
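One thing to watch with the lambda approach: on a Series with an integer index, plain square brackets look up labels rather than positions, so an index of -1 is not "the last element". Positional access goes through .iloc. Below is a small sketch under the assumption that the data matches the sample in the question; the column name diff comes from the named aggregation and the variable names are illustrative.

```python
import pandas as pd

# Sample data matching the question; built in memory for illustration.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2022-07-15 10:00:23",
        "2022-07-15 10:01:11",
        "2022-07-15 10:01:11",
        "2022-07-15 10:01:11",
        "2022-07-15 10:01:11",
        "2022-07-15 10:01:11",
        "2022-07-15 10:01:33",
        "2022-07-15 10:01:33",
    ]),
    "OpenIA": [1, 3, 2, 1, 3, 1, 1, 2],
})

# .iloc[0] and .iloc[-1] pick the first and last row of each group
# by position, regardless of the index labels inside the group.
result = (
    df.groupby("Time")
      .agg(diff=("OpenIA", lambda x: x.iloc[0] - x.iloc[-1]))
      .reset_index()
)
print(result)
```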