I have a csv file from which I get the data and do the grouping. Which looks something like this
Time | Operation | Count |
---|---|---|
10:00:00 | Up | 40 |
10:00:00 | Down | 24 |
10:00:01 | Up | 4 |
10:00:01 | Down | 54 |
10:00:02 | Down | 22 |
10:00:03 | Up | 12 |
10:00:03 | Down | 11 |
To do this, I use
df = pd.read_csv(Ex_Csv, usecols=['Time','Count','Operation'], parse_dates=[0])
df['Time'] = df['Time'].dt.floor('S', 0).dt.time
df2 = df.groupby(['Operation', 'Time']).sum()
After I do the subtraction
out = df2.loc['Up']-df2.loc['Down']
I expected that if the values for example 'up' as at 10:00:02 did not come that it would be equal to 0 and I would get 0 - 22 and I get this
Time | Count |
---|---|
10:00:00 | 16 |
10:00:01 | -50 |
10:00:02 | -22 |
10:00:03 | 1 |
But I get this
Time | Count |
---|---|
10:00:00 | 16 |
10:00:01 | -50 |
10:00:02 | |
10:00:03 | 1 |
Is it possible to somehow equate the value of 'up' or 'down' to zero if it didn 't come ?
CodePudding user response:
Try to pivot your dataframe then fill null values by 0 then compute the diff:
out = (df.pivot('Time', 'Operation', 'Count').fillna(0).diff(1, axis=1)['Up']
.rename('Count').reset_index())
print(out)
# Output
Time Count
0 10:00:00 16.0
1 10:00:01 -50.0
2 10:00:02 -22.0
3 10:00:03 1.0
Before the diff
, your dataframe looks like:
>>> df.pivot('Time', 'Operation', 'Count').fillna(0)
Operation Down Up
Time
10:00:00 24.0 40.0
10:00:01 54.0 4.0
10:00:02 22.0 0.0
10:00:03 11.0 12.0
Safe way:
out = df.pivot('Time', 'Operation', 'Count').fillna(0)
out = pd.Series(out['Up']-out['Down'], index=out.index, name='Count').reset_index()
print(out)
# Output
Time Count
0 10:00:00 16.0
1 10:00:01 -50.0
2 10:00:02 -22.0
3 10:00:03 1.0
CodePudding user response:
import pandas as pd
from io import StringIO
data = StringIO("""Time;Operation;Count
10:00:00;Up;40
10:00:00;Down;24
10:00:01;Up;4
10:00:01;Down;54
10:00:02;Down;22
10:00:03;Up;12
10:00:03;Down;11
""")
df = pd.read_csv(data, sep=';')
pd.Time = pd.to_datetime(df.Time).dt.time
df.groupby(['Operation', 'Time']).sum()
df2 = pd.pivot(df, index='Time', columns='Operation', values='Count').fillna(0).astype(int)
df2.Up - df2.Down
Output:
Time
10:00:00 16
10:00:01 -50
10:00:02 -22
10:00:03 1
dtype: int64