How to add zero values to a grouping for subsequent normal subtraction Python Pandas-CodePudding

I have a csv file from which I get the data and do the grouping. Which looks something like this

Time	Operation	Count
10:00:00	Up	40
10:00:00	Down	24
10:00:01	Up	4
10:00:01	Down	54
10:00:02	Down	22
10:00:03	Up	12
10:00:03	Down	11

To do this, I use

df = pd.read_csv(Ex_Csv, usecols=['Time','Count','Operation'], parse_dates=[0])
df['Time'] = df['Time'].dt.floor('S', 0).dt.time
df2 = df.groupby(['Operation', 'Time']).sum()

After I do the subtraction

out = df2.loc['Up']-df2.loc['Down']

I expected that if the values for example 'up' as at 10:00:02 did not come that it would be equal to 0 and I would get 0 - 22 and I get this

Time	Count
10:00:00	16
10:00:01	-50
10:00:02	-22
10:00:03	1

But I get this

Time	Count
10:00:00	16
10:00:01	-50
10:00:02
10:00:03	1

Is it possible to somehow equate the value of 'up' or 'down' to zero if it didn 't come ?

CodePudding user response：

Try to pivot your dataframe then fill null values by 0 then compute the diff:

out = (df.pivot('Time', 'Operation', 'Count').fillna(0).diff(1, axis=1)['Up']
         .rename('Count').reset_index())
print(out)

# Output
       Time  Count
0  10:00:00   16.0
1  10:00:01  -50.0
2  10:00:02  -22.0
3  10:00:03    1.0

Before the diff, your dataframe looks like:

>>> df.pivot('Time', 'Operation', 'Count').fillna(0)
Operation  Down    Up
Time                 
10:00:00   24.0  40.0
10:00:01   54.0   4.0
10:00:02   22.0   0.0
10:00:03   11.0  12.0

Safe way:

out = df.pivot('Time', 'Operation', 'Count').fillna(0)
out = pd.Series(out['Up']-out['Down'], index=out.index, name='Count').reset_index()
print(out)

# Output
       Time  Count
0  10:00:00   16.0
1  10:00:01  -50.0
2  10:00:02  -22.0
3  10:00:03    1.0

CodePudding user response：

import pandas as pd
from io import StringIO

data = StringIO("""Time;Operation;Count
10:00:00;Up;40
10:00:00;Down;24
10:00:01;Up;4
10:00:01;Down;54
10:00:02;Down;22
10:00:03;Up;12
10:00:03;Down;11
""")

df = pd.read_csv(data, sep=';')
pd.Time = pd.to_datetime(df.Time).dt.time
df.groupby(['Operation', 'Time']).sum()

df2 = pd.pivot(df, index='Time', columns='Operation', values='Count').fillna(0).astype(int)
df2.Up - df2.Down

Output:

Time
10:00:00    16
10:00:01   -50
10:00:02   -22
10:00:03     1
dtype: int64