Data:
Date column1 column2 column3 column4
2021-08-20 19 30 11 8
2021-08-21 15 25 11 4
2021-08-22 5 10 5 0
2021-08-23 25 36 16 9
2021-08-24 6 6 6 0
I want to iterate 2 rows at a time and create a new column it is like backlog from previous day today : In every second row of each iteration I want a value like: df['new_column'] = (df['column2']-df['column4']) from row 2 (df['column2']-df['column4']) from row1
I am trying this:
from itertools import tee
from itertools import zip_longest as izip
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return izip(a, b)
for (idx1, row1), (idx2, row2) in pairwise(df.iterrows()):
print(idx1,row1,"\n\n",idx2,row2,"\n\n")
df['Backlog_today'][row2] = (df.loc[row2, ['column2']] - df.loc[row2, ['column4']])
df['Backlog_yesterday'][row1] = (df.loc[row1, ['column2']] - df.loc[row1, ['column4']])
df['new_column'] = df['Backlog_today'] df['Backlog_yesterday']
How can I correct this?
CodePudding user response:
Use rolling
to compute the sum between today and yesterday then aggregate
by subtracting the two columns.
df['new_column'] = df.rolling(2, min_periods=1)[['column2', 'column4']].sum() \
.agg(lambda x:x[0] - x[1] ,axis=1).fillna(0)
print(df)
# Output:
Date column1 column2 column3 column4 new_column
0 2021-08-20 19 30 11 8 22.0
1 2021-08-21 15 25 11 4 43.0
2 2021-08-22 5 10 5 0 31.0
3 2021-08-23 25 36 16 9 37.0
4 2021-08-24 6 6 6 0 33.0
CodePudding user response:
Subtract values and then add shifted Series by Series.add
and Series.shift
:
s = (df['column2']-df['column4'])
df['new_column'] = s.add(s.shift(), fill_value=0)
print (df)
Date column1 column2 column3 column4 new_column
0 2021-08-20 19 30 11 8 22.0
1 2021-08-21 15 25 11 4 43.0
2 2021-08-22 5 10 5 0 31.0
3 2021-08-23 25 36 16 9 37.0
4 2021-08-24 6 6 6 0 33.0
If need first value 0
:
s = (df['column2']-df['column4'])
df['new_column'] = s.add(s.shift()).fillna(0)
print (df)
Date column1 column2 column3 column4 new_column
0 2021-08-20 19 30 11 8 0.0
1 2021-08-21 15 25 11 4 43.0
2 2021-08-22 5 10 5 0 31.0
3 2021-08-23 25 36 16 9 37.0
4 2021-08-24 6 6 6 0 33.0
Performance in sample data:
#5k rows
df = pd.concat([df] * 1000, ignore_index=True)
In [141]: %%timeit
...: df.rolling(2)[['column2', 'column4']].sum().agg(lambda x:x[0] - x[1] ,axis=1).fillna(0)
...:
46.8 ms ± 274 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [142]: %%timeit
...: s = (df['column2']-df['column4'])
...: s.add(s.shift(), fill_value=0)
...:
...:
460 µs ± 4.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [143]: %%timeit
...: s = (df['column2']-df['column4'])
...: s.add(s.shift()).fillna(0)
...:
...:
496 µs ± 8.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)