I have a DataFrame
df1
with ID
and Amount
on specific Dates
. I try to sum up the Amount
of two specific rows which have the same ID
value.
df1:
Date ID Amount
0 2022-01-02 1200 10.0
1 2022-01-02 1200 1.0
2 2022-01-02 1400 12.0
3 2022-01-02 1500 11.0
4 2022-01-03 1300 12.5
5 2022-01-03 1300 0.5
6 2022-01-03 1500 12.0
This would be the desired output:
df1:
Date ID Amount
0 2022-01-02 1200 11 <-- 10 1
1 2022-01-02 1200 0 <-- -1
2 2022-01-02 1400 12
3 2022-01-02 1500 11
4 2022-01-03 1300 13 <-- 12.5 0.5
5 2022-01-03 1300 0 <-- -0.5
6 2022-01-03 1500 12
I tried to do it with an np.where()
to replace the Amount
where the shifted
ID
value is equal to the ID
value.
For reproducibility:
import pandas as pd
df1 = pd.DataFrame({
'Date':['2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03'],
'ID':[1200, 1200, 1400, 1500, 1300, 1300, 1500],
'Amount':[10, 1, 12, 11, 12.5, 0.5, 12]})
Many thanks for your help!
CodePudding user response:
If I understand your problem correctly, it looks like a transaction data and the groups you need are by [Date, ID].
If so, then you can achieve it as:
df1["Amount"] = df1.groupby(["Date", "ID"])["Amount"].transform(lambda x: [x.sum() if i==0 else 0 for i,_ in enumerate(x)])
Full example. I have added some additional data the end of the dataset just to test the corner condition with more than 2 entries:
import pandas as pd
df1 = pd.DataFrame({
'Date':['2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-04', '2022-01-04', '2022-01-04'],
'ID':[1200, 1200, 1400, 1500, 1300, 1300, 1500, 1500, 1500, 1500],
'Amount':[10, 1, 12, 11, 12.5, 0.5, 12, 10, 3, 5]})
df1["Amount"] = df1.groupby(["Date", "ID"])["Amount"].transform(lambda x: [x.sum() if i==0 else 0 for i,_ in enumerate(x)])
print(df1)
[Out]:
Date ID Amount
0 2022-01-02 1200 11.0
1 2022-01-02 1200 0.0
2 2022-01-02 1400 12.0
3 2022-01-02 1500 11.0
4 2022-01-03 1300 13.0
5 2022-01-03 1300 0.0
6 2022-01-03 1500 12.0
7 2022-01-04 1500 18.0
8 2022-01-04 1500 0.0
9 2022-01-04 1500 0.0
CodePudding user response:
Let's try the following code:
Amount=[None]*len(df1)
for i in range(1, len(df1)):
if df1['ID'][i] == df1['ID'][i-1]:
Amount[i] = df1['Amount'][i] - df1['Amount'][i]
Amount[i-1] = df1['Amount'][i] df1['Amount'][i-1]
else:
Amount[i] = df1['Amount'][i]
df1['Amount']=Amount
Output
>>> df1
Date ID Amount
0 2022-01-02 1200 11.0
1 2022-01-02 1200 0.0
2 2022-01-02 1400 12.0
3 2022-01-02 1500 11.0
4 2022-01-03 1300 13.0
5 2022-01-03 1300 0.0
6 2022-01-03 1500 12.0