I have the following dataframe in Python:
ID | maths | value |
---|---|---|
0 | add | 12 |
1 | sub | 30 |
0 | add | 10 |
2 | mult | 3 |
0 | sub | 10 |
1 | add | 11 |
3 | sub | 40 |
2 | add | 21 |
My idea is to perform the following operations to get the result I want:
First step: Group the rows of the dataframe by ID. The order of the groups shall be indicated by the order of the original dataframe.
ID | maths | value |
---|---|---|
0 | add | 12 |
0 | add | 10 |
0 | sub | 10 |
1 | sub | 30 |
1 | add | 11 |
2 | mult | 3 |
2 | add | 21 |
3 | sub | 40 |
Second step: For each group created: Create a value for a new column 'result' where a mathematical operation indicated by the previous column of 'maths' is performed. If there is no previous row for the group, this column would have the value NaN.
ID | maths | value | result |
---|---|---|---|
0 | add | 12 | NaN |
0 | add | 10 | 22 |
0 | sub | 10 | 20 |
1 | sub | 30 | NaN |
1 | add | 11 | 19 |
2 | mult | 3 | NaN |
2 | add | 21 | 63 |
3 | sub | 40 | NaN |
Third step: Return the resulting dataframe.
I have tried to realise this code by making use of the pandas groupby method. But I have problems to iterate with conditions for each row and each group, and I don't know how to create the new column 'result' on a groupby object.
grouped_df = testing.groupby('ID')
for key, item in grouped_df:
print(grouped_df.get_group(key))
I don't know whether to use orderby or groupby or some other method that works for what I want to do. If you can help me with a better idea, I'd appreciate it.
CodePudding user response:
ID = list("00011223")
maths = ["add","add","sub","sub","add","mult","add","sub"]
value = [12,10,10,30,11,3,21,40]
import pandas as pd
df = pd.DataFrame(list(zip(ID,maths,value)),columns = ["ID","Maths","Value"])
df["Maths"] = df.groupby(["ID"]).pipe(lambda df:df.Maths.shift(1)).fillna("add")
df["Value1"] = df.groupby(["ID"]).pipe(lambda df:df.Value.shift(1))
df["result"] = df.groupby(["Maths"]).pipe(lambda x:(x.get_group("add")["Value1"] x.get_group("add")["Value"]).append(
x.get_group("sub")["Value1"] - x.get_group("sub")["Value"]).append(
x.get_group("mult")["Value1"] * x.get_group("mult")["Value"])).sort_index()
Here is the Output:
df
Out[168]:
ID Maths Value Value1 result
0 0 add 12 NaN NaN
1 0 add 10 12.0 22.0
2 0 add 10 10.0 20.0
3 1 add 30 NaN NaN
4 1 sub 11 30.0 19.0
5 2 add 3 NaN NaN
6 2 mult 21 3.0 63.0
7 3 add 40 NaN NaN