Home > Software design >  Perform operations on a dataframe from groupings by ID
Perform operations on a dataframe from groupings by ID

Time:03-10

I have the following dataframe in Python:

ID maths value
0 add 12
1 sub 30
0 add 10
2 mult 3
0 sub 10
1 add 11
3 sub 40
2 add 21

My idea is to perform the following operations to get the result I want:

First step: Group the rows of the dataframe by ID. The order of the groups shall be indicated by the order of the original dataframe.

ID maths value
0 add 12
0 add 10
0 sub 10
1 sub 30
1 add 11
2 mult 3
2 add 21
3 sub 40

Second step: For each group created: Create a value for a new column 'result' where a mathematical operation indicated by the previous column of 'maths' is performed. If there is no previous row for the group, this column would have the value NaN.

ID maths value result
0 add 12 NaN
0 add 10 22
0 sub 10 20
1 sub 30 NaN
1 add 11 19
2 mult 3 NaN
2 add 21 63
3 sub 40 NaN

Third step: Return the resulting dataframe.

I have tried to realise this code by making use of the pandas groupby method. But I have problems to iterate with conditions for each row and each group, and I don't know how to create the new column 'result' on a groupby object.

grouped_df = testing.groupby('ID')
  for key, item in grouped_df:
    print(grouped_df.get_group(key))

I don't know whether to use orderby or groupby or some other method that works for what I want to do. If you can help me with a better idea, I'd appreciate it.

CodePudding user response:

ID = list("00011223")
maths = ["add","add","sub","sub","add","mult","add","sub"]
value = [12,10,10,30,11,3,21,40]
import pandas as pd
df = pd.DataFrame(list(zip(ID,maths,value)),columns = ["ID","Maths","Value"])
df["Maths"] = df.groupby(["ID"]).pipe(lambda df:df.Maths.shift(1)).fillna("add")
df["Value1"] = df.groupby(["ID"]).pipe(lambda df:df.Value.shift(1))   
df["result"] = df.groupby(["Maths"]).pipe(lambda x:(x.get_group("add")["Value1"]   x.get_group("add")["Value"]).append(
    x.get_group("sub")["Value1"] - x.get_group("sub")["Value"]).append(
        x.get_group("mult")["Value1"] * x.get_group("mult")["Value"])).sort_index()

Here is the Output:

df
Out[168]: 
  ID Maths  Value  Value1  result
0  0   add     12     NaN     NaN
1  0   add     10    12.0    22.0
2  0   add     10    10.0    20.0
3  1   add     30     NaN     NaN
4  1   sub     11    30.0    19.0
5  2   add      3     NaN     NaN
6  2  mult     21     3.0    63.0
7  3   add     40     NaN     NaN
  • Related