Home > Software design >  Lambda function operations using groupby.apply() in dataframe
Lambda function operations using groupby.apply() in dataframe

Time:01-18

I have a DataFrame that looks like this:

enter image description here

The code to build it is:

data = {'Position Date': ['2022-01-02',  '2022-01-02',  '2022-01-02',  '2022-01-02',  '2022-01-03',  '2022-01-03',  '2022-01-03',  '2022-01-03'],
 'Client': ['Client 1',  'Client 1',  'Client 2',  'Client 2',  'Client 1',  'Client 1',  'Client 2',  'Client 2'],
 'Product': ['Product 1',  'Product 4',  'Product 2',  'Product 3',  Product 1',  'Product 4',  'Product 2',  'Product 3'],
 'Buy Date': ['2022-05-02',  '2022-06-02',  '2022-03-12',  '2022-01-25',  '2022-05-02',  '2022-06-02',  '2022-03-12',  '2022-01-25'],
 'Position': [130, 5000, 120, 77, 150, 7000, 200, 100]}

df = pd.DataFrame(data).set_index(['Position Date', 'Client', 'Product', 'Buy Date'])

df['PL'] = df.groupby(level=['Client', 'Product', 'Buy Date']).diff().fillna(0)

So, now I need to create a new column X that divides a the "current day" PL (index 0) by the last day Position (with index -1).

For example: on the day 2022-01-03, Client 1, product 1 the x would be:

X = PL[index 0] / Position[-1] = 20/ 130

The expected output would be:

enter image description here

As the first day's values are 0 and the others are: 20/130, 2000/5000, 80/120, 23/77

I was trying something like

df.groupby(level=['Client', 'Product', 'Buy Date']).apply(lambda x: x['PL'] / x['Position'].iloc[-1])

But I keep getting errors.

CodePudding user response:

I would use indexing here:

df['output'] = df['PL'].div(df.loc[df.index[0][0], 'Position']
                              .reindex(df.droplevel('Position Date').index).values
                           )

Output:

                                             Position      PL    output
Position Date Client   Product   Buy Date                              
2022-01-02    Client 1 Product 1 2022-05-02       130     0.0  0.000000
                       Product 4 2022-06-02      5000     0.0  0.000000
              Client 2 Product 2 2022-03-12       120     0.0  0.000000
                       Product 3 2022-01-25        77     0.0  0.000000
2022-01-03    Client 1 Product 1 2022-05-02       150    20.0  0.153846
                       Product 4 2022-06-02      7000  2000.0  0.400000
              Client 2 Product 2 2022-03-12       200    80.0  0.666667
                       Product 3 2022-01-25       100    23.0  0.298701

Intermediates:

df.loc[df.index[0][0], 'Position']

Client    Product    Buy Date  
Client 1  Product 1  2022-05-02     130
          Product 4  2022-06-02    5000
Client 2  Product 2  2022-03-12     120
          Product 3  2022-01-25      77
Name: Position, dtype: int64

(df.loc[df.index[0][0], 'Position']
   .reindex(df.droplevel('Position Date').index)
)

Client    Product    Buy Date  
Client 1  Product 1  2022-05-02     130
          Product 4  2022-06-02    5000
Client 2  Product 2  2022-03-12     120
          Product 3  2022-01-25      77
Client 1  Product 1  2022-05-02     130
          Product 4  2022-06-02    5000
Client 2  Product 2  2022-03-12     120
          Product 3  2022-01-25      77
Name: Position, dtype: int64

(df.loc[df.index[0][0], 'Position']
   .reindex(df.droplevel('Position Date').index)
   .values
)

array([ 130, 5000,  120,   77,  130, 5000,  120,   77])

CodePudding user response:

df["X"] = df["PL"].div(df["Position"].shift(4)).fillna(0.0).values

where 4 in the shift method is the total number of products (Product 1, Product 2, Product 3 and Product 4) in a day in the Position Date column

  • Related