I have a DataFrame that looks like this:
The code to build it is:
data = {'Position Date': ['2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03'],
'Client': ['Client 1', 'Client 1', 'Client 2', 'Client 2', 'Client 1', 'Client 1', 'Client 2', 'Client 2'],
'Product': ['Product 1', 'Product 4', 'Product 2', 'Product 3', Product 1', 'Product 4', 'Product 2', 'Product 3'],
'Buy Date': ['2022-05-02', '2022-06-02', '2022-03-12', '2022-01-25', '2022-05-02', '2022-06-02', '2022-03-12', '2022-01-25'],
'Position': [130, 5000, 120, 77, 150, 7000, 200, 100]}
df = pd.DataFrame(data).set_index(['Position Date', 'Client', 'Product', 'Buy Date'])
df['PL'] = df.groupby(level=['Client', 'Product', 'Buy Date']).diff().fillna(0)
So, now I need to create a new column X that divides a the "current day" PL (index 0) by the last day Position (with index -1).
For example: on the day 2022-01-03, Client 1, product 1 the x would be:
X = PL[index 0] / Position[-1] = 20/ 130
The expected output would be:
As the first day's values are 0 and the others are: 20/130, 2000/5000, 80/120, 23/77
I was trying something like
df.groupby(level=['Client', 'Product', 'Buy Date']).apply(lambda x: x['PL'] / x['Position'].iloc[-1])
But I keep getting errors.
CodePudding user response:
I would use indexing here:
df['output'] = df['PL'].div(df.loc[df.index[0][0], 'Position']
.reindex(df.droplevel('Position Date').index).values
)
Output:
Position PL output
Position Date Client Product Buy Date
2022-01-02 Client 1 Product 1 2022-05-02 130 0.0 0.000000
Product 4 2022-06-02 5000 0.0 0.000000
Client 2 Product 2 2022-03-12 120 0.0 0.000000
Product 3 2022-01-25 77 0.0 0.000000
2022-01-03 Client 1 Product 1 2022-05-02 150 20.0 0.153846
Product 4 2022-06-02 7000 2000.0 0.400000
Client 2 Product 2 2022-03-12 200 80.0 0.666667
Product 3 2022-01-25 100 23.0 0.298701
Intermediates:
df.loc[df.index[0][0], 'Position']
Client Product Buy Date
Client 1 Product 1 2022-05-02 130
Product 4 2022-06-02 5000
Client 2 Product 2 2022-03-12 120
Product 3 2022-01-25 77
Name: Position, dtype: int64
(df.loc[df.index[0][0], 'Position']
.reindex(df.droplevel('Position Date').index)
)
Client Product Buy Date
Client 1 Product 1 2022-05-02 130
Product 4 2022-06-02 5000
Client 2 Product 2 2022-03-12 120
Product 3 2022-01-25 77
Client 1 Product 1 2022-05-02 130
Product 4 2022-06-02 5000
Client 2 Product 2 2022-03-12 120
Product 3 2022-01-25 77
Name: Position, dtype: int64
(df.loc[df.index[0][0], 'Position']
.reindex(df.droplevel('Position Date').index)
.values
)
array([ 130, 5000, 120, 77, 130, 5000, 120, 77])
CodePudding user response:
df["X"] = df["PL"].div(df["Position"].shift(4)).fillna(0.0).values
where 4 in the shift
method is the total number of products (Product 1, Product 2, Product 3 and Product 4) in a day in the Position Date
column