Home > Software engineering >  Create a new columns based on multiple conditions between two different dataframes with different di
Create a new columns based on multiple conditions between two different dataframes with different di

Time:12-14

I am trying to build a stock portfolio. I have mainly 2 dfs: 1 with my transactions and 1 with the stock prices of the individual stocks. My transactions df looks like this:

        Date    Ticker   Position
0   2022-11-01  MSFT     20
1   2022-11-15  PG       10
2   2022-11-25  JNJ      10
3   2022-11-22  MSFT     10

The position column indicates how many shares where purchased. So in the third row I bought a second position in MSFT, adding 10 stocks, taking my total to 30 stocks in MSFT.

My stock prices df looks like this (from yfinance):

    Ticker  Adj     Close
Date        
2022-11-01  MSFT    227.528793
2022-11-02  MSFT    219.481476
2022-11-03  MSFT    213.647903
2022-11-04  MSFT    220.767838
2022-11-07  MSFT    227.229630
... ... ...
2022-12-05  JNJ     178.779999
2022-12-06  JNJ     176.100006
2022-12-07  JNJ     177.169998
2022-12-08  JNJ     177.199997
2022-12-09  JNJ     175.740005

I would like to add a column to my stock prices df called Position which shows stock positions at that time.

I think this should not be so difficult with a double condition: If prices_date >= transaction_date AND transaction_ticker = prices_ticker THEN prices_postion = prices_position transaction_position

I thought of initially loading an new column with all zeros, which should should allow for a simple addition (or subtraction if shares were sold)

I tried resetting the index so that column to allow for better comparison, nested for loops, but only errors occurred. I cannot wrap my head around how to do it in Python nor have I found an answer online.

Any suggestions are appreciated

CodePudding user response:

Here's a solution that relies on merge, fillna, and groupby.cumsum():

import pandas as pd
import numpy as np

df1 = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-11-01", "2022-11-15", "2022-12-07", "2022-11-04"]
        ),
        "Ticker": ["MSFT", "JNJ", "JNJ", "MSFT"],
        "Position": [20, 10, 10, 10],
    }
)
df1

df2 = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "2022-11-01",
                "2022-11-02",
                "2022-11-03",
                "2022-11-04",
                "2022-11-07",
                "2022-12-05",
                "2022-12-06",
                "2022-12-07",
                "2022-12-08",
                "2022-12-09",
            ]
        ),
        "Ticker": [
            "MSFT",
            "MSFT",
            "MSFT",
            "MSFT",
            "MSFT",
            "JNJ",
            "JNJ",
            "JNJ",
            "JNJ",
            "JNJ",
        ],
        "Close": [
            227.528793,
            219.481476,
            213.647903,
            220.767838,
            227.229630,
            178.779999,
            176.100006,
            177.169998,
            177.199997,
            175.740005,
        ],
    }
)
df3 = df2.merge(df1,how='outer',on=['Ticker','Date'])
df3 = df3.sort_values(by='Date')
df3 = df3.dropna(axis=0,subset='Close')
df3['Position'] = df3['Position'].fillna(0)
df3['Position'] = df3.groupby('Ticker')['Position'].cumsum()
df3

Result:

    Date    Ticker  Close   Position
0   2022-11-01  MSFT    227.528793  20.0
1   2022-11-02  MSFT    219.481476  20.0
2   2022-11-03  MSFT    213.647903  20.0
3   2022-11-04  MSFT    220.767838  30.0
4   2022-11-07  MSFT    227.22963   30.0
5   2022-12-05  JNJ     178.779999  0.0
6   2022-12-06  JNJ     176.100006  0.0
7   2022-12-07  JNJ     177.169998  10.0
8   2022-12-08  JNJ     177.199997  10.0
9   2022-12-09  JNJ     175.740005  10.0
  • Related