I am trying to build a stock portfolio. I have mainly 2 dfs: 1 with my transactions and 1 with the stock prices of the individual stocks. My transactions df looks like this:
Date Ticker Position
0 2022-11-01 MSFT 20
1 2022-11-15 PG 10
2 2022-11-25 JNJ 10
3 2022-11-22 MSFT 10
The position column indicates how many shares where purchased. So in the third row I bought a second position in MSFT, adding 10 stocks, taking my total to 30 stocks in MSFT.
My stock prices df looks like this (from yfinance):
Ticker Adj Close
Date
2022-11-01 MSFT 227.528793
2022-11-02 MSFT 219.481476
2022-11-03 MSFT 213.647903
2022-11-04 MSFT 220.767838
2022-11-07 MSFT 227.229630
... ... ...
2022-12-05 JNJ 178.779999
2022-12-06 JNJ 176.100006
2022-12-07 JNJ 177.169998
2022-12-08 JNJ 177.199997
2022-12-09 JNJ 175.740005
I would like to add a column to my stock prices df called Position which shows stock positions at that time.
I think this should not be so difficult with a double condition:
If prices_date >= transaction_date AND transaction_ticker = prices_ticker
THEN prices_postion = prices_position transaction_position
I thought of initially loading an new column with all zeros, which should should allow for a simple addition (or subtraction if shares were sold)
I tried resetting the index so that column to allow for better comparison, nested for loops, but only errors occurred. I cannot wrap my head around how to do it in Python nor have I found an answer online.
Any suggestions are appreciated
CodePudding user response:
Here's a solution that relies on merge
, fillna
, and groupby.cumsum()
:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(
{
"Date": pd.to_datetime(
["2022-11-01", "2022-11-15", "2022-12-07", "2022-11-04"]
),
"Ticker": ["MSFT", "JNJ", "JNJ", "MSFT"],
"Position": [20, 10, 10, 10],
}
)
df1
df2 = pd.DataFrame(
{
"Date": pd.to_datetime(
[
"2022-11-01",
"2022-11-02",
"2022-11-03",
"2022-11-04",
"2022-11-07",
"2022-12-05",
"2022-12-06",
"2022-12-07",
"2022-12-08",
"2022-12-09",
]
),
"Ticker": [
"MSFT",
"MSFT",
"MSFT",
"MSFT",
"MSFT",
"JNJ",
"JNJ",
"JNJ",
"JNJ",
"JNJ",
],
"Close": [
227.528793,
219.481476,
213.647903,
220.767838,
227.229630,
178.779999,
176.100006,
177.169998,
177.199997,
175.740005,
],
}
)
df3 = df2.merge(df1,how='outer',on=['Ticker','Date'])
df3 = df3.sort_values(by='Date')
df3 = df3.dropna(axis=0,subset='Close')
df3['Position'] = df3['Position'].fillna(0)
df3['Position'] = df3.groupby('Ticker')['Position'].cumsum()
df3
Result:
Date Ticker Close Position
0 2022-11-01 MSFT 227.528793 20.0
1 2022-11-02 MSFT 219.481476 20.0
2 2022-11-03 MSFT 213.647903 20.0
3 2022-11-04 MSFT 220.767838 30.0
4 2022-11-07 MSFT 227.22963 30.0
5 2022-12-05 JNJ 178.779999 0.0
6 2022-12-06 JNJ 176.100006 0.0
7 2022-12-07 JNJ 177.169998 10.0
8 2022-12-08 JNJ 177.199997 10.0
9 2022-12-09 JNJ 175.740005 10.0