I have two Pandas dataframes, df_stock_prices and df_sentiment_mean.
I would like to do the following:
Left join/merge these two dataframes into one dataframe, joined by Date and by ticker. In df_stock_prices, ticker is the column name, for example AAPL.OQ and in df_sentiment_mean ticker is found within the rows of the column named ticker.
If there is a Date and ticker from df_stock_prices that doesn't match df_sentiment_mean, keep the non-matching row of df_stock_prices as-is (hence the left join).
When there is a match for both Date and ticker, multiply the fields together; for example in the dataframes listed below, if df_stock_prices Date is 2021-11-29 and column AAPL.OQ is a match for the df_sentiment_mean Date of 2021-11-29 and ticker AAPL.OQ, then multiply the values for the match, in this example: 160.24 * 0.163266.
If a Date and ticker from df_stock_prices doesn't match a Date and ticker value from df_sentiment_mean, keep the values from df_stock_prices.
Current dataframes:
df_stock_prices:
AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N AMD.OQ AMGN.OQ
Date
2021-11-29 160.24 116.89 128.03 365.82 687.49 54.95 161.91 203.47
2021-11-30 165.30 115.28 125.77 357.40 669.85 52.60 158.37 198.88
2021-12-01 164.77 115.91 126.74 360.14 657.41 51.72 149.11 200.80
2021-12-02 163.76 116.87 128.38 365.30 671.88 53.96 150.68 201.17
2021-12-03 161.84 118.85 130.27 361.42 616.53 53.32 144.01 202.44
...
df_sentiment_mean:
ticker diff
Date
2021-11-29 AAPL.OQ 0.163266
2021-11-29 ABBV.N -0.165520
2021-11-29 ABT.N 0.149920
2021-11-29 ADBE.OQ -0.014639
2021-11-29 AIG.N -0.448595
... ... ...
2023-01-12 LOW.N 0.008863
2023-01-12 MDT.N 0.498884
2023-01-12 MO.N -0.013428
2023-01-12 NEE.N 0.255223
2023-01-12 NKE.N 0.072752
Desired dataframe, partial first row example:
df_new:
AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N …
Date
2021-11-29 26.16174384 -19.3476328 19.1942576 365.82 -100.6416611 -0.80441305 …
...
CodePudding user response:
You have to reshape your second dataframe then multiply both dataframes. Finally, fill nan by original values of your first dataframe:
# df1 = df_stock_prices
# df2 = df_sentiment_mean
df_new = df1.mul(df2.set_index('ticker', append=True)['diff'].unstack('ticker')).fillna(df1)
>>> df1
AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N
Date
2021-11-29 160.24 116.89 128.03 365.82 687.49 54.95
>>> df2
ticker diff
Date
2021-11-29 AAPL.OQ 0.163266
2021-11-29 ABBV.N -0.165520
2021-11-29 ABT.N 0.149920
2021-11-29 ADBE.OQ -0.014639
2021-11-29 AIG.N -0.448595
>>> df_new
AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N
Date
2021-11-29 26.161744 -19.347633 19.194258 365.82 -10.064166 -24.650295
CodePudding user response:
Here's another way to solve your problem:
# Necessary imports
import numpy as np
import pandas as pd
# Create a dataframe with the stock prices
df_stock_prices = pd.DataFrame(
[
["2021-11-29", 160.24, 116.89, 128.03, 365.82, 687.49, 54.95, 161.91, 203.47],
["2021-11-30", 165.30, 115.28, 125.77, 357.40, 669.85, 52.60, 158.37, 198.88],
["2021-12-01", 164.77, 115.91, 126.74, 360.14, 657.41, 51.72, 149.11, 200.80],
["2021-12-02", 163.76, 116.87, 128.38, 365.30, 671.88, 53.96, 150.68, 201.17],
["2021-12-03", 161.84, 118.85, 130.27, 361.42, 616.53, 53.32, 144.01, 202.44],
],
columns=["Date", "AAPL.OQ", "ABBV.N", "ABT.N", "ACN.N", "ADBE.OQ", "AIG.N",
"AMD.OQ", "AMGN.OQ"],
)
# Create a dataframe with the sentiment mean
df_sentiment_mean = pd.DataFrame(
[
["2021-11-29", "AAPL.OQ", 0.163266],
["2021-11-29", "ABBV.N", -0.165520],
["2021-11-29", "ABT.N", 0.149920],
["2021-11-29", "ADBE.OQ", -0.014639],
["2021-11-29", "AIG.N", -0.448595],
["2023-01-12", "LOW.N", 0.008863],
["2023-01-12", "MDT.N", 0.498884],
["2023-01-12", "MO.N", -0.013428],
["2023-01-12", "NEE.N", 0.255223],
["2023-01-12", "NKE.N", 0.072752],
],
columns=["Date", "ticker", "diff"],
)
# Melt the stock prices dataframe
df_stock_prices = df_stock_prices.melt(id_vars=['Date'], var_name='ticker', value_name='price')
# Output:
#
# Date ticker price
# 0 2021-11-29 AAPL.OQ 160.24
# 1 2021-11-30 AAPL.OQ 165.30
# 2 2021-12-01 AAPL.OQ 164.77
# 3 2021-12-02 AAPL.OQ 163.76
# 4 2021-12-03 AAPL.OQ 161.84
# 5 2021-11-29 ABBV.N 116.89
# 6 2021-11-30 ABBV.N 115.28
# 7 2021-12-01 ABBV.N 115.91
# 8 2021-12-02 ABBV.N 116.87
# 9 2021-12-03 ABBV.N 118.85
# 10 2021-11-29 ABT.N 128.03
# ... ... ... ...
# Merge the two dataframes
df_new = pd.merge(df_stock_prices, df_sentiment_mean, how='left', on=['Date', 'ticker'])
# Output:
#
# Date ticker price diff
# 0 2021-11-29 AAPL.OQ 160.24 0.163266
# 1 2021-11-30 AAPL.OQ 165.30 NaN
# 2 2021-12-01 AAPL.OQ 164.77 NaN
# 3 2021-12-02 AAPL.OQ 163.76 NaN
# 4 2021-12-03 AAPL.OQ 161.84 NaN
# 5 2021-11-29 ABBV.N 116.89 -0.165520
# 6 2021-11-30 ABBV.N 115.28 NaN
# 7 2021-12-01 ABBV.N 115.91 NaN
# 8 2021-12-02 ABBV.N 116.87 NaN
# 9 2021-12-03 ABBV.N 118.85 NaN
# 10 2021-11-29 ABT.N 128.03 0.149920
# ... ... ... ... ...
# Create a new column with the value of the stock price multiplied by the sentiment mean
df_new['value'] = np.where(df_new['diff'].isna(), df_new['price'], df_new['price'] * df_new['diff'])
# Output:
#
# Date ticker price diff value
# 0 2021-11-29 AAPL.OQ 160.24 0.163266 26.161744
# 1 2021-11-30 AAPL.OQ 165.30 NaN 165.300000
# 2 2021-12-01 AAPL.OQ 164.77 NaN 164.770000
# 3 2021-12-02 AAPL.OQ 163.76 NaN 163.760000
# 4 2021-12-03 AAPL.OQ 161.84 NaN 161.840000
# 5 2021-11-29 ABBV.N 116.89 -0.165520 -19.347633
# ... ... ... ... ... ...
# Pivot the dataframe
df_new = df_new.pivot(index='Date', columns='ticker', values='value')
# Output:
#
# ticker AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N AMD.OQ AMGN.OQ
# Date
# 2021-11-29 26.161744 -19.347633 19.194258 365.82 -10.064166 -24.650295 161.91 203.47
# 2021-11-30 165.300000 115.280000 125.770000 357.40 669.850000 52.600000 158.37 198.88
# 2021-12-01 164.770000 115.910000 126.740000 360.14 657.410000 51.720000 149.11 200.80
# 2021-12-02 163.760000 116.870000 128.380000 365.30 671.880000 53.960000 150.68 201.17
# 2021-12-03 161.840000 118.850000 130.270000 361.42 616.530000 53.320000 144.01 202.44