I experimented a lot with merge
and join
and after re-reading the documentation and checking questions here I want to ask how do I merge the following two dataframes from this minimal working example code:
import pandas as pd
import numpy as np
dates = pd.date_range(start='2020-01-01', periods = 5)
stock_id_msft = ['MSFT']*5
stock_id_aapl = ['AAPL']*5
index = pd.MultiIndex.from_tuples(list(zip(*[dates, stock_id_msft])),
names = ['date', 'stock_id'])
df_msft = pd.DataFrame(index = index,
data = np.random.rand(5)*10,
columns = ['price'])
index = pd.MultiIndex.from_tuples(list(zip(*[dates, stock_id_aapl])),
names = ['date', 'stock_id'])
df_aapl = pd.DataFrame(index = index,
data = np.random.rand(5),
columns= ['price'])
so that the resulting dataframe is multi indexed and has one entry for each date with two rows for each stock_id
i.e. the resulting dataframe would be indexed like
[('2020-01-01', 'AAPL'),
('2020-01-01', 'MSFT'),
('2020-01-02', 'AAPL'),
('2020-01-02', 'MSFT'),
('2020-01-03', 'AAPL'),
('2020-01-03', 'MSFT'),
('2020-01-04', 'AAPL'),
('2020-01-04', 'MSFT'),
('2020-01-05', 'AAPL'),
('2020-01-05', 'MSFT')]
and I would have a cross-section of stocks by date. I know I can concat
and groupby
but I want the final dataframe to have a price for each stock in the cross-section.
CodePudding user response:
use pandas concat method:
>>> pd.concat([df_aapl,df_msft]).sort_index()
price
date stock_id
2020-01-01 AAPL 0.663115
MSFT 4.178832
2020-01-02 AAPL 0.267694
MSFT 0.538919
2020-01-03 AAPL 0.822265
MSFT 7.972534
2020-01-04 AAPL 0.164536
MSFT 2.813109
2020-01-05 AAPL 0.285003
MSFT 4.158308
>>> pd.concat([df_aapl,df_msft]).sort_index().index
MultiIndex([('2020-01-01', 'AAPL'),
('2020-01-01', 'MSFT'),
('2020-01-02', 'AAPL'),
('2020-01-02', 'MSFT'),
('2020-01-03', 'AAPL'),
('2020-01-03', 'MSFT'),
('2020-01-04', 'AAPL'),
('2020-01-04', 'MSFT'),
('2020-01-05', 'AAPL'),
('2020-01-05', 'MSFT')],
names=['date', 'stock_id'])