Home > Blockchain >  Merging data frames cross-sectionally
Merging data frames cross-sectionally

Time:12-11

I experimented a lot with merge and join and after re-reading the documentation and checking questions here I want to ask how do I merge the following two dataframes from this minimal working example code:

import pandas as pd
import numpy as np

dates = pd.date_range(start='2020-01-01', periods = 5)
stock_id_msft = ['MSFT']*5
stock_id_aapl = ['AAPL']*5

index = pd.MultiIndex.from_tuples(list(zip(*[dates, stock_id_msft])), 
                                  names = ['date', 'stock_id'])

df_msft = pd.DataFrame(index = index,
                       data  = np.random.rand(5)*10,
                       columns = ['price'])

index = pd.MultiIndex.from_tuples(list(zip(*[dates, stock_id_aapl])), 
                                  names = ['date', 'stock_id'])
df_aapl = pd.DataFrame(index = index, 
                       data = np.random.rand(5),
                       columns= ['price'])

so that the resulting dataframe is multi indexed and has one entry for each date with two rows for each stock_id i.e. the resulting dataframe would be indexed like

[('2020-01-01', 'AAPL'),
 ('2020-01-01', 'MSFT'),
 ('2020-01-02', 'AAPL'),
 ('2020-01-02', 'MSFT'),
 ('2020-01-03', 'AAPL'),
 ('2020-01-03', 'MSFT'),
 ('2020-01-04', 'AAPL'),
 ('2020-01-04', 'MSFT'),
 ('2020-01-05', 'AAPL'),
 ('2020-01-05', 'MSFT')]

and I would have a cross-section of stocks by date. I know I can concat and groupby but I want the final dataframe to have a price for each stock in the cross-section.

CodePudding user response:

use pandas concat method:

>>> pd.concat([df_aapl,df_msft]).sort_index()
            price
date    stock_id    
2020-01-01  AAPL    0.663115
MSFT    4.178832
2020-01-02  AAPL    0.267694
MSFT    0.538919
2020-01-03  AAPL    0.822265
MSFT    7.972534
2020-01-04  AAPL    0.164536
MSFT    2.813109
2020-01-05  AAPL    0.285003
MSFT    4.158308
>>> pd.concat([df_aapl,df_msft]).sort_index().index
MultiIndex([('2020-01-01', 'AAPL'),
        ('2020-01-01', 'MSFT'),
        ('2020-01-02', 'AAPL'),
        ('2020-01-02', 'MSFT'),
        ('2020-01-03', 'AAPL'),
        ('2020-01-03', 'MSFT'),
        ('2020-01-04', 'AAPL'),
        ('2020-01-04', 'MSFT'),
        ('2020-01-05', 'AAPL'),
        ('2020-01-05', 'MSFT')],
       names=['date', 'stock_id'])
  • Related