Home > Blockchain >  How to find the overlapping count of rows between two keys of a multindex dataframe?
How to find the overlapping count of rows between two keys of a multindex dataframe?

Time:01-27

Two dataframes have been concatenated with different keys (multiindex dataframe) with same index. Dates are the index. There are different products in each dataframe as column names and their prices. I basically had to find the correlation between these two dataframes and overlapping period count. Correlation is done but how to find the count of overlapping rows with each product from each dataframe and produce result as a dataframe with products from dataframe 1 as column name and products from dataframe2 as row names and values as the number of overlapping rows for the same period. It should be a matrix.


For example: Dataframe1:
df1 = pd.DataFrame(data = {'col1' : [1/12/2020, 2/12/2020, 3/12/2020,], 
                           'col2' : [10, 11, 12], 'col3' :[13, 14, 10]}) 
df2 = pd.DataFrame(data = {'col1' : [1/12/2020, 2/12/2020, 3/12/2020,], 
                           'A' : [10, 9, 12], 'B' :[4, 14, 2]}) 

df1=df1.set_index('col1')
df2=df2.set_index('col1')

concat_data1 = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
concat_data1

              df1                df2
            col2    col3    A   B
col1                
1/12/2020   10  13  10  4
2/12/2020   11  14  9   14
3/12/2020   12  10  12  2

Need output result as: Overlapping period=
       col2 col3

A   2   0   
B   0   1

CodePudding user response:

This is a way of doing it:

import itertools
import pandas as pd

data1 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'], 
    'col2': [10, 11, 12, 14], 
    'col3': [13, 14, 10, 6],
    'col4': [10, 9, 15, 10], 
    'col5': [10, 9, 15, 5], 
}

data2 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'], 
    'A': [10, 9, 12, 14],
    'B' :[4, 14, 2, 9],
    'C': [6, 9, 1, 3], 
    'D': [6, 9, 1, 8]
}

df1 = pd.DataFrame(data1).set_index('col1')
df2 = pd.DataFrame(data2).set_index('col1')

concat_data = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

columns = {df: list(concat_data[df].columns) for df in set(concat_data.columns.get_level_values(0))}

matrix = pd.DataFrame(data=0, columns=columns['df1'], index=columns['df2'])

for row in concat_data.iterrows():
    for cols in list(itertools.product(columns['df1'], columns['df2'])):
        matrix.loc[cols[1], cols[0]]  = row[1]['df1'][cols[0]] == row[1]['df2'][cols[1]]

print(matrix)
  • Related