Create binary pandas DataFrame based on two DataFrames with different number of columns

Time:12-08

I have two DataFrames, df1 and df2, where df2 has only one column. I want to create df3 from them: a cell should be 1 if both the df1 value and the df2 value in the same row are > 0, and 0 otherwise.

df1:
            01K  02K  03K   04K
Date                
2021-01-01  NaN  3.5  4.2   NaN
2021-01-02  -2.3 -0.1 5.2   2.6
2021-01-03  0.3  NaN  -2.5  8.2
2021-01-04  -0.4 NaN  3.0   -4.2

df2:
            XX
Date    
2021-01-01  NaN
2021-01-02  2.5
2021-01-03  -0.2
2021-01-04  0.3

df3:
            01K  02K  03K   04K
Date                
2021-01-01  0    0    0     0
2021-01-02  0    0    1     1
2021-01-03  0    0    0     0
2021-01-04  0    0    1     0

For reproducibility:

import pandas as pd
import numpy as np

# Use np.nan directly so every column keeps a float dtype
df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]})
df1 = df1.set_index('Date')

df2 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'XX': [np.nan, 2.5, -0.2, 0.3]})
df2 = df2.set_index('Date')

I don't know how to express the condition so that the comparison works between two DataFrames with a different number of columns.

I tried the following, but it assumes both DataFrames have the same dimensions:

df3 = ((df1 > 0) & (df2 > 0)).astype(int)

Thanks a lot!

Answer:

Use DataFrame.mul to multiply the first DataFrame by a Series, aligning on the row index:

df = (df1 > 0).astype(int).mul((df2.iloc[:, 0] > 0).astype(int), axis=0)
print (df)
            01K  02K  03K  04K
Date                          
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
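
The multiplication works because mul(..., axis=0) matches the Series against the row index of the DataFrame instead of its column labels. A minimal self-contained sketch of that approach follows. It rebuilds the question's data with np.nan directly and selects the single column by its name 'XX' rather than by position; the names idx, mask1 and mask2 are illustrative only.

```python
import numpy as np
import pandas as pd

idx = pd.Index(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
               name='Date')
df1 = pd.DataFrame({
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]}, index=idx)
df2 = pd.DataFrame({'XX': [np.nan, 2.5, -0.2, 0.3]}, index=idx)

# Comparisons against NaN evaluate to False, so missing values become 0.
mask1 = (df1 > 0).astype(int)
mask2 = (df2['XX'] > 0).astype(int)   # a Series indexed by Date

# axis=0 aligns the Series index with the DataFrame's row index,
# so each row of mask1 is multiplied by one value of mask2.
df3 = mask1.mul(mask2, axis=0)
print(df3)
```

Because alignment happens on the index labels, this variant should also behave sensibly when the two frames list the same dates in a different order.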

Or use NumPy broadcasting:

df = ((df1 > 0) & (df2.iloc[:, [0]].to_numpy() > 0)).astype(int)
print (df)
            01K  02K  03K  04K
Date                          
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
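
As background on why the attempt in the question does not work: an operation between two DataFrames first aligns them on both row and column labels, and df1 and df2 share no column label. Converting df2 to a NumPy array drops the labels, so a column of shape (n, 1) is stretched across every column of df1. The sketch below illustrates both points on a reduced two-row, two-column version of the data; the reduced frames and the names left, right and cond are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'01K': [np.nan, -2.3], '03K': [4.2, 5.2]},
                   index=['2021-01-01', '2021-01-02'])
df2 = pd.DataFrame({'XX': [np.nan, 2.5]}, index=df1.index)

# Label alignment between two frames produces the union of their columns,
# so no df1 column is ever paired with 'XX'.
left, right = (df1 > 0).align(df2 > 0)
print(list(left.columns))

# Double brackets keep the column two-dimensional: shape (2, 1).
cond = df2[['XX']].to_numpy() > 0

# The unlabeled (2, 1) array is broadcast against the (2, 2) frame.
df3 = ((df1 > 0) & cond).astype(int)
print(df3)
```

Unlike the mul variant, broadcasting pairs rows purely by position, so it presumes both frames have their rows in the same order.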