I have two DataFrames, df1 and df2, where df2 has only one column, and I am trying to create df3 based on the other two. For each cell, if both the df1 value and the df2 value in the same row are > 0, I want a one, otherwise a zero.
df1:
             01K   02K   03K   04K
Date
2021-01-01   NaN   3.5   4.2   NaN
2021-01-02  -2.3  -0.1   5.2   2.6
2021-01-03   0.3   NaN  -2.5   8.2
2021-01-04  -0.4   NaN   3.0  -4.2
df2:
              XX
Date
2021-01-01   NaN
2021-01-02   2.5
2021-01-03  -0.2
2021-01-04   0.3
df3:
            01K  02K  03K  04K
Date
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
For reproducibility:
import pandas as pd
import numpy as np
# Use np.nan directly so the columns are float64 from the start
df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]})
df1 = df1.set_index('Date')
df2 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'XX': [np.nan, 2.5, -0.2, 0.3]})
df2 = df2.set_index('Date')
I don't know how to write the condition so that the comparison works between two DataFrames with a different number of columns.
I tried the following, but it assumes both DataFrames have the same dimensions:
df3 = ((df1 > 0) & (df2 > 0)).astype(int)
Thanks a lot!
CodePudding user response:
Use DataFrame.mul to multiply the first DataFrame by a Series, aligned on the index with axis=0:
df = (df1 > 0).astype(int).mul((df2.iloc[:, 0] > 0).astype(int), axis=0)
print(df)
            01K  02K  03K  04K
Date
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
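To see why this works, it may help to look at the intermediate pieces. The sketch below is only an illustration: it rebuilds the sample data with np.nan directly and selects the column by its name XX instead of by position, and the names row_flag and cell_flag are made up for this example. Comparisons against NaN evaluate to False, so missing values end up as 0, and axis=0 asks mul to align the Series with the row index instead of the column labels.

```python
import numpy as np
import pandas as pd

dates = ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04']
df1 = pd.DataFrame({
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]},
    index=pd.Index(dates, name='Date'))
df2 = pd.DataFrame({'XX': [np.nan, 2.5, -0.2, 0.3]},
                   index=pd.Index(dates, name='Date'))

# One 1/0 flag per row of df2; NaN > 0 is False, so the first row becomes 0
row_flag = (df2['XX'] > 0).astype(int)
print(row_flag.tolist())  # [0, 1, 0, 1]

# One 1/0 flag per cell of df1
cell_flag = (df1 > 0).astype(int)

# axis=0 matches row_flag to the index (the dates), so every row of
# cell_flag is multiplied by the flag that belongs to the same date
df3 = cell_flag.mul(row_flag, axis=0)
print(df3)
```

Because the multiplication is label-based, this variant should keep working even if df2 is sorted differently from df1.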
Or use NumPy broadcasting:
df = ((df1 > 0) & (df2.iloc[:, [0]].to_numpy() > 0)).astype(int)
print(df)
            01K  02K  03K  04K
Date
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
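One caveat with the broadcasting variant: to_numpy() drops the index, so rows are matched by position, not by date. The sketch below is a rough illustration under that assumption. It rebuilds the sample data with np.nan directly, and df2_shuffled is a hypothetical reordered copy of df2 that is not part of the question. It shows the shapes involved and one possible way to realign df2 with reindex before converting it to an array.

```python
import numpy as np
import pandas as pd

dates = ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04']
df1 = pd.DataFrame({
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]},
    index=pd.Index(dates, name='Date'))
df2 = pd.DataFrame({'XX': [np.nan, 2.5, -0.2, 0.3]},
                   index=pd.Index(dates, name='Date'))

# Selecting with a list keeps two dimensions: a boolean array with one column
col = df2[['XX']].to_numpy() > 0
print((df1 > 0).shape, col.shape)  # (4, 4) (4, 1)

# The (4, 1) array is stretched across the 4 columns of df1
df3 = ((df1 > 0) & col).astype(int)

# Hypothetical case: df2 arrives in a different row order.
# Reindexing to df1's index first makes the positions line up again.
df2_shuffled = df2.iloc[::-1]
col_aligned = df2_shuffled.reindex(df1.index)[['XX']].to_numpy() > 0
df3_aligned = ((df1 > 0) & col_aligned).astype(int)
print(df3_aligned.equals(df3))
```

If both DataFrames are known to share the same row order, the reindex step is not needed.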