I try to create a DataFrame df3
based on two other DataFrames df1
and df2
. If the value in df1
is bigger or equal the value in df2
, then I want to assign a 1 in df3
and otherwise a 0.
This would be a sample data for illustration:
df1
Date 01K 02K 03K 04K
0 2021-01-01 NaN 3.5 4.2 NaN
1 2021-01-02 2.3 0.1 5.2 2.6
2 2021-01-03 0.3 NaN 2.5 8.2
3 2021-01-04 0.4 NaN 3.0 4.2
df2
Date 01K 02K 03K 04K
0 2021-01-01 NaN NaN NaN NaN
1 2021-01-02 2.5 0.6 5.8 2.3
2 2021-01-03 0.2 NaN 2.5 8.1
3 2021-01-04 0.3 NaN 2.8 4.2
df3
Date 01K 02K 03K 04K
0 2021-01-01 0 0 0 0
1 2021-01-02 0 0 0 1
2 2021-01-03 1 0 1 1
3 2021-01-04 1 0 1 1
For reproducibility:
df1 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K':['NaN', 2.3, 0.3, 0.4],
'02K':[3.5, 0.1, 'NaN', 'NaN'],
'03K':[4.2, 5.2, 2.5, 3.0],
'04K':['NaN', 2.6, 8.2, 4.2]})
df1 = df1.replace('NaN',np.nan)
df2 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K':['NaN', 2.5, 0.2, 0.3],
'02K':['NaN', 0.6, 'NaN', 'NaN'],
'03K':['NaN', 5.8, 2.5, 2.8],
'04K':['NaN', 2.3, 8.1, 4.2]})
df2 = df2.replace('NaN',np.nan)
I tried to do it with np.where(), but I get an array as output.
df3 = np.where(df1>=df2, 1, 0)
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 1],
[1, 1, 0, 1, 1],
[1, 1, 0, 1, 1]])
Is there an efficient way to create df3
with the same structure as the other DataFrames for a big dataset?
Thanks a lot!
CodePudding user response:
Convert Date
to DatetimeIndex
in both, so possible compare with casting ouput to integers for map True, False
s to 1, 0
:
df1 = df1.set_index('Date')
df2 = df2.set_index('Date')
df3 = (df1>=df2).astype(int)
If need instead 1
using -1
use:
df3 = (df1>=df2).astype(int).mul(-1)
df3 = pd.DataFrame(np.where(df1>=df2, -1, 0),
index=df1.index,
columns=df1.columns)