I have two data.frames (df1 and binary df2) and I want to create a third data.frame (df3) based on df1 conditioned on the binary data.frame df2. I want to set all values to zero if there is a zero at that position in the binary data.frame 2.
import pandas as pd
df1 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':['1.2', '2.3', '0.3', '0.5'],
'02K W':['3.5', '0.1', 'nan', 'nan'],
'03K W':['4.2', '5.2', '2.5', '3.0'],
'04K W':['1.5', '2.6', '8.2', '4.2']})
df2 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':['1', '1', '1', '1'],
'02K W':['1', '0', '0', '0'],
'03K W':['0', '0', '0', '1'],
'04K W':['0', '1', '1', '0']})
To create the new data.frame 3, I was thinking about a for loop over the columns and replacing the values in df1 based on df2 (values=0). So far, I wasn't able to figure out a correct syntax.
df3 = for col in df1.columns:
if df2[col].value[1:] == 0:
df3[col].value[1:] = 0
else:
df3[col].value[1:] = df1[col].value
I would expect a data.frame which looks like that:
df3 = pd.DataFrame({
'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'01K W':['1.2', '2.3', '0.3', '0.5'],
'02K W':['3.5', '0', '0', '0'],
'03K W':['0', '0', '0', '3.0'],
'04K W':['0', '2.6', '8.2', '0']})
Thank you very much for your help :)
CodePudding user response:
You can index df1 by df2. Since df2 is of string dtype, first convert to boolean.
df2_bool = df2.set_index('Date').astype(int).astype(bool)
df1.set_index('Date')[df2_bool].fillna(0).reset_index()
You get
Date 01K W 02K W 03K W 04K W
0 2021-01-01 1.2 3.5 0 0
1 2021-01-02 2.3 0 0 2.6
2 2021-01-03 0.3 0 0 8.2
3 2021-01-04 0.5 0 3.0 0
Ofcourse you can make it a one-line (and avoid saving boolean df2)
df1.set_index('Date')[df2.set_index('Date').astype(int).astype(bool)].fillna(0).reset_index()
CodePudding user response:
You can do something like this:
bm = df2.sort_values('Date').set_index('Date').astype(int).astype(bool).values
df1.sort_values('Date').set_index('Date').where(bm, 0).reset_index()
Basically you create a binary matrix:
df2.sort_values('Date').set_index('Date').astype(int).astype(bool).values
Then you use the pandas where clause.
NOTE: This will only work if all dates are in df1
and df2
.
CodePudding user response:
This code below will work.
df1 = df1.set_index('Date').astype(float)
df2 = df2.set_index('Date').astype(float)
df1.fillna(0,inplace=True)
df3 = df1 * df2
You can use fillna() if df2 also has null values.
The operation is a simple matrix multiplication.
you can also do this.
df1.mul(df2)