Create dummy DataFrame based on conditions

Time:05-13

I am trying to create a dummy DataFrame df_dummy from a DataFrame df, based on several conditions:

  • if value > 0 --> 1
  • if value < 0 --> 0
  • else (0, NaN) --> NaN

df:
            ID1     ID2     ID3
Date            
2022-01-01  -1.0    -0.1    0.0
2022-01-02  0.0     1.2     0.7
2022-01-03  NaN     2.0     1.0
2022-01-04  -0.8    0.0     0.0
2022-01-05  1.1     NaN     -0.5
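
To reproduce the example, the frame above can be rebuilt roughly as follows. This is a sketch: the values are copied from the table, but using a DatetimeIndex named Date is an assumption, since the question does not show how the index was created.

```python
import numpy as np
import pandas as pd

# Values copied from the table above; np.nan marks the missing entries.
df = pd.DataFrame(
    {
        "ID1": [-1.0, 0.0, np.nan, -0.8, 1.1],
        "ID2": [-0.1, 1.2, 2.0, 0.0, np.nan],
        "ID3": [0.0, 0.7, 1.0, 0.0, -0.5],
    },
    # Assumption: the Date index is a daily DatetimeIndex.
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)
print(df)
```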

df_dummy:
            ID1     ID2     ID3
Date            
2022-01-01  0       0       NaN
2022-01-02  NaN     1       1
2022-01-03  NaN     1       1
2022-01-04  0       NaN     NaN
2022-01-05  1       NaN     0

I tried to define a signal function for the dummy like this:

def signal(x):
    if(x>0): 
        return 1
    elif(x<0):
        return 0
    else:
        return np.nan
df_dummy = df[:].apply(lambda x: signal, axis=1)

data_signal = df[:].apply(lambda x: 1 if x>0 -1 if x<0 else np.nan, axis=1)
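
For context on why the row-wise attempts above do not work, here is a small sketch on a two-row toy frame (the toy data and the names row_types, returned and row_wise_failed are made up for illustration). With axis=1, apply passes a whole row to the function as a Series, and `lambda x: signal` returns the function object without ever calling it.

```python
import numpy as np
import pandas as pd

def signal(x):
    if x > 0:
        return 1
    elif x < 0:
        return 0
    else:
        return np.nan

# Toy data for illustration only.
df = pd.DataFrame({"ID1": [-1.0, 0.0], "ID2": [-0.1, 1.2]})

# With axis=1, apply passes each row to the function as a Series, not a scalar.
row_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()
print(row_types)

# `lambda x: signal` returns the function object itself and never calls it.
returned = (lambda x: signal)(df.iloc[0])
print(returned is signal)

# Calling signal on a whole row fails: `row > 0` is a Series, which has no
# single truth value, so the `if` raises a ValueError.
try:
    df.apply(signal, axis=1)
    row_wise_failed = False
except ValueError:
    row_wise_failed = True
print(row_wise_failed)
```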

Is there an intuitive way to apply such conditions when creating df_dummy?

Thanks a lot!

CodePudding user response:

You can use np.select:

import numpy as np

# np.select returns a NumPy array,
# so we copy df to preserve the index and columns
df_dummy = df.copy()
df_dummy[:] = np.select((df > 0, df < 0), (1, 0), np.nan)

Alternatively, wrap the result in a new DataFrame:

df_dummy = pd.DataFrame(np.select((df > 0, df < 0), (1, 0), np.nan),
                        index=df.index, columns=df.columns)

Output:

            ID1  ID2  ID3
Date                     
2022-01-01  0.0  0.0  NaN
2022-01-02  NaN  1.0  1.0
2022-01-03  NaN  1.0  1.0
2022-01-04  0.0  NaN  NaN
2022-01-05  1.0  NaN  0.0
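
As a self-contained sketch of the np.select approach, the snippet below rebuilds the sample data and names each piece. The variable names conditions, choices and values are illustrative, and the DatetimeIndex is an assumption. The conditions are evaluated in order, and any cell that matches none of them (here 0 and NaN, since comparisons with NaN are False) receives the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ID1": [-1.0, 0.0, np.nan, -0.8, 1.1],
        "ID2": [-0.1, 1.2, 2.0, 0.0, np.nan],
        "ID3": [0.0, 0.7, 1.0, 0.0, -0.5],
    },
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)

values = df.to_numpy()
conditions = [values > 0, values < 0]  # 0 and NaN satisfy neither comparison
choices = [1, 0]                       # one choice per condition, same order

# Cells matching no condition receive the default (NaN).
df_dummy = pd.DataFrame(
    np.select(conditions, choices, default=np.nan),
    index=df.index,
    columns=df.columns,
)
print(df_dummy)
```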

CodePudding user response:

Using pandas:

df_dummy = df.gt(0).astype(int).mask(df.isna()|df.eq(0))

With numpy.sign:

df_dummy = np.sign(df).replace(0,np.nan).clip(0)

Output:

            ID1  ID2  ID3
Date                     
2022-01-01  0.0  0.0  NaN
2022-01-02  NaN  1.0  1.0
2022-01-03  NaN  1.0  1.0
2022-01-04  0.0  NaN  NaN
2022-01-05  1.0  NaN  0.0
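
To make the two one-liners easier to follow, here is a sketch that splits the numpy.sign chain into steps and checks that both routes agree on the sample data. The names via_mask, signs and via_sign are illustrative, and the DatetimeIndex is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ID1": [-1.0, 0.0, np.nan, -0.8, 1.1],
        "ID2": [-0.1, 1.2, 2.0, 0.0, np.nan],
        "ID3": [0.0, 0.7, 1.0, 0.0, -0.5],
    },
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)

# pandas only: 1 where positive, 0 elsewhere, then blank out zeros and NaN.
via_mask = df.gt(0).astype(int).mask(df.isna() | df.eq(0))

# numpy.sign maps each value to -1, 0 or 1 and leaves NaN as NaN.
signs = np.sign(df)
# Zeros become NaN, then clipping at 0 turns the remaining -1 into 0.
via_sign = signs.replace(0, np.nan).clip(lower=0)

# Compare with NaN replaced by a sentinel, because NaN != NaN.
same = via_mask.fillna(-1).eq(via_sign.fillna(-1)).all().all()
print(same)
```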

CodePudding user response:

You can use applymap with the signal function you defined (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

df_dummy = df.applymap(signal)
print(df_dummy)

Output:

            ID1  ID2  ID3
Date                     
2022-01-01  0.0  0.0  NaN
2022-01-02  NaN  1.0  1.0
2022-01-03  NaN  1.0  1.0
2022-01-04  0.0  NaN  NaN
2022-01-05  1.0  NaN  0.0
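
A version-tolerant sketch of the element-wise approach is shown below. It uses DataFrame.map where that method exists and falls back to applymap on older pandas. The name elementwise is illustrative, and the DatetimeIndex is an assumption.

```python
import numpy as np
import pandas as pd

def signal(x):
    if x > 0:
        return 1
    elif x < 0:
        return 0
    else:
        # Reached for 0 and for NaN, since comparisons with NaN are False.
        return np.nan

df = pd.DataFrame(
    {
        "ID1": [-1.0, 0.0, np.nan, -0.8, 1.1],
        "ID2": [-0.1, 1.2, 2.0, 0.0, np.nan],
        "ID3": [0.0, 0.7, 1.0, 0.0, -0.5],
    },
    index=pd.date_range("2022-01-01", periods=5, name="Date"),
)

# Prefer DataFrame.map when available; otherwise use the older applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
df_dummy = elementwise(signal)
print(df_dummy)
```

Because signal is called once per cell in Python, this is usually slower on large frames than the vectorized np.select or np.sign variants.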