How to replace negative values in dataframe with specified values?

Time:09-13

I have a DataFrame and need to replace its negative values with a specified value. How can I make my code simpler and get rid of the warning? Before replacement:

    datetime        a0        a1        a2
0 2022-01-01  0.097627  0.430379  0.205527
1 2022-01-02  0.089766 -0.152690  0.291788
2 2022-01-03 -0.124826  0.783546  0.927326
3 2022-01-04 -0.233117  0.583450  0.057790
4 2022-01-05  0.136089  0.851193 -0.857928
5 2022-01-06 -0.825741 -0.959563  0.665240
6 2022-01-07  0.556314  0.740024  0.957237
7 2022-01-08  0.598317 -0.077041  0.561058
8 2022-01-09 -0.763451  0.279842 -0.713293
9 2022-01-10  0.889338  0.043697 -0.170676

After replacement, followed by the warnings:

    datetime            a0            a1            a2
0 2022-01-01  9.762701e-02  4.303787e-01  2.055268e-01
1 2022-01-02  8.976637e-02  1.000000e-13  2.917882e-01
2 2022-01-03  1.000000e-13  7.835460e-01  9.273255e-01
3 2022-01-04  1.000000e-13  5.834501e-01  5.778984e-02
4 2022-01-05  1.360891e-01  8.511933e-01  1.000000e-13
5 2022-01-06  1.000000e-13  1.000000e-13  6.652397e-01
6 2022-01-07  5.563135e-01  7.400243e-01  9.572367e-01
7 2022-01-08  5.983171e-01  1.000000e-13  5.610584e-01
8 2022-01-09  1.000000e-13  2.798420e-01  1.000000e-13
9 2022-01-10  8.893378e-01  4.369664e-02  1.000000e-13
<ipython-input-5-887189ce29a9>:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2[df2 < 0] = float(1e-13)
<ipython-input-5-887189ce29a9>:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2[df2 < 0] = float(1e-13)

My code is as follows; the generate_data function only generates the demo data.

import numpy as np
import pandas as pd

np.random.seed(0)


# This function generates demo data.
def generate_data():
    datetime1 = pd.date_range(start='20220101', end='20220110')
    df = pd.DataFrame(data=datetime1, columns=['datetime'])
    col = [f'a{x}' for x in range(3)]
    df[col] = np.random.uniform(-1, 1, (10, 3))
    return df


def main():
    df = generate_data()
    print(df)
    col = list(df.columns)[1:]
    df2 = df[col]
    df2[df2 < 0] = float(1e-13)
    df[col] = df2
    print(df)
    return


if __name__ == '__main__':
    main()

CodePudding user response:

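You get the warning because df2 is a slice of df that is then modified in place, so pandas cannot tell whether the assignment is meant to reach df. You can use df2.mask(...), which returns a new DataFrame instead of assigning into the slice, to avoid the warning.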

df2 = df2.mask(df2 < 0, float(1e-13))
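To show how the mask call could fit into the question's code, here is a minimal sketch. The helper name replace_negatives and its value parameter are made up for illustration; the demo data is generated the same way as in the question.

```python
import numpy as np
import pandas as pd


def replace_negatives(df, cols, value=1e-13):
    """Return a copy of df with negative entries in cols replaced by value.

    Hypothetical helper, not part of pandas.
    """
    out = df.copy()
    # mask() builds a new frame, so nothing is assigned into a slice of df
    out[cols] = out[cols].mask(out[cols] < 0, value)
    return out


np.random.seed(0)
df = pd.DataFrame({'datetime': pd.date_range(start='20220101', end='20220110')})
cols = [f'a{x}' for x in range(3)]
df[cols] = np.random.uniform(-1, 1, (10, 3))

result = replace_negatives(df, cols)
print(result)
```

Because the helper works on a copy, the original df keeps its negative values, which may or may not be what you want.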

CodePudding user response:

You could try np.where:

pd.concat([df.datetime, df.iloc[:,1:4].apply(lambda x:np.where(x<0,float(1e-13),x),axis=0)],axis=1)

Btw thanks for the beautiful reproducible example
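The np.where idea can probably be written without apply and concat by running it once over the whole numeric block. A small sketch, assuming a made-up frame with two value columns:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({
    'datetime': pd.date_range(start='20220101', periods=4),
    'a0': [0.5, -0.2, 0.0, -1.0],
    'a1': [-0.3, 0.7, 0.1, 0.9],
})

cols = ['a0', 'a1']
values = df[cols].to_numpy()
# np.where picks 1e-13 where the condition holds, the original value elsewhere
df[cols] = np.where(values < 0, 1e-13, values)
print(df)
```

Note that zero is not negative, so it is left unchanged.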

CodePudding user response:

The loc indexer from pandas may help. Once your df is generated:

# get columns to check for the condition
cols = list(df.columns)[1:]

# iterate through columns and replace
for col in cols:
    df.loc[df[col] < 0, col] = float(1e-13)

This should do the trick, hope it helps!
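One way to check that the loc loop really runs without any warning is to turn warnings into errors while it executes. This is only a sketch on a small invented frame, not on the question's data:

```python
import warnings

import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({
    'datetime': pd.date_range(start='20220101', periods=3),
    'a0': [0.1, -0.2, 0.3],
    'a1': [-0.4, 0.5, -0.6],
})
cols = list(df.columns)[1:]

with warnings.catch_warnings():
    # Any warning raised inside this block becomes an exception.
    warnings.simplefilter('error')
    for col in cols:
        # Assigning through loc on df itself, not on a slice of it
        df.loc[df[col] < 0, col] = 1e-13

print(df)
```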

CodePudding user response:

Maybe this:

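df1 = pd.DataFrame()  # temporary frame that keeps the datetime column
df1['datetime'] = df['datetime']
df = df.mask(df.loc[:, df.columns != 'datetime'] < 0, float(1e-13))
df['datetime'] = df1['datetime']  # restore the datetime column after mask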
print(df)

All the code:

import numpy as np
import pandas as pd

np.random.seed(0)


# This function generates demo data.
def generate_data():
    datetime1 = pd.date_range(start='20220101', end='20220110')
    df = pd.DataFrame(data=datetime1, columns=['datetime'])
    col = [f'a{x}' for x in range(3)]
    df[col] = np.random.uniform(-1, 1, (10, 3))
    return df

 

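def main():
    df = generate_data()
    df1 = pd.DataFrame()  # temporary frame that keeps the datetime column
    df1['datetime'] = df['datetime']
    df = df.mask(df.loc[:, df.columns != 'datetime'] < 0, float(1e-13))
    df['datetime'] = df1['datetime']  # restore the datetime column after mask
    print(df)
    return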


if __name__ == '__main__':
    main()
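If the goal is only to keep the datetime column out of the replacement, it may be simpler to pick the numeric columns with select_dtypes rather than saving and restoring datetime. A sketch on a small invented frame; the name num_cols is just for illustration:

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({
    'datetime': pd.date_range(start='20220101', periods=3),
    'a0': [0.1, -0.2, 0.3],
    'a1': [-0.4, 0.5, -0.6],
})

# Column labels whose dtype is numeric; the datetime column is not among them
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].mask(df[num_cols] < 0, 1e-13)
print(df)
```

This assumes every numeric column should be treated the same way; if some numeric columns must keep their negative values, an explicit column list is safer.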