I have a DataFrame and need to replace its negative values with a specified value. How can I make my code simpler and get rid of the warning?
Before replacement:
datetime a0 a1 a2
0 2022-01-01 0.097627 0.430379 0.205527
1 2022-01-02 0.089766 -0.152690 0.291788
2 2022-01-03 -0.124826 0.783546 0.927326
3 2022-01-04 -0.233117 0.583450 0.057790
4 2022-01-05 0.136089 0.851193 -0.857928
5 2022-01-06 -0.825741 -0.959563 0.665240
6 2022-01-07 0.556314 0.740024 0.957237
7 2022-01-08 0.598317 -0.077041 0.561058
8 2022-01-09 -0.763451 0.279842 -0.713293
9 2022-01-10 0.889338 0.043697 -0.170676
After replacement:
datetime a0 a1 a2
0 2022-01-01 9.762701e-02 4.303787e-01 2.055268e-01
1 2022-01-02 8.976637e-02 1.000000e-13 2.917882e-01
2 2022-01-03 1.000000e-13 7.835460e-01 9.273255e-01
3 2022-01-04 1.000000e-13 5.834501e-01 5.778984e-02
4 2022-01-05 1.360891e-01 8.511933e-01 1.000000e-13
5 2022-01-06 1.000000e-13 1.000000e-13 6.652397e-01
6 2022-01-07 5.563135e-01 7.400243e-01 9.572367e-01
7 2022-01-08 5.983171e-01 1.000000e-13 5.610584e-01
8 2022-01-09 1.000000e-13 2.798420e-01 1.000000e-13
9 2022-01-10 8.893378e-01 4.369664e-02 1.000000e-13
The code also emits these warnings:
<ipython-input-5-887189ce29a9>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df2[df2 < 0] = float(1e-13)
<ipython-input-5-887189ce29a9>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df2[df2 < 0] = float(1e-13)
My code is as follows; the generate_data function generates the demo data.
import numpy as np
import pandas as pd

np.random.seed(0)


# This function generates demo data.
def generate_data():
    datetime1 = pd.date_range(start='20220101', end='20220110')
    df = pd.DataFrame(data=datetime1, columns=['datetime'])
    col = [f'a{x}' for x in range(3)]
    df[col] = np.random.uniform(-1, 1, (10, 3))
    return df


def main():
    df = generate_data()
    print(df)
    col = list(df.columns)[1:]
    df2 = df[col]
    df2[df2 < 0] = float(1e-13)
    df[col] = df2
    print(df)
    return


if __name__ == '__main__':
    main()
CodePudding user response:
You get the warning because df2 is a selection taken from df and is then modified in place. df2.mask(...) returns a new object instead of writing into df2, so you can use it to avoid the warnings.
df2 = df2.mask(df2 < 0, float(1e-13))
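Here is a minimal sketch of how the mask call might fit into the original flow, writing the result straight back into df. The demo values are made up, and picking the columns with select_dtypes is an assumption; the question uses list(df.columns)[1:] instead.

```python
import pandas as pd

# Small demo frame shaped like the one in the question (values are made up).
df = pd.DataFrame({
    "datetime": pd.date_range("2022-01-01", periods=4),
    "a0": [0.5, -0.1, 0.3, -0.8],
    "a1": [-0.2, 0.4, -0.9, 0.7],
})

# Assumption: every numeric column should be cleaned.
cols = df.select_dtypes(include="number").columns

# mask() returns a new object, so nothing is written into a selection of df.
df[cols] = df[cols].mask(df[cols] < 0, 1e-13)

print(df)
```

This drops the intermediate df2 entirely, which is likely the simplification the question asks for.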
CodePudding user response:
You could try np.where:
pd.concat([df.datetime, df.iloc[:, 1:4].apply(lambda x: np.where(x < 0, float(1e-13), x), axis=0)], axis=1)
Btw, thanks for the beautiful reproducible example.
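As a possible variation on the same idea, np.where can be applied to the whole numeric block at once and assigned back, without apply or concat. This is only a sketch with made-up demo values, and the column list cols is an assumption standing in for the columns of the question.

```python
import numpy as np
import pandas as pd

# Small demo frame shaped like the one in the question (values are made up).
df = pd.DataFrame({
    "datetime": pd.date_range("2022-01-01", periods=4),
    "a0": [0.5, -0.1, 0.3, -0.8],
    "a1": [-0.2, 0.4, -0.9, 0.7],
})

# Assumption: these are the columns to clean.
cols = ["a0", "a1"]

# np.where returns a plain NumPy array with the same shape as df[cols];
# assigning it back replaces those columns in place of the originals.
df[cols] = np.where(df[cols] < 0, 1e-13, df[cols])

print(df)
```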
CodePudding user response:
The .loc indexer from pandas may help. Once your df is generated:
# get columns to check for the condition
cols = list(df.columns)[1:]
# iterate through columns and replace
for col in cols:
    df.loc[df[col] < 0, col] = float(1e-13)
This should do the trick, hope it helps!
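If the original frame should stay untouched, the loop could be wrapped in a small helper that works on a copy. This is just a sketch: replace_negatives and its parameters are hypothetical names, not part of pandas, and the demo values are made up.

```python
import pandas as pd


def replace_negatives(df, cols, value=1e-13):
    """Hypothetical helper: return a copy of df with negatives in cols replaced."""
    out = df.copy()  # work on an explicit copy so the caller's frame is not modified
    for col in cols:
        # .loc with a boolean row mask and a column label assigns in one step
        out.loc[out[col] < 0, col] = value
    return out


# Small demo frame shaped like the one in the question (values are made up).
demo = pd.DataFrame({
    "datetime": pd.date_range("2022-01-01", periods=3),
    "a0": [0.5, -0.1, 0.3],
    "a1": [-0.2, 0.4, -0.9],
})

cleaned = replace_negatives(demo, ["a0", "a1"])
print(cleaned)
```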
CodePudding user response:
Maybe this:
# set the datetime column aside so it can be restored after mask()
df1 = df[['datetime']].copy()
df = df.mask(df.loc[:, df.columns != 'datetime'] < 0, float(1e-13))
df['datetime'] = df1['datetime']
print(df)
The full code:
import numpy as np
import pandas as pd

np.random.seed(0)


# This function generates demo data.
def generate_data():
    datetime1 = pd.date_range(start='20220101', end='20220110')
    df = pd.DataFrame(data=datetime1, columns=['datetime'])
    col = [f'a{x}' for x in range(3)]
    df[col] = np.random.uniform(-1, 1, (10, 3))
    return df


def main():
    df = generate_data()
    # set the datetime column aside so it can be restored after mask()
    df1 = df[['datetime']].copy()
    df = df.mask(df.loc[:, df.columns != 'datetime'] < 0, float(1e-13))
    df['datetime'] = df1['datetime']
    print(df)
    return


if __name__ == '__main__':
    main()