Replacing negative values in specific columns of a dataframe

Time:05-05

This is driving me crazy! I want to replace all negative values in columns whose names contain the string "_p" with the value multiplied by -0.5. Here is the code, where Tdf is a DataFrame.

    L=list(Tdf.filter(regex='_p').columns)
    Tdf[L]=Tdf[L].astype(float)
    Tdf[Tdf[L]<0]= Tdf[Tdf[L]<0]*-.5  

I get the following error:

"TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value"

I verified that all columns in Tdf[L] are of type float64.

Even more confusing, essentially the same code works when I loop through multiple dataframes:

    csv_names=['Full','Missing','MinusMissing']

    for DF in csv_names:
        L=list(vars()[DF].iloc[:,1:])
        vars()[DF][L]=vars()[DF][L].astype(float)
        vars()[DF][vars()[DF][L]<0]= vars()[DF][vars()[DF][L]<0]*-.5

What am I missing?

CodePudding user response:

Thought this might help. But honestly I cannot seem to find any errors.

Sample data:

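    import numpy as np
    import pandas as pd

    # np.nan is used instead of np.NaN, which NumPy 2.0 removed
    Tdf = pd.DataFrame(
        columns=["a_p", "b_p", "c"],
        data=[[-1, -2, -3],
              [1, -2, 3],
              [1, np.nan, 3]],
    )

    L=list(Tdf.filter(regex='_p').columns)
    Tdf[L]=Tdf[L].astype(float)
    Tdf[Tdf[L]<0]= Tdf[Tdf[L]<0]*-.5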

Output:

    Tdf
       a_p  b_p  c
    0  0.5  1.0 -3
    1  1.0  1.0  3
    2  1.0  NaN  3

What data is going into yours when you get that error? Also, on which line does it occur?
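One quick way to answer the "what data" question is to look at the dtypes of the whole frame, not just Tdf[L]. The sketch below assumes, purely for illustration, that the real Tdf also has a text column (the hypothetical "name" column here, which is not in the original post). As far as I can tell, pandas versions that raise this message reject frame-wide boolean assignment when the frame mixes numeric and non-numeric columns, even if the mask only covers float columns, so this is worth ruling out.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "_p" columns from the sample above plus an
# assumed text column ("name") that the original post does not show.
Tdf = pd.DataFrame({
    "name": ["x", "y", "z"],
    "a_p": [-1, 1, 1],
    "b_p": [-2, -2, np.nan],
    "c": [-3, 3, 3],
})

L = list(Tdf.filter(regex="_p").columns)
Tdf[L] = Tdf[L].astype(float)

# Inspect every column, not only the ones in L.
print(Tdf.dtypes)

# Any column listed here makes the frame "mixed" in the non-numeric sense.
non_numeric = Tdf.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['name']
```

If that list is non-empty for your data but empty for the frames in your loop, that difference may be what you are missing.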

CodePudding user response:

Please clarify your question. If your question is about the error, the line

    Tdf[Tdf[L]<0]= Tdf[Tdf[L]<0]*-.5

likely fails because of null values that are not np.nan, as the error message suggests.

If your question is instead "How do I multiply negative values by -0.5 in columns with "_p" in the column name?":

    for col in Tdf.filter(regex='_p').columns.tolist():
        # Multiply negative entries by -0.5; leave everything else (including NaN) unchanged
        Tdf[col] = Tdf.apply(lambda row: row[col]*-.5 if row[col] < 0 else row[col], axis=1)
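A vectorized alternative to the per-column loop, offered as a sketch rather than the definitive fix: restrict both the read and the write to the "_p" columns and use DataFrame.mask, so the rest of the frame is never touched. The sample data is the same as in the first answer.

```python
import numpy as np
import pandas as pd

Tdf = pd.DataFrame(
    columns=["a_p", "b_p", "c"],
    data=[[-1, -2, -3], [1, -2, 3], [1, np.nan, 3]],
)

L = list(Tdf.filter(regex="_p").columns)
Tdf[L] = Tdf[L].astype(float)

# mask() replaces values where the condition is True. NaN < 0 is False,
# so missing values are left alone. Only the columns in L are assigned.
Tdf[L] = Tdf[L].mask(Tdf[L] < 0, Tdf[L] * -0.5)

print(Tdf)
```

Because the assignment targets Tdf[L] instead of the whole frame, columns outside L (whatever their dtype) play no part in it.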
