Why is the condition for my DataFrame not working?

Time:02-19

Here is the code:

    import pandas as pd
    import numpy as np
    df1=pd.DataFrame({'0':[1,0,11,0],'1':[0,11,4,0]})
    print(df1.head(5))
    df2 = df1.copy()
    columns=list(df2.columns)
    print(columns)
    
    for i in columns:
        idx1 = np.where((df2[i]>0) & (df2[i] < 10))
        df2.loc[idx1] = 1
        idx3 = np.where(df2[i] == 0)
        df2.loc[idx3] = 0       
        idx2 = np.where(df2[i] > 10)
        df2.loc[idx2] = 0
    
    
    print(df2.head(5))

Output:

        0   1
    0   1   0
    1   0  11
    2  11   4
    3   0   0
    ['0', '1']
       0  1
    0  1  1
    1  0  0
    2  0  0
    3  0  0

The part that concerns me is idx1 = np.where((df2[i] > 0) & (df2[i] < 10)) followed by df2.loc[idx1] = 1. Why isn't this logic working? According to it, my output should be:

Expected:

       0  1
    0  1  1
    1  0  0
    2  0  1
    3  0  0

Answer:

This can be done much more simply. You can operate directly on the DataFrame as a whole; there is no need to loop through the columns individually.

Also, you don't need numpy.where to grab indices; the boolean DataFrame produced by the comparison can be used directly as a selection mask.

    sel = (df2 > 0) & (df2 < 10)  # boolean DataFrame: True where 0 < value < 10
    df2[sel] = 1
    df2[df2 == 0] = 0             # no-op: zeros are already 0
    df2[df2 > 10] = 0

(The first line only exists to keep the second line readable.)

Given your conditions, however, the result is:

       0  1
    0  1  0
    1  0  0
    2  0  1
    3  0  0

That is because your conditions only set numbers strictly between 0 and 10 to 1. A number like 11 is set to 0, and 0 stays 0. Your expected output, however, shows a 1 in row 0 of column 1, where the input is 0.
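
As for why the original loop fails: np.where on a boolean Series returns a tuple of row positions, and df2.loc[idx1] = 1 passes only a row indexer, so every column in those rows is overwritten, not just column i. Later iterations then test values that were already overwritten (row 2 is zeroed because column '0' held 11, which also wipes the 4 in column '1'). The sketch below is a minimal illustration reusing the question's df1; the names demo and the mask-based loop are illustrative, not the asker's code. It reproduces the row-wide assignment and then keeps the per-column loop but passes the column label to .loc:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'0': [1, 0, 11, 0], '1': [0, 11, 4, 0]})

# np.where on a boolean Series returns a tuple of position arrays;
# idx1[0] holds the matching row positions for column '0'.
idx1 = np.where((df1['0'] > 0) & (df1['0'] < 10))

# Passing only row labels to .loc selects whole rows (the same row
# selection the question's df2.loc[idx1] makes), so both columns of
# row 0 become 1. Positions equal labels here only because df1 uses
# the default RangeIndex.
demo = df1.copy()
demo.loc[idx1[0]] = 1
print(demo)

# Fix (sketch): keep the loop, but give .loc a boolean mask *and* the
# column label, so each assignment only changes cells in column i.
df2 = df1.copy()
for i in df2.columns:
    df2.loc[(df2[i] > 0) & (df2[i] < 10), i] = 1
    df2.loc[df2[i] > 10, i] = 0  # zeros are already 0, so no == 0 step
print(df2)
```

This produces the same table as the whole-DataFrame version above; the vectorized form is simply shorter.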

Answer:

It seems your expected output does not match your logic. It looks like anything strictly between 0 and 10 should become 1, and everything else 0.

If so, try this:

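    # np.where returns a plain NumPy array, so pass df1's labels back in
    df2 = pd.DataFrame(np.where((0 < df1) & (df1 < 10), 1, 0), index=df1.index, columns=df1.columns)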
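
The two approaches agree on the question's data but differ for values that no condition covers. In-place masking only changes cells that match a condition, so 10 or a negative number is left as-is, while np.where assigns 0 to everything not strictly between 0 and 10. A small sketch, assuming a hypothetical column 'a' containing such values (not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical input with values the question's example lacks:
# exactly 10 and a negative number.
df = pd.DataFrame({'a': [5, 10, -3, 12]})

# In-place masking: only cells matching a condition change.
masked = df.copy()
masked[(masked > 0) & (masked < 10)] = 1
masked[masked > 10] = 0

# np.where: every cell not strictly between 0 and 10 becomes 0.
where_result = pd.DataFrame(
    np.where((df > 0) & (df < 10), 1, 0),
    index=df.index,
    columns=df.columns,
)

# Casting the boolean mask to int gives the same 0/1 values and keeps labels.
astype_result = ((df > 0) & (df < 10)).astype(int)

print(masked['a'].tolist())        # [1, 10, -3, 0]
print(where_result['a'].tolist())  # [1, 0, 0, 0]
print(astype_result['a'].tolist()) # [1, 0, 0, 0]
```

Pick whichever matches the rule you actually want for values outside the stated ranges.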