Here is the code:
import pandas as pd
import numpy as np
df1=pd.DataFrame({'0':[1,0,11,0],'1':[0,11,4,0]})
print(df1.head(5))
df2 = df1.copy()
columns=list(df2.columns)
print(columns)
for i in columns:
    idx1 = np.where((df2[i]>0) & (df2[i] < 10))
    df2.loc[idx1] = 1
    idx3 = np.where(df2[i] == 0)
    df2.loc[idx3] = 0
    idx2 = np.where(df2[i] > 10)
    df2.loc[idx2] = 0
print(df2.head(5))
Output:
    0   1
0   1   0
1   0  11
2  11   4
3   0   0
['0', '1']
   0  1
0  1  1
1  0  0
2  0  0
3  0  0
The part that concerns me is idx1 = np.where((df2[i]>0) & (df2[i] < 10)) followed by df2.loc[idx1] = 1. Why isn't this logic working? According to this logic, I would expect the following output.
Expected:
   0  1
0  1  1
1  0  0
2  0  1
3  0  0
CodePudding user response:
This can be done much more simply. You can operate directly on the dataframe as a whole; there is no need to loop over the columns individually.
Also, you don't need numpy.where to grab indices; you can use the boolean dataframe from the selection directly as a mask.
sel = (df2 > 0) & (df2 < 10)
df2[sel] = 1
df2[df2 == 0] = 0
df2[df2 > 10] = 0
(The first line only exists to keep the second line easy to read.)
Given your conditions, however, the result is
   0  1
0  1  0
1  0  0
2  0  1
3  0  0
That is because only numbers between 0 and 10 (exclusive) are set to 1. A number like 11 is set to 0, and 0 stays 0, whereas your expected output shows a 1 in row 0 of column '1', where the input value is 0.
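As for why the loop in the question produces its output: as far as I can tell, np.where on a single column returns a tuple of row positions, and df2.loc[idx1] = 1 with only a row indexer assigns to every column of those rows. That matches row 0 turning into 1 1 in the question's output. If you want to keep the loop, here is a minimal sketch, assuming the same df1 as in the question (col and values are names I made up), that passes a boolean mask together with the column label to .loc:

```python
import pandas as pd

df1 = pd.DataFrame({'0': [1, 0, 11, 0], '1': [0, 11, 4, 0]})
df2 = df1.copy()

for col in df2.columns:
    # Read the conditions from the untouched original column
    values = df1[col]
    # The second indexer (col) limits each assignment to this one column
    df2.loc[(values > 0) & (values < 10), col] = 1
    df2.loc[values == 0, col] = 0
    df2.loc[values > 10, col] = 0

print(df2)
```

This should give the same result as the whole-dataframe version: a 1 only where the original value is strictly between 0 and 10.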
CodePudding user response:
Your expected output does not seem to align with your logic. It looks like anything between 0 and 10 (exclusive) should become 1 and everything else 0.
If so, try this:
df2 = pd.DataFrame(np.where((0 < df1) & (df1 < 10), 1, 0))
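One thing to watch for: building a new DataFrame from the bare array that np.where returns drops the original column labels ('0' and '1' become the integers 0 and 1). A small sketch, assuming the df1 from the question, that passes the index and columns along; flags and df3 are illustrative names, and the astype variant is just an alternative that skips numpy:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'0': [1, 0, 11, 0], '1': [0, 11, 4, 0]})

# np.where returns a plain 2-D array, so reattach the labels explicitly
flags = np.where((df1 > 0) & (df1 < 10), 1, 0)
df2 = pd.DataFrame(flags, index=df1.index, columns=df1.columns)

# Alternative without numpy: cast the boolean mask to integers
df3 = ((df1 > 0) & (df1 < 10)).astype(int)

print(df2)
```

Note that both variants map everything outside the open interval (0, 10) to 0, including negative numbers and exactly 10, which the loop in the question would have left unchanged.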