This is my dataset:
Name | Test1 | Test3 | Test2 | Quiz |
---|---|---|---|---|
Boo | 0.9 | 0 | 0 | 1.0 |
Buzz | 0.8 | 0.7 | 0 | 0 |
Bree | 0 | 0 | 1.0 | 0 |
This is the result I want:
Name | Test1 | Test3 | Test2 | Quiz |
---|---|---|---|---|
Boo | 1 | 0 | 0 | 1 |
Buzz | 1 | 1 | 0 | 0 |
Bree | 0 | 0 | 1 | 0 |
I tried df.astype with int64, but this changed every value below 1 to 0. I also tried:
df1 = df.apply(pd.to_numeric, errors='coerce')
but this caused my first column to become NaN values. I also tried:
df.where(df <= 0.4, 1, inplace=True)
but I got an error saying the comparison isn't possible between str and float. I had already called set_index() on the Name column, so I didn't expect this error. I can't figure this out and would appreciate any help.
CodePudding user response:
df.set_index('Name').astype('float').gt(0.4).astype('int').reset_index()
Output:
Name Test1 Test3 Test2 Quiz
0 Boo 1 0 0 1
1 Buzz 1 1 0 0
2 Bree 0 0 1 0
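For anyone who wants to run this end to end, here is a minimal sketch of the same chain on data rebuilt from the question. Two things are assumptions on my part: the scores are stored as strings (which would explain the str vs. float error), and 0.4 is the cut-off, taken from the asker's own df.where attempt. The name result is just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question. Storing the scores as strings is an
# assumption, made to mimic the str/float comparison error described above.
df = pd.DataFrame({
    "Name": ["Boo", "Buzz", "Bree"],
    "Test1": ["0.9", "0.8", "0"],
    "Test3": ["0", "0.7", "0"],
    "Test2": ["0", "0", "1.0"],
    "Quiz": ["1.0", "0", "0"],
})

result = (
    df.set_index("Name")   # keep the names out of the numeric conversion
      .astype("float")     # strings -> floats
      .gt(0.4)             # boolean mask: True where the score exceeds 0.4
      .astype("int")       # True/False -> 1/0
      .reset_index()       # bring Name back as a regular column
)
print(result)
```

The comparison has to come before the cast to int: casting the raw floats directly truncates 0.9 and 0.8 to 0, which is what the asker saw with astype.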
CodePudding user response:
It depends on the threshold. If you need 1 for values greater than 0.4, compare to get a boolean mask and then convert it to integers, which maps True/False to 1/0:
# if necessary, move Name into the index first
# df = df.set_index('Name')
df1 = df.apply(pd.to_numeric, errors='coerce').gt(0.4).astype(int)
print(df1)
Test1 Test3 Test2 Quiz
Name
Boo 1 0 0 1
Buzz 1 1 0 0
Bree 0 0 1 0
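If you would rather keep Name as an ordinary column instead of the index, a possible variant is to apply the same conversion to the score columns only. This is just a sketch: the sample frame is rebuilt from the question, score_cols is a name I made up, and 0.4 is again the assumed threshold.

```python
import pandas as pd

# Sample data rebuilt from the question, with Name as a regular column.
df = pd.DataFrame({
    "Name": ["Boo", "Buzz", "Bree"],
    "Test1": [0.9, 0.8, 0],
    "Test3": [0, 0.7, 0],
    "Test2": [0, 0, 1.0],
    "Quiz": [1.0, 0, 0],
})

# Every column except Name holds a score (hypothetical helper name).
score_cols = list(df.columns.drop("Name"))

# Convert only the score columns; Name is never passed to to_numeric,
# so it cannot be coerced to NaN.
df[score_cols] = (
    df[score_cols]
    .apply(pd.to_numeric, errors="coerce")
    .gt(0.4)
    .astype(int)
)
print(df)
```

Note that with errors='coerce' any unparseable score becomes NaN, and NaN compared with gt(0.4) is False, so it ends up as 0.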