Change a dataframe of floats and objects into a binary dataframe whilst retaining string values of c

This is my dataset:

Name	Test1	Test3	Test2	Quiz
Boo	0.9	0	0	1.0
Buzz	0.8	0.7	0	0
Bree	0	0	1.0	0

How I want my result dataset:

Name	Test1	Test3	Test2	Quiz
Boo	1	0	0	1
Buzz	1	1	0	0
Bree	0	0	1	0

I tried df.astype('int64'), but this truncated every value below 1 to 0. I also tried:

df1 = df.apply(pd.to_numeric, errors='coerce')

but this caused my first column to become NaN values. I also tried:

df.where(df <= 0.4, 1, inplace=True)

but I got an error saying the comparison isn't possible between str and float. I had called set_index() on the Name column, so I didn't expect this error. I can't figure this out and would really appreciate some help :((

CodePudding user response：

df.set_index('Name').astype('float').gt(0.4).astype('int').reset_index()

Output:

    Name    Test1   Test3   Test2   Quiz
0   Boo     1       0       0       1
1   Buzz    1       1       0       0
2   Bree    0       0       1       0
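
For reference, the one-liner above can be run end to end as follows. This is a minimal sketch: the DataFrame is rebuilt by hand from the sample table in the question, and the 0.4 cutoff is an assumption taken from the asker's own df.where attempt.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Name": ["Boo", "Buzz", "Bree"],
    "Test1": [0.9, 0.8, 0.0],
    "Test3": [0.0, 0.7, 0.0],
    "Test2": [0.0, 0.0, 1.0],
    "Quiz": [1.0, 0.0, 0.0],
})

result = (
    df.set_index("Name")   # keep the strings out of the numeric comparison
      .astype("float")     # make sure every score column is numeric
      .gt(0.4)             # boolean mask: True where the value exceeds 0.4
      .astype("int")       # True/False -> 1/0
      .reset_index()       # bring Name back as a regular column
)
print(result)
```

Note that set_index returns a new DataFrame, so the chained call leaves the original df unchanged.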

CodePudding user response：

It depends on the threshold. If you need 1 for values greater than 0.4, compare against it to get a boolean mask, then convert the mask to integers so that True/False map to 1/0:

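import pandas as pd

# If 'Name' is still a regular column, move it to the index first
# df = df.set_index('Name')

# Non-numeric cells become NaN, and NaN > 0.4 is False, so they map to 0
df1 = df.apply(pd.to_numeric, errors='coerce').gt(0.4).astype(int)
print(df1)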
      Test1  Test3  Test2  Quiz
Name                           
Boo       1      0      0     1
Buzz      1      1      0     0
Bree      0      0      1     0
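
If moving Name in and out of the index is not wanted, the same threshold can be applied to the score columns only. The sketch below is one possible variant, not part of either answer: threshold and score_cols are illustrative names, and the data is again rebuilt from the question's table. It also sidesteps the str-versus-float error, which likely arose because set_index returns a new DataFrame, so Name stays a regular column unless that result is assigned back.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Name": ["Boo", "Buzz", "Bree"],
    "Test1": [0.9, 0.8, 0.0],
    "Test3": [0.0, 0.7, 0.0],
    "Test2": [0.0, 0.0, 1.0],
    "Quiz": [1.0, 0.0, 0.0],
})

threshold = 0.4                        # assumed cutoff, taken from the question
score_cols = df.columns.drop("Name")   # every column except the string one

out = df.copy()
# Compare only the score columns, then map True/False to 1/0
out[score_cols] = (
    out[score_cols]
    .apply(pd.to_numeric, errors="coerce")
    .gt(threshold)
    .astype(int)
)
print(out)
```

Because the comparison never touches Name, the strings are carried through unchanged.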