Making a dataframe of bool values

Time:07-08

I have a dataframe:

import pandas as pd
df = pd.DataFrame({'a':[12,34,98,26],'b':[12,87,98,12],'c':[11,23,43,1]})


    a   b   c
0   12  12  11
1   34  87  23
2   98  98  43
3   26  12  1

I want to make a max_df that contains bool values: if an entry of df is the maximum of its row, max_df should hold True in its place, otherwise False. My max_df should look like this:

      a       b       c
0   True    True    False
1   False   True    False
2   True    True    False
3   True    False   False
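The expected output has ties in rows 0 and 2 (two columns share the row maximum), which is why a full boolean mask is needed. For contrast, here is a short sketch with df.idxmax(axis=1), which reports only the first column holding each row's maximum and therefore loses the ties:

```python
import pandas as pd

df = pd.DataFrame({'a': [12, 34, 98, 26], 'b': [12, 87, 98, 12], 'c': [11, 23, 43, 1]})

# idxmax returns a single column label per row: the first one holding the maximum
first_max = df.idxmax(axis=1)
print(first_max.tolist())  # ['a', 'b', 'a', 'a']
```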

I wrote this code for it:

max_df = df.eq(df.max(axis=1), axis=0)

But it raises this ValueError:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Is there any way to do that?
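The message itself is NumPy's standard complaint when a multi-element array is used where a single True/False is expected, for example in an if statement. A minimal reproduction follows. It illustrates the message only; it does not pinpoint which code path in the older pandas version triggers it.

```python
import numpy as np

mask = np.array([True, False, True])
message = ""
try:
    # An if statement needs one truth value; a 3-element array does not have one
    if mask:
        pass
except ValueError as exc:
    message = str(exc)

print(message)
# any()/all() explicitly reduce the array to a single boolean
print(mask.any(), mask.all())  # True False
```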

CodePudding user response:

Your solution is probably the best, and it does work:

df.eq(df.max(axis=1),axis=0)
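On a current pandas version, running it on the sample data gives the expected mask. In this minimal sketch, axis=0 tells eq to align the row maxima with the DataFrame's index, so each row is compared with its own maximum:

```python
import pandas as pd

df = pd.DataFrame({'a': [12, 34, 98, 26], 'b': [12, 87, 98, 12], 'c': [11, 23, 43, 1]})

# One maximum per row, indexed like df
row_max = df.max(axis=1)

# axis=0 matches row_max against df's row index instead of its column labels
max_df = df.eq(row_max, axis=0)
print(max_df)
```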

But since you said you are using a very old version of pandas in which it doesn't work, you can also try:

max_series = df.max(axis=1)
max_df = pd.DataFrame()
for col in df:
    max_df[col] = df[col]==max_series

This is a worse solution, but I guess it will work even in versions older than yours.
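Another option that sidesteps pandas' comparison and alignment logic entirely is to do the comparison in NumPy and wrap the result back into a DataFrame. This is a sketch assuming only df.values and basic NumPy broadcasting are available in your setup; reshape(-1, 1) turns the row maxima into a column so that every entry is compared with the maximum of its own row:

```python
import pandas as pd

df = pd.DataFrame({'a': [12, 34, 98, 26], 'b': [12, 87, 98, 12], 'c': [11, 23, 43, 1]})

values = df.values                                # 2-D NumPy array, shape (4, 3)
row_max = values.max(axis=1).reshape(-1, 1)       # shape (4, 1): one maximum per row
# Broadcasting (4, 3) == (4, 1) compares each entry with its row's maximum
max_df = pd.DataFrame(values == row_max, index=df.index, columns=df.columns)
print(max_df)
```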

If "df[col] == max_series" does not work for you, then:

max_series = df.max(axis=1)
max_df = pd.DataFrame()
for col in df:
    max_df[col] = [df[col][i] == max_series[i] for i in range(len(df[col]))]
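The list comprehension above looks values up by label (df[col][i]), so it relies on df having the default 0..n-1 index. Below is a sketch that pairs values by position with zip instead. The string index 'w'..'z' is a made-up example, included only to show that this version does not depend on integer labels:

```python
import pandas as pd

# Hypothetical non-default index, for illustration only
df = pd.DataFrame({'a': [12, 34, 98, 26], 'b': [12, 87, 98, 12], 'c': [11, 23, 43, 1]},
                  index=['w', 'x', 'y', 'z'])

max_series = df.max(axis=1)
max_df = pd.DataFrame(index=df.index)
for col in df:
    # zip walks both sequences in order, so no label lookups are needed
    max_df[col] = [val == m for val, m in zip(df[col], max_series)]
print(max_df)
```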

At this point, though, it is better to figure out how to upgrade your software. Most likely you won't need special permissions if you install it for your own user only. If you are on an Ubuntu machine, it is easy to set up a new environment by installing Miniconda, for instance, or another distribution/package manager.

CodePudding user response:

I think your solution is the best. However, since you cannot upgrade pandas, you could also do something like this (a one-liner):

df_max = df.apply(lambda x: x==df.max(axis=1))
df_max

---------------------------
    a       b       c
0   True    True    False
1   False   True    False
2   True    True    False
3   True    False   False
---------------------------
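A small variation, offered as a sketch rather than a fix verified on old versions: compute the row maxima once outside the lambda, so they are not recomputed for every column that apply visits:

```python
import pandas as pd

df = pd.DataFrame({'a': [12, 34, 98, 26], 'b': [12, 87, 98, 12], 'c': [11, 23, 43, 1]})

# Computed once and reused for every column
row_max = df.max(axis=1)
# apply passes each column as a Series; == compares it element-wise with row_max
df_max = df.apply(lambda col: col == row_max)
print(df_max)
```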

Note: if this approach also throws an error, you could try this instead (I would then suspect df.max(axis=1) to be the cause of your error):

df_max = df.apply(lambda col: col == df.T.apply(lambda row: max(row)))

EDIT

You can try the following code instead:

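import pandas as pd

df = pd.DataFrame({'a':[12,34,98,26],'b':[12,87,98,12],'c':[11,23,43,1]})

df["temp_max"] = df.T.apply(lambda x: max(x))
max_df = pd.DataFrame(False, index=df.index, columns=df.columns[:-1])
for index, row in df.iterrows():
    # compare every value except the helper column with the row's maximum
    max_df.loc[index] = [val == row.iloc[-1] for val in row.iloc[:-1]]
df = df.drop(columns=["temp_max"])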

Here you first add each row's maximum as a new column, then iterate over the rows and compare each value with that row's maximum.
