TypeError: unsupported operand type(s) for &: 'str' and 'bool' when filtering a DataFrame

I am trying to filter my dataframe so that a new output column displays the "medium" rating. My dataframe holds str values, so I convert them to numbers based on a ranking system I have, and then I filter out the maximum and minimum rating per row.

I am running into this error:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

I've created a data frame that pulls str values from my csv file:

df = pd.read_csv('csv path', usecols=['rating1','rating2','rating3'])

And my dataframe looks like this:

             rating1    rating2     rating3
0             D           D            C
1             C           B            A
2             B           B            B

I need it to look like this:

             rating1    rating2     rating3    mediumrating
0             D           D            C           1
1             C           B            A           3
2             B           B            B           3

I have a mapping dictionary that converts the values to numbers.

ranking = {
         'D': 1, 'C':2, 'B': 3, 'A' : 4
         }

Below you can find the code I use to determine the "medium rating". Basically, if all the ratings are the same, you can pull the minimum rating. If two of the ratings are the same, pull in the lowest rating. If the three ratings differ, filter out the max rating and the min rating.

if df == df.loc[(['rating1'] == df['rating2'] & df['rating1'] == df['rating3'])]:
    df['mediumrating'] = df.replace(ranking).min(axis=1)
elif df == df.loc[(['rating1'] == df['rating2'] | df['rating1'] == df['rating3'] | df['rating2'] == df['rating3'])]:
    df['mediumrating'] = df.replace(ranking).min(axis=1)
else:
    df['mediumrating'] == df.loc[(df.replace(ranking) > df.replace(ranking).min(axis=1) & df.replace(ranking)

Any help on my formatting or process would be welcomed!!

CodePudding user response：

Use np.where:

For the condition, use df.nunique applied to axis=1 and check if the result equals either 1 (all values are the same) or 2 (two different values) with Series.isin.
If True, we need df.min along axis=1.
If False (all unique values), we need df.median along axis=1.
Finally, use astype to turn resulting floats into integers.

import pandas as pd
import numpy as np

data = {'rating1': {0: 'D', 1: 'C', 2: 'B'}, 
        'rating2': {0: 'D', 1: 'B', 2: 'B'}, 
        'rating3': {0: 'C', 1: 'A', 2: 'B'}}

df = pd.DataFrame(data)

ranking = {'D': 1, 'C':2, 'B': 3, 'A' : 4}

df['mediumrating']  = np.where(df.replace(ranking).nunique(axis=1).isin([1,2]), 
                               df.replace(ranking).min(axis=1), 
                               df.replace(ranking).median(axis=1)).astype(int)

print(df)

  rating1 rating2 rating3  mediumrating
0       D       D       C             1
1       C       B       A             3
2       B       B       B             3
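
A possible variation on the snippet above, offered as a sketch rather than a required change: convert the letters to numbers once and only for the rating columns, so that re-running the code does not also feed the new mediumrating column through replace. The names rating_cols, ranks and has_repeat are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'rating1': ['D', 'C', 'B'],
    'rating2': ['D', 'B', 'B'],
    'rating3': ['C', 'A', 'B'],
})
ranking = {'D': 1, 'C': 2, 'B': 3, 'A': 4}
rating_cols = ['rating1', 'rating2', 'rating3']  # illustrative helper name

# Convert letters to numbers once, only for the rating columns.
ranks = df[rating_cols].apply(lambda col: col.map(ranking))

# Rows with a repeated rating take the minimum;
# rows with three distinct ratings take the median.
has_repeat = ranks.nunique(axis=1) < 3
df['mediumrating'] = np.where(has_repeat,
                              ranks.min(axis=1),
                              ranks.median(axis=1)).astype(int)

print(df['mediumrating'].tolist())  # [1, 3, 3]
```

This assumes every rating appears as a key in ranking; a letter missing from the dictionary would map to NaN and make the final astype(int) fail.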

CodePudding user response：

Took me a sec to understand what you really meant by filter. Here is some code that should be self-explanatory and should achieve what you're looking for:

# Import pandas library
import pandas as pd
  
# initialize list of lists
data = [['D', 'D', 'C'], ['C', 'B', 'A'], ['B', 'B', 'B']]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['rating1', 'rating2', 'rating3'])

# dictionary that maps the rating to a number
rating_map = {'D': 1, 'C': 2, 'B': 3, 'A': 4}

def rating_to_number(rating1, rating2, rating3):
    if rating1 == rating2 and rating2 == rating3:
        return rating_map[rating1]
    elif rating1 == rating2 or rating1 == rating3 or rating2 == rating3:
        return min(rating_map[rating1], rating_map[rating2], rating_map[rating3])
    else:
        # sort by numeric rank (not by letter) and take the middle value
        return sorted(rating_map[r] for r in (rating1, rating2, rating3))[1]


# create a new column by applying the rating_to_number function to the three rating columns of each row
df['mediumrating'] = df.apply(lambda x: rating_to_number(x['rating1'], x['rating2'], x['rating3']), axis=1)

print(df)

This prints out:

  rating1 rating2 rating3  mediumrating
0       D       D       C             1
1       C       B       A             3
2       B       B       B             3

Edit: updated rating_to_number based on your updated question
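
As for the TypeError in the question itself: in Python, & binds more tightly than ==, so an expression like a == b & c == d is parsed as a == (b & c) == d. In the original condition that means & gets applied to string columns before any comparison happens, which is most likely where the error comes from. The first operand is also written as ['rating1'] (a plain list) instead of df['rating1'], and a DataFrame or Series cannot be used directly in an if statement (pandas raises a ValueError about an ambiguous truth value). A minimal sketch of the parenthesized row-wise masks, using the sample data from this thread; the names all_same and any_pair are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'rating1': ['D', 'C', 'B'],
    'rating2': ['D', 'B', 'B'],
    'rating3': ['C', 'A', 'B'],
})

# Parenthesize each comparison so that & and | combine boolean Series.
all_same = (df['rating1'] == df['rating2']) & (df['rating1'] == df['rating3'])
any_pair = (
    (df['rating1'] == df['rating2'])
    | (df['rating1'] == df['rating3'])
    | (df['rating2'] == df['rating3'])
)

print(all_same.tolist())  # [False, False, True]
print(any_pair.tolist())  # [True, False, True]
```

Masks like these hold one boolean per row, so they are meant to be passed to something like np.where or df.loc rather than tested with if/elif.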