I am trying to filter my dataframe so that a new column I create displays the "medium" rating. My dataframe has str values, so I convert them to numbers based on a ranking system I have, and then I filter out the maximum and minimum rating per row.
I am running into this error:
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
I've created a data frame that pulls str values from my csv file:
df = pd.read_csv('csv path', usecols=['rating1','rating2','rating3'])
And my dataframe looks like this:
rating1 rating2 rating3
0 D D C
1 C B A
2 B B B
I need it to look like this
rating1 rating2 rating3 mediumrating
0 D D C 1
1 C B A 3
2 B B B 3
I have a mapping dictionary that converts the values to numbers.
ranking = {
    'D': 1, 'C': 2, 'B': 3, 'A': 4
}
Below you can find the code I use to determine the "medium rating". Basically, if all the ratings are the same, you can pull the minimum rating. If two of the ratings are the same, pull in the lowest rating. If the three ratings differ, filter out the max rating and the min rating.
if df == df.loc[(['rating1'] == df['rating2'] & df['rating1'] == df['rating3'])]:
    df['mediumrating'] = df.replace(ranking).min(axis=1)
elif df == df.loc[(['rating1'] == df['rating2'] | df['rating1'] == df['rating3'] | df['rating2'] == df['rating3'])]:
    df['mediumrating'] = df.replace(ranking).min(axis=1)
else:
    df['mediumrating'] == df.loc[(df.replace(ranking) > df.replace(ranking).min(axis=1) & df.replace(ranking)
Any help on my formatting or process would be welcomed!!
CodePudding user response:
Use np.where:
- For the condition, use df.nunique applied to axis=1 and check if the result equals either 1 (all values are the same) or 2 (two different values) with Series.isin.
- If True, we need df.min along axis=1.
- If False (all unique values), we need df.median along axis=1.
- Finally, use astype to turn the resulting floats into integers.
import pandas as pd
import numpy as np
data = {'rating1': {0: 'D', 1: 'C', 2: 'B'},
        'rating2': {0: 'D', 1: 'B', 2: 'B'},
        'rating3': {0: 'C', 1: 'A', 2: 'B'}}
df = pd.DataFrame(data)
ranking = {'D': 1, 'C': 2, 'B': 3, 'A': 4}
df['mediumrating'] = np.where(df.replace(ranking).nunique(axis=1).isin([1, 2]),
                              df.replace(ranking).min(axis=1),
                              df.replace(ranking).median(axis=1)).astype(int)
print(df)
rating1 rating2 rating3 mediumrating
0 D D C 1
1 C B A 3
2 B B B 3
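As a side note on the conditions attempted in the question: in Python, & and | bind more tightly than ==, so each comparison needs its own parentheses, and the first operand should be df['rating1'] rather than the list ['rating1']. A minimal sketch of the two row-wise masks, using the same sample data as above (the names all_same and any_pair are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "rating1": ["D", "C", "B"],
    "rating2": ["D", "B", "B"],
    "rating3": ["C", "A", "B"],
})

# Parenthesize every comparison: & and | have higher precedence than ==.
all_same = (df["rating1"] == df["rating2"]) & (df["rating1"] == df["rating3"])
any_pair = (
    (df["rating1"] == df["rating2"])
    | (df["rating1"] == df["rating3"])
    | (df["rating2"] == df["rating3"])
)

print(all_same.tolist())  # [False, False, True]
print(any_pair.tolist())  # [True, False, True]
```

These masks are boolean Series with one value per row, so they can be passed to np.where as the condition, but they cannot be used directly in a plain if statement.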
CodePudding user response:
Took me a sec to understand what you really meant by filter. Here is some code that should be self-explanatory and should achieve what you're looking for:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['D', 'D', 'C'], ['C', 'B', 'A'], ['B', 'B', 'B']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['rating1', 'rating2', 'rating3'])
# dictionary that maps the rating to a number
rating_map = {'D': 1, 'C': 2, 'B': 3, 'A': 4}
def rating_to_number(rating1, rating2, rating3):
    if rating1 == rating2 and rating2 == rating3:
        return rating_map[rating1]
    elif rating1 == rating2 or rating1 == rating3 or rating2 == rating3:
        return min(rating_map[rating1], rating_map[rating2], rating_map[rating3])
    else:
        return rating_map[sorted([rating1, rating2, rating3])[1]]
# Create a new column by applying rating_to_number to the three rating columns of each row
df['mediumrating'] = df.apply(lambda x: rating_to_number(x['rating1'], x['rating2'], x['rating3']), axis=1)
print(df)
This prints out:
rating1 rating2 rating3 mediumrating
0 D D C 1
1 C B A 3
2 B B B 3
Edit: updated rating_to_number based on your updated question
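One thing to be aware of: the last branch of rating_to_number sorts the rating letters alphabetically. That picks the right middle element here only because the alphabetical order of the letters is the exact reverse of the ranking. A minimal sketch that sorts the numeric ranks instead, assuming exactly three ratings per row (medium_rating is a hypothetical helper name, not part of the code above):

```python
import pandas as pd

rating_map = {"D": 1, "C": 2, "B": 3, "A": 4}

def medium_rating(row):
    # Convert the letters to ranks first, then sort numerically.
    ranks = sorted(rating_map[r] for r in row)
    # Any repeated rating (two or three equal): take the lowest rank.
    if len(set(ranks)) < len(ranks):
        return ranks[0]
    # All three differ: drop the min and max, keep the middle one.
    return ranks[1]

df = pd.DataFrame(
    [["D", "D", "C"], ["C", "B", "A"], ["B", "B", "B"]],
    columns=["rating1", "rating2", "rating3"],
)
df["mediumrating"] = df[["rating1", "rating2", "rating3"]].apply(medium_rating, axis=1)
print(df["mediumrating"].tolist())  # [1, 3, 3]
```

Sorting the ranks rather than the letters should keep working if the labels are later changed to something that does not sort alphabetically in rank order.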