I have letter ratings that I need to rank against each other and then pull the minimum ranking and median rating in a specific format. I've converted the minimum ranking and median rating to the correct specific format EXCEPT for the very last row in those columns. Could someone help me troubleshoot why my last row does not seem to be converting to the correct format?
For context, I have created a dataframe using 3 columns at the end of my csv:
df = pd.read_csv('csv path', usecols=['rating1','rating2','rating3'])
The data frame looks like this:
rating1 rating2 rating3
0 D Dd c
1 C Bb A
2 B Bb b
I then created a mapping dictionary that looks like this:
ranking = {
'D': 1, 'Dd': 1, 'd': 1, 'C':2, 'Cc': 2, 'c': 2, 'B': 3, 'Bb': 3, 'b':3, 'A' : 4
}
The ranking rules are: 1. if all three ratings are the same, take the minimum rating; 2. if two of the ratings are the same, take the lowest rating; 3. if all three ratings differ, drop the max rating and the min rating and keep the one in the middle. Since I need to rank them and create two columns that hold the medium ranking and the minimum ranking, I need my data frame to look like this once ranked:
rating1 rating2 rating3 medium_rating minimum_rating
0 D Dd c 1 1
1 C Bb A 3 2
2 B Bb b 3 3
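To make the three rules concrete, here is a minimal sketch in plain Python that applies them to one row at a time. The function name medium_and_minimum is made up for illustration; the ranking dictionary is the one from above.

```python
# Mapping from letter ratings to numeric ranks (copied from the question).
ranking = {
    'D': 1, 'Dd': 1, 'd': 1, 'C': 2, 'Cc': 2, 'c': 2,
    'B': 3, 'Bb': 3, 'b': 3, 'A': 4,
}

def medium_and_minimum(ratings):
    """Return (medium, minimum) ranks for one row of three ratings.

    Hypothetical helper, not part of the original code.
    """
    ranks = sorted(ranking[r] for r in ratings)
    minimum = ranks[0]
    # Rules 1 and 2: if any ranks repeat, use the lowest rank.
    # Rule 3: if all three differ, drop min and max and keep the middle.
    medium = minimum if len(set(ranks)) < 3 else ranks[1]
    return medium, minimum

rows = [('D', 'Dd', 'c'), ('C', 'Bb', 'A'), ('B', 'Bb', 'b')]
results = [medium_and_minimum(row) for row in rows]
print(results)  # [(1, 1), (3, 2), (3, 3)]
```

The printed pairs match the medium_rating and minimum_rating columns in the table above.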
My final step is that once I've ranked the medium_rating and the minimum_rating, I need to convert the number rankings back to the rating1 format, so my data frame should FINALLY look like this:
rating1 rating2 rating3 medium_rating minimum_rating
0 D Dd c D D
1 C Bb A B C
2 B Bb b B B
I created a mapping dictionary to convert the rankings to the rating1 format that looks like this:
rating1_rank = {
1: 'D', 2: 'C', 3: 'B', 4: 'A'
}
When I try to convert my medium and minimum rating to the rating1_rank format, all rows in these columns are being converted except for the last one. See below for my code:
This is setting up the dataframe columns and the rating dictionaries:
df = pd.read_csv('csv path', usecols=['rating1','rating2','rating3'])
ranking = {
'D': 1, 'Dd': 1, 'd': 1, 'C':2, 'Cc': 2, 'c': 2, 'B': 3, 'Bb': 3, 'b':3, 'A' : 4
}
rating1_rank = {
1: 'D', 2: 'C', 3: 'B', 4: 'A'
}
Below is the logic for creating the medium rating column and minimum rating:
df['Medium_Rating'] = np.where(df.replace(ranking).nunique(axis=1).isin([1,2]),
df.replace(ranking).min(axis=1),
df.replace(ranking).median(axis=1))
df['Minimum_Rating'] = df.replace(ranking).min(axis=1)
This is meant to convert the medium rating and minimum rating elements to rating1_rank:
df[:-1] = df.replace(rating1_rank)
df[:-2] = df.replace(rating1_rank)
My result is:
rating1 rating2 rating3 medium_rating minimum_rating
0 D Dd c D D
1 C Bb A B C
**2 B Bb b 3 3**
Why is the last row not converting??
CodePudding user response:
In the present case you can simply use:
df = df.replace(rating1_rank)
print(df)
rating1 rating2 rating3 Medium_Rating Minimum_Rating
0 D Dd c D D
1 C Bb A B C
2 B Bb b B B
To avoid the risk of inadvertently changing columns we don't want to change (though not really applicable here), select only the last two columns with df.iloc:
df.iloc[:, -2:] = df.iloc[:, -2:].replace(rating1_rank)
Or with df.loc:
cols = ['Medium_Rating', 'Minimum_Rating']
df.loc[:, cols] = df.loc[:, cols].replace(rating1_rank)
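For a quick check without the CSV, here is a minimal sketch that rebuilds the example frame by hand (the numeric values are typed in, not computed) and converts only the two ranking columns. It uses Series.map instead of replace and assigns whole columns. Depending on your pandas version, writing strings into integer columns through .loc may trigger a dtype warning, and whole-column assignment sidesteps that.

```python
import pandas as pd

# Example frame typed in by hand; in the question it comes from a CSV.
df = pd.DataFrame({
    "rating1": ["D", "C", "B"],
    "rating2": ["Dd", "Bb", "Bb"],
    "rating3": ["c", "A", "b"],
    "Medium_Rating": [1, 3, 3],
    "Minimum_Rating": [1, 2, 3],
})

rating1_rank = {1: "D", 2: "C", 3: "B", 4: "A"}

# Convert only the two ranking columns, one whole column at a time.
# Note: map() turns any value missing from the dictionary into NaN.
cols = ["Medium_Rating", "Minimum_Rating"]
for col in cols:
    df[col] = df[col].map(rating1_rank)

print(df)
```

All three rows of both columns end up as letters, and the three rating columns are left untouched.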
As to why your approach isn't working: I think you are misunderstanding the meaning of df[:-1] and df[:-2]. You are selecting rows here. Let's see a print:
print(df[:-1])
rating1 rating2 rating3 Medium_Rating Minimum_Rating
0 D Dd c D D
1 C Bb A B C
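So, here you are selecting rows from 0 up to, but not including, -1 (the last row, which is row 2 in this case, since we have rows 0, 1, 2). That leaves rows 0 and 1. With df[:-2], along the same lines, you are only selecting the first row: row 0. Both assignments only write to the rows they select, so the last row is never converted.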
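The difference between slicing rows and selecting the last columns can be checked directly. This is a small sketch on a hand-built copy of the example frame; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "rating1": ["D", "C", "B"],
    "rating2": ["Dd", "Bb", "Bb"],
    "rating3": ["c", "A", "b"],
    "Medium_Rating": [1, 3, 3],
    "Minimum_Rating": [1, 2, 3],
})

# A bare slice on a DataFrame selects ROWS, and the stop value is excluded.
rows_a = df[:-1].index.tolist()   # [0, 1]
rows_b = df[:-2].index.tolist()   # [0]

# To pick the last two COLUMNS, slice the second axis with iloc.
last_two_cols = df.iloc[:, -2:].columns.tolist()

print(rows_a, rows_b, last_two_cols)
```

Row 2 appears in neither row slice, which is why it keeps its numeric values.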
Have a further look at the documentation for all the different ways in which you can select data from DataFrames and Series.