I have a dataset that looks similar to this:
|Budget|Profit |Ranking| #Ranking has values ranging from 1-10 (1 = worst, 10 = best)
|------|-------|-------|
|10000 | NaN | 8 |
|NaN | 4000 | 3 |
|5000 | 7000 | 9 |
|12000 | NaN | 8 |
|2000 | 4500 | 3 |
|5000 | 10000 | 8 |
I want to fill the missing values in Budget and Profit in regards to the median of their corresponding ranking, but can't seem to get it right.
for score in range(1, 11):
median_score = df[df['score'] == score]['budget'].median()
index_nr = df[df['score'] == score]
for i in index_nr:
df.loc[i, :]['budget'].filla(median_score)
How can I improve my code so that it fills the missing data with the median of it's rank?
CodePudding user response:
First get the columns' median of each ranking group using groupby
transform
, and then pass it to fillna
groups_median = df.groupby('Ranking').transform('median')
df = df.fillna(groups_median)