Home > Back-end >  How do I replace missing values based on median of another value?
How do I replace missing values based on median of another value?

Time:11-07

I have a dataset that looks similar to this:

|Budget|Profit |Ranking| #Ranking has values ranging from 1-10 (1 = worst, 10 = best)
|------|-------|-------|
|10000 | NaN   | 8     |
|NaN   | 4000  | 3     |
|5000  | 7000  | 9     |
|12000 | NaN   | 8     |
|2000  | 4500  | 3     |
|5000  | 10000 | 8     |

I want to fill the missing values in Budget and Profit in regards to the median of their corresponding ranking, but can't seem to get it right.

for score in range(1, 11):
  median_score = df[df['score'] == score]['budget'].median()
  index_nr = df[df['score'] == score]
  for i in index_nr:
    df.loc[i, :]['budget'].filla(median_score)

How can I improve my code so that it fills the missing data with the median of it's rank?

CodePudding user response:

First get the columns' median of each ranking group using groupby transform, and then pass it to fillna

groups_median = df.groupby('Ranking').transform('median')
df = df.fillna(groups_median)
  • Related