How to fill missing values in categorical data?

Time:06-08

I have a dataset of 20,000 employees with missing values in the following three columns:

  1. Passing year of College
  2. College specialization
  3. Name of College

10,000 of these employees never went to college. My final aim is to predict their salary.

How can I fill in the missing values in this case?

CodePudding user response:

Missing values can be handled in a number of ways; which one to use depends on the kind of data you have.

  • Deleting the rows with missing values

    Rows in which many of the column values are null can be dropped. (How many counts as "many" depends on the individual use case.)

  • Imputing the missing values with the mean / median

    For numerical columns, you can try replacing the missing values with the mean or median of the column's values.

  • Most frequent values: applicable to your scenario

    This method is suitable for categorical data, which I assume is your case. You can try replacing the missing values in each of the three columns with the most frequently occurring value in that column.
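
The three options above can be sketched with pandas roughly as follows. This is only a minimal illustration: the toy data, the column names (`passing_year`, `specialization`, `college_name`, `salary`), and the `max_missing` threshold are all assumptions made up for the example, not taken from the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are assumptions.
df = pd.DataFrame({
    "passing_year": [2010, np.nan, 2012, 2010, np.nan, np.nan],
    "specialization": ["CS", None, "EE", "CS", None, None],
    "college_name": ["A", None, "B", "A", "A", None],
    "salary": [50, 30, 55, 52, 40, 31],
})
cols = ["passing_year", "specialization", "college_name"]

# Option 1: drop rows where too many of the three columns are missing.
max_missing = 2  # assumed threshold; depends on the use case
dropped = df[df[cols].isna().sum(axis=1) <= max_missing]

# Option 2: fill a numerical column with its median (or use .mean()).
filled = df.copy()
filled["passing_year"] = filled["passing_year"].fillna(
    filled["passing_year"].median()
)

# Option 3: fill each categorical column with its most frequent value.
for col in ["specialization", "college_name"]:
    most_frequent = filled[col].mode().iloc[0]  # mode() ignores missing values
    filled[col] = filled[col].fillna(most_frequent)

print(dropped)
print(filled)
```

Keep in mind that filling with the most frequent value assigns a real college name and specialization to employees who never attended college, so it is worth checking how that affects the salary model before settling on it.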
