I have a dataset of 20000 employees in which the following three columns have missing values:
- Passing year of College
- College specialization
- Name of College
I have 10000 employees who never went to college, so these columns are empty for them. My final aim is to predict their salary.
How can I fill in the missing values in this case?
CodePudding user response:
Missing values can be dealt with in a number of ways; which one to use depends on the kind of data you have.
Deleting the rows with missing values
Rows in which a large share of the column values are null can be dropped. (What counts as a large share depends on the individual use case.)
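As a rough sketch of the row-dropping approach with pandas: the toy data, the column names, and the threshold of 2 below are all assumptions made for illustration, not values taken from the question.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, not from the question.
df = pd.DataFrame({
    "salary": [50000, 62000, 48000, 71000],
    "passing_year": [2012, None, None, 2015],
    "specialization": ["CS", None, None, "EE"],
    "college_name": ["College A", None, "College B", None],
})

college_cols = ["passing_year", "specialization", "college_name"]

# Keep only rows with at least 2 non-null values among the college columns.
# The threshold of 2 is arbitrary and would need tuning for a real dataset.
trimmed = df.dropna(subset=college_cols, thresh=2)

print(trimmed)
```

Note that with half of the employees missing all three columns, dropping rows this way would discard a large part of the data, so it may not suit every case.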
Imputing the missing values with mean / median
For numerical columns, you can try replacing the missing values with the mean or median of the non-missing values in that column.
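A minimal sketch of median imputation with pandas, assuming the passing year is stored as a numeric column named `passing_year` (a hypothetical name) and using made-up values:

```python
import pandas as pd

# Hypothetical numeric column with two missing entries.
df = pd.DataFrame({"passing_year": [2010.0, 2012.0, None, 2016.0, None]})

# median() and mean() skip missing values by default.
median_year = df["passing_year"].median()
mean_year = df["passing_year"].mean()

# Fill the gaps with the median; swap in mean_year to use the mean instead.
df["passing_year_filled"] = df["passing_year"].fillna(median_year)

print(df)
```

The median is usually less sensitive to outliers than the mean, which is one reason to prefer it for skewed columns.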
Imputing with the most frequent value (applicable to your scenario)
This method is suitable for categorical data, which I assume is your case. You can try replacing the missing values in all three columns with the most frequently occurring value in each column.
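Most-frequent-value imputation could look roughly like this in pandas. The column names and values are hypothetical, and ties between equally frequent values are resolved here simply by taking the first mode that pandas returns:

```python
import pandas as pd

# Hypothetical categorical columns with missing entries.
df = pd.DataFrame({
    "specialization": ["CS", "EE", "CS", None, None],
    "college_name": ["College A", "College B", None, "College B", None],
})

for col in ["specialization", "college_name"]:
    # mode() returns a Series of the most frequent value(s); take the first.
    most_frequent = df[col].mode(dropna=True)[0]
    df[col] = df[col].fillna(most_frequent)

print(df)
```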