What to do if there are missing values in data

As shown in the picture, row 19 contains a missing value. However, I am supposed to plot points by iterating over a sliding window of 5 using two previously defined functions.

I thought of simply omitting the row with the missing data, but that would leave me with 29 rows, so I would be one value short.

So I thought of fitting an lm and using predict(). However, the usual predict() predicts y values from x, and I would like to predict x when y = 0.7. How would I go about doing this? I have the lm as:

akima.fit <- lm(data$akima_data[,2]~data$akima_data[,1])

CodePudding user response：

You can just replace NA with 0.48275862
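A minimal sketch of that replacement in R. The toy data frame and its column names `x` and `y` are made up for illustration, and it assumes the NA sits in the x column; the real `akima_data` may be laid out differently.

```r
# Hypothetical stand-in for the question's data (not the real values)
akima_data <- data.frame(
  x = c(0.1, 0.2, NA, 0.4),
  y = c(0.2, 0.4, 0.7, 0.8)
)

# Overwrite every NA in the x column with the suggested value
akima_data$x[is.na(akima_data$x)] <- 0.48275862

# Confirm that no missing values remain
anyNA(akima_data)
```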

CodePudding user response：

I would recommend two options.

1. Remove the row with the missing value and then do the regression.

2. Impute the missing value (wiki, post) and then do the regression.

Keep the implications of both options in mind. The first option simply treats the observation with a missing value as an incomplete data point and disregards it. However, this can bias your sample if the values are not missing at random, for example if whether a value is missing depends on another measurement (e.g. gender or age).

The second option assumes that every data point is important, so the missing values have to be estimated by some method. It is up to the analyst or researcher to know which method best fits the analysis, both statistically and in terms of the method's assumptions.
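Both options can be sketched in R roughly as follows. The data frame `d` and its values are hypothetical, and regression imputation (regressing x on y over the complete rows) is only one of many possible imputation methods, chosen here because it also yields an x for y = 0.7.

```r
# Hypothetical toy data: x is missing where y = 0.7
d <- data.frame(
  x = c(0.10, 0.20, 0.30, NA, 0.50, 0.60),
  y = c(0.15, 0.32, 0.44, 0.70, 0.76, 0.90)
)

# Option 1: drop incomplete rows, then fit the regression
d_complete <- d[complete.cases(d), ]
fit_drop <- lm(y ~ x, data = d_complete)

# Option 2: regression imputation.
# Regress x on y using the complete rows, then predict the
# missing x from the y value observed in that row.
fit_xy <- lm(x ~ y, data = d_complete)
missing <- is.na(d$x)
d_imputed <- d
d_imputed$x[missing] <- predict(fit_xy, newdata = d[missing, , drop = FALSE])

# Refit the original model on the imputed data
fit_imputed <- lm(y ~ x, data = d_imputed)
```

Fitting with `data =` and plain column names, instead of writing `data$akima_data[,2]` inside the formula, is what lets `predict()` match the columns of `newdata` to the model.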