How to complete missing data in a DataFrame

Time:02-18

I am using an API to download live stock market data. A lot of the time, this data is incomplete, e.g.:

                                 Open        High         Low       Close   Adj Close   Volume
Datetime
2022-02-16 15:00:00-05:00  172.872101  173.029999  172.839996  172.910004  172.910004        0
2022-02-16 15:01:00-05:00  172.899994  172.949997  172.779999  172.815002  172.815002   160249
2022-02-16 15:04:00-05:00  173.089996  173.320007  173.030106  173.315002  173.315002   311095
2022-02-16 15:05:00-05:00  173.320007  173.339996  173.164993  173.214996  173.214996   174639
2022-02-16 15:07:00-05:00  173.139999  173.179993  173.089996  173.160004  173.160004   135559

As you can tell from the timestamps, it skips a lot of rows: 15:02, 15:03 and 15:06 are missing.
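To see exactly which minutes are missing, one option is to compare the index against a complete 1-minute range built with pandas. This is a minimal sketch that assumes the data sits in a pandas DatetimeIndex like the one printed above; only the timestamps from the sample are reproduced.

```python
import pandas as pd

# Timestamps copied from the sample above (the price columns are not needed here).
index = pd.DatetimeIndex(
    [
        "2022-02-16 15:00:00-05:00",
        "2022-02-16 15:01:00-05:00",
        "2022-02-16 15:04:00-05:00",
        "2022-02-16 15:05:00-05:00",
        "2022-02-16 15:07:00-05:00",
    ],
    name="Datetime",
)

# Build the full 1-minute grid between the first and last timestamp...
full_grid = pd.date_range(index.min(), index.max(), freq="1min")

# ...and keep only the minutes the API did not return.
missing = full_grid.difference(index)
print(missing)
```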

My question is: is there a way to fill in that missing data to get something like this?

                                 Open        High         Low       Close   Adj Close   Volume
Datetime
2022-02-16 15:00:00-05:00  172.872101  173.029999  172.839996  172.910004  172.910004        0
2022-02-16 15:01:00-05:00  172.899994  172.949997  172.779999  172.815002  172.815002   160249
2022-02-16 15:02:00-05:00  172.809998  172.990005  172.809998  172.979996  172.979996   119117
2022-02-16 15:03:00-05:00  172.970001  173.169998  172.964996  173.080093  173.080093   264624
2022-02-16 15:04:00-05:00  173.089996  173.320007  173.030106  173.315002  173.315002   311095
2022-02-16 15:05:00-05:00  173.320007  173.339996  173.164993  173.214996  173.214996   174639
2022-02-16 15:06:00-05:00  173.220001  173.220001  173.080002  173.139999  173.139999   124707
2022-02-16 15:07:00-05:00  173.139999  173.179993  173.089996  173.160004  173.160004   135559

Answer:

Resample to 1-minute periods, then interpolate to fill the resulting NaN values:

# '1min' means the same as '1T'; newer pandas versions deprecate the 'T' alias
df = df.resample('1min').interpolate(method='linear', limit_direction='forward', axis=0)
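Applied to the sample rows from the question, the resample-then-interpolate approach can be sketched as follows (Adj Close is left out for brevity). Keep in mind that the filled rows are linear estimates between neighbouring bars, not the real prices and volumes shown in the desired output; if you need the actual bars, they have to be requested from the data source again.

```python
import pandas as pd

# Sample rows copied from the question (Adj Close omitted; it equals Close here).
times = ["15:00", "15:01", "15:04", "15:05", "15:07"]
df = pd.DataFrame(
    {
        "Open":   [172.872101, 172.899994, 173.089996, 173.320007, 173.139999],
        "High":   [173.029999, 172.949997, 173.320007, 173.339996, 173.179993],
        "Low":    [172.839996, 172.779999, 173.030106, 173.164993, 173.089996],
        "Close":  [172.910004, 172.815002, 173.315002, 173.214996, 173.160004],
        "Volume": [0, 160249, 311095, 174639, 135559],
    },
    index=pd.DatetimeIndex(
        ["2022-02-16 " + t + ":00-05:00" for t in times], name="Datetime"
    ),
)

# Upsample to a regular 1-minute grid: the new rows (15:02, 15:03, 15:06) start
# as NaN and are then filled by a straight line between their neighbours.
filled = df.resample("1min").interpolate(method="linear")
print(filled)
```

Because each column is interpolated independently, Volume becomes a float column; round or cast it afterwards if you need integers.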

Answer:

There are many ways to do this; the following blog post walks through several of them in detail: https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779

  1. Drop the rows with missing data, if you still have enough data left for training.
  2. Fill in (impute) the missing values using one of the techniques from the blog post.
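Both options can be sketched in pandas once the gaps are made explicit on a 1-minute grid. This is a minimal illustration using only the Close column from the question; forward fill and linear interpolation are two generic imputation choices picked here for illustration, not necessarily the ones the blog post recommends.

```python
import pandas as pd

# Close prices copied from the question's sample.
times = ["15:00", "15:01", "15:04", "15:05", "15:07"]
closes = pd.Series(
    [172.910004, 172.815002, 173.315002, 173.214996, 173.160004],
    index=pd.DatetimeIndex(["2022-02-16 " + t + ":00-05:00" for t in times]),
    name="Close",
)

# Put the series on a regular 1-minute grid so the gaps become explicit NaN rows.
grid = closes.resample("1min").asfreq()

# Option 1: drop the missing rows (only sensible if enough data remains).
dropped = grid.dropna()

# Option 2: impute them, e.g. carry the last known close forward...
ffilled = grid.ffill()
# ...or draw a straight line between the neighbouring closes.
interpolated = grid.interpolate(method="linear")

print(pd.DataFrame({"ffill": ffilled, "linear": interpolated}))
```

Forward fill never invents a price that was not observed, while linear interpolation produces smoother values that did not necessarily trade; which one is acceptable depends on what the data is used for.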