I am using an API to download live stock market data. A lot of the time, this data is incomplete. For example:
Open High Low Close Adj Close Volume
Datetime
2022-02-16 15:00:00-05:00 172.872101 173.029999 172.839996 172.910004 172.910004 0
2022-02-16 15:01:00-05:00 172.899994 172.949997 172.779999 172.815002 172.815002 160249
2022-02-16 15:04:00-05:00 173.089996 173.320007 173.030106 173.315002 173.315002 311095
2022-02-16 15:05:00-05:00 173.320007 173.339996 173.164993 173.214996 173.214996 174639
2022-02-16 15:07:00-05:00 173.139999 173.179993 173.089996 173.160004 173.160004 135559
As you can tell from the timestamps, several minutes are skipped (15:02, 15:03 and 15:06 are missing).
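To list exactly which minutes are missing, you can compare the index against a complete 1-minute grid. This is a minimal sketch that assumes the data is a pandas DataFrame indexed by tz-aware timestamps, as the printout suggests; only the index from the sample above is rebuilt here.

```python
import pandas as pd

# Index rebuilt from the sample printout (assumption: data lives in a pandas DataFrame)
idx = pd.DatetimeIndex(
    ["2022-02-16 15:00:00-05:00", "2022-02-16 15:01:00-05:00",
     "2022-02-16 15:04:00-05:00", "2022-02-16 15:05:00-05:00",
     "2022-02-16 15:07:00-05:00"],
    name="Datetime",
)

# Complete 1-minute grid from the first to the last bar (keeps the same UTC offset)
full = pd.date_range(idx.min(), idx.max(), freq="1min")

# Minutes that exist on the grid but have no bar in the data
missing = full.difference(idx)
print(missing)
```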
My question is: is there a way to fill in that missing data so that I get something like this?
Open High Low Close Adj Close Volume
Datetime
2022-02-16 15:00:00-05:00 172.872101 173.029999 172.839996 172.910004 172.910004 0
2022-02-16 15:01:00-05:00 172.899994 172.949997 172.779999 172.815002 172.815002 160249
2022-02-16 15:02:00-05:00 172.809998 172.990005 172.809998 172.979996 172.979996 119117
2022-02-16 15:03:00-05:00 172.970001 173.169998 172.964996 173.080093 173.080093 264624
2022-02-16 15:04:00-05:00 173.089996 173.320007 173.030106 173.315002 173.315002 311095
2022-02-16 15:05:00-05:00 173.320007 173.339996 173.164993 173.214996 173.214996 174639
2022-02-16 15:06:00-05:00 173.220001 173.220001 173.080002 173.139999 173.139999 124707
2022-02-16 15:07:00-05:00 173.139999 173.179993 173.089996 173.160004 173.160004 135559
Answer:
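Resample to 1-minute periods, then interpolate to fill the resulting NaN values. Keep in mind that the filled rows are estimates, not the bars the exchange actually reported: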
df = df.resample('1min').interpolate(method='linear', limit_direction='forward', axis=0)
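Here is a self-contained sketch of that one-liner applied to the sample from the question. The DataFrame is rebuilt from the printout with Adj Close left out (it equals Close in this sample), and the `'1min'` alias is used because the older `'1T'` alias is deprecated in recent pandas versions.

```python
import pandas as pd

# Sample bars rebuilt from the question's printout (Adj Close omitted)
idx = pd.DatetimeIndex(
    ["2022-02-16 15:00:00-05:00", "2022-02-16 15:01:00-05:00",
     "2022-02-16 15:04:00-05:00", "2022-02-16 15:05:00-05:00",
     "2022-02-16 15:07:00-05:00"],
    name="Datetime",
)
df = pd.DataFrame(
    {
        "Open": [172.872101, 172.899994, 173.089996, 173.320007, 173.139999],
        "High": [173.029999, 172.949997, 173.320007, 173.339996, 173.179993],
        "Low": [172.839996, 172.779999, 173.030106, 173.164993, 173.089996],
        "Close": [172.910004, 172.815002, 173.315002, 173.214996, 173.160004],
        "Volume": [0, 160249, 311095, 174639, 135559],
    },
    index=idx,
)

# Upsample to one row per minute (new rows start as NaN),
# then fill each NaN linearly between its neighbouring bars
filled = df.resample("1min").interpolate(
    method="linear", limit_direction="forward", axis=0
)
print(filled)
```

For example, the interpolated Close at 15:06 comes out as 173.1875 (the midpoint of 15:05 and 15:07), while the real bar in the desired output has 173.139999, and the interpolated Volume there is 155099 instead of 124707. Interpolating Volume is especially questionable, since trade volume is not a smooth quantity.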
Answer:
There are many ways to handle missing values; this blog post walks through several of them: https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779
- Drop the rows with missing data, if you still have enough data left for training.
- Fill in (impute) the missing values using one of the techniques described in the post.