Building a Financial Time Series and Fundamental Data Database (Multidimensional)

Time:07-19

I'm working on building an ANN model to predict stock movements. My input data is weekly stock prices (open, close, high and low), trading volume, and other fundamental ratios (11 features in total) for the last 20 years of 211 stocks (uncleaned).

I'm new to machine learning and wanted to ask how I can organize my data in a single dataframe so that I can clean it.

The goal is to clean the data, reduce dimensionality (feature selection) and then work on the model.

CodePudding user response:

Kaggle has lots of great resources on cleaning datasets. A good approach is to first aggregate all relevant data into one structure that makes sense for your problem, then begin cleaning: analyze missing values, apply scaling/normalization, and encode any categorical fields. Keep in mind which models you want to use on the time series later and understand what types of data they work best with. This may require turning some continuous data into a more discrete form.
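To make the "aggregate, then clean" idea concrete, here is a minimal sketch in pandas. It assumes each stock arrives as its own weekly table; the `make_fake_weekly` helper, the tickers, and the three columns are hypothetical stand-ins for your real files and 11 features. The layout shown (one long dataframe indexed by ticker and date) is one reasonable choice, not the only one.

```python
import numpy as np
import pandas as pd


def make_fake_weekly(ticker: str, weeks: int = 8, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for loading one stock's raw weekly data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "date": pd.date_range("2023-01-06", periods=weeks, freq="7D"),
        "close": 100 + rng.normal(0, 1, weeks).cumsum(),
        "volume": rng.integers(1_000, 5_000, weeks).astype(float),
        "pe_ratio": rng.normal(15, 2, weeks),
    })
    df.loc[2, "pe_ratio"] = np.nan  # simulate one missing fundamental value
    df["ticker"] = ticker
    return df


# 1. Aggregate: stack every stock into one long dataframe,
#    indexed by (ticker, date) with one column per feature.
frames = [make_fake_weekly(t, seed=i) for i, t in enumerate(["AAA", "BBB", "CCC"])]
panel = (
    pd.concat(frames, ignore_index=True)
    .set_index(["ticker", "date"])
    .sort_index()
)

# 2. Inspect missing values per feature before deciding how to handle them.
missing_per_feature = panel.isna().sum()

# 3. Forward-fill within each ticker so values never leak across stocks,
#    then drop rows that are still incomplete (gaps at the start of a series).
cleaned = panel.groupby(level="ticker").ffill().dropna()

# 4. Standardize each feature per ticker (z-score).
#    Assumption: for a real forecasting model the mean and std should come
#    from the training window only, to avoid look-ahead bias.
scaled = cleaned.groupby(level="ticker").transform(
    lambda col: (col - col.mean()) / col.std()
)

print(missing_per_feature)
print(scaled.head())
```

With this layout, `panel.loc["AAA"]` gives a single stock's history, and `panel.xs("2023-01-20", level="date")` gives a cross-section of all stocks for one week.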

For analysis and cleaning of securities data specifically, I highly suggest you check out QuantConnect, as they have lots of tutorials on that topic.

Exploratory data analysis (EDA) will help you pick out important features and determine the best way to engineer them, reducing your dimensionality. It would be difficult to determine the best features for your model without first finding the significance of each feature. If you are new to EDA, maybe check out pandas profiling, as it gives some useful insight.
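As a rough illustration of that EDA step, the sketch below ranks features by their absolute correlation with a target and drops features that are nearly duplicates of one another. The data is synthetic, and the feature names, the `next_week_return` target, and the 0.95 threshold are all assumptions made for the example. Correlation is only one simple way to measure significance and it misses nonlinear relationships.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Synthetic feature table; the column names are hypothetical.
features = pd.DataFrame({
    "momentum": rng.normal(size=n),
    "pe_ratio": rng.normal(size=n),
    "volume_z": rng.normal(size=n),
})
# A deliberately redundant feature: almost an exact copy of "momentum".
features["momentum_dup"] = features["momentum"] + rng.normal(scale=0.01, size=n)

# Hypothetical target that depends only on "momentum" plus noise.
target = 0.5 * features["momentum"] + rng.normal(scale=1.0, size=n)
target.name = "next_week_return"

# Relevance: absolute linear correlation of each feature with the target.
relevance = features.corrwith(target).abs().sort_values(ascending=False)

# Redundancy: look only at the upper triangle of the feature correlation
# matrix so that each pair of features is considered once.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later feature of any pair correlated above the threshold.
THRESHOLD = 0.95  # assumed cutoff; tune it for your own data
to_drop = [col for col in upper.columns if (upper[col] > THRESHOLD).any()]
reduced = features.drop(columns=to_drop)

print(relevance)
print("dropped:", to_drop)
```

A profiling report or a few scatter plots would usually come before a filter like this, so that you can see why a feature is being dropped and not only that it is.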

Hopefully that helps (:
