I have a CSV file with many patients' health measurement data. Each patient has a different number of measurements. (Some patients come frequently, some don't.) I am trying to build a next-value prediction model to predict the patients' risk of specific incidences. Since the values are all in time sequence, I've tried to use an LSTM to make predictions. I am also concatenating all the patients' health data together into one long column. (Please see attachment)
what I am feeding into the LSTM
My LSTM model generates results much like a stock-price prediction.
But I wonder if there are better ways. I think my current method of concatenating all the patients' data is strange. Since the patients have different numbers of measurements, I am not sure if I can feed them to the LSTM model in parallel. Or maybe I should use a random forest, because each patient's data has its own distribution? Thank you!
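For context, this is roughly how my data could be grouped into one sequence per patient instead of one long concatenated column (a minimal sketch, assuming the CSV has a `patient_id` column, a timestamp column, and one row per measurement; the column and file names here are hypothetical):

```python
# A minimal sketch, assuming the CSV has a `patient_id` column, a
# `measurement_time` column, and one row per measurement.
# File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("patients.csv")
df = df.sort_values(["patient_id", "measurement_time"])

feature_cols = [c for c in df.columns
                if c not in ("patient_id", "measurement_time")]

# One 2D array (timesteps x features) per patient, instead of
# concatenating every patient into a single long column.
patient_sequences = [
    group[feature_cols].to_numpy(dtype="float32")
    for _, group in df.groupby("patient_id")
]
```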
CodePudding user response:
Regarding the different lengths of your data, you can use padding and masking to make your sequences equal length (Description of Padding/Masking with Tensorflow). Predicting sequence-based data with LSTMs is generally a good approach, but I would advise you to look into GRUs instead of LSTMs, and also into Transformer architectures, because by now they have many advantages over LSTMs.
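Here is a minimal sketch of the padding + masking idea with Keras, assuming each patient's measurements are stored as a 2D array of per-visit feature vectors (the variable names, the feature count, and the binary risk target are assumptions for illustration):

```python
# A minimal sketch of padding + masking with Keras.
# `patient_sequences`, `n_features`, and the binary target are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

n_features = 4  # e.g. number of values recorded per visit (assumption)

# One entry per patient, each with a different number of visits.
patient_sequences = [
    np.random.rand(3, n_features),   # patient with 3 visits
    np.random.rand(7, n_features),   # patient with 7 visits
    np.random.rand(5, n_features),   # patient with 5 visits
]

# Pad every patient to the length of the longest sequence; padded
# timesteps are filled with 0.0, which the Masking layer then ignores.
padded = pad_sequences(patient_sequences, dtype="float32",
                       padding="post", value=0.0)

model = models.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(None, n_features)),
    layers.GRU(32),                        # GRU as suggested; LSTM also works
    layers.Dense(1, activation="sigmoid")  # risk of the incidence (binary)
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```

With this setup each patient is one sample in the batch, so you can train on all patients in parallel without concatenating them into one long column.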