K-Means on temporal dataset

Time:06-10

I have a temporal dataset (1000000 x 70) containing information about the activities of 20 subjects. Because it has around a million rows, I need to subsample it. How should I select a representative set of observations for each subject? Afterwards, I need to apply PCA and K-means to the result. Could you help me with the steps to follow? I'm working in MATLAB.

Answer:

I'm not entirely sure what you're looking for. If you just want to subsample a matrix in MATLAB, here is one way to do it:

% myData is assumed to already be in the workspace:
% 1000000 x 70, one observation per row (as described in the question)
nbDataPts = size(myData, 1); % Number of observations (rows)

subsampleRatio = 0.1;        % Fraction of the data to keep
nbSamples = round(subsampleRatio * nbDataPts);  % Number of observations to keep
sampleIdx = round(linspace(1, nbDataPts, nbSamples)); % Evenly spaced row indices to keep

sampledData = myData(sampleIdx, :);  % Subsampled data, nbSamples x 70
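The snippet above samples the matrix as a whole and ignores which subject each row belongs to. If you want to control how many observations each of the 20 subjects contributes, one option is to subsample every subject separately. This is a minimal sketch, not a definitive method. It assumes that rows are observations (the 1000000 x 70 layout from the question), that each subject's rows are in temporal order, and that a hypothetical vector subjectId holds the subject label of every row. The synthetic data at the top exists only so the snippet runs on its own.

```matlab
% Synthetic stand-in data (hypothetical): 20 subjects x 5000 rows, 70 features.
% Replace myData and subjectId with your own matrix and subject labels.
nSubjects = 20;
rowsPerSubject = 5000;
myData = randn(nSubjects * rowsPerSubject, 70);
subjectId = repelem((1:nSubjects)', rowsPerSubject);   % N x 1 subject label per row

subsampleRatio = 0.1;                 % fraction of EACH subject's rows to keep
subjects = unique(subjectId);
keepIdx = cell(numel(subjects), 1);

for s = 1:numel(subjects)
    rows = find(subjectId == subjects(s));              % row indices of this subject
    nKeep = max(1, round(subsampleRatio * numel(rows))); % keep at least one row
    % Evenly spaced positions within this subject's rows, so the whole
    % time course of the subject is thinned rather than truncated
    pos = round(linspace(1, numel(rows), nKeep));
    keepIdx{s} = rows(pos);
end

keepIdx = vertcat(keepIdx{:});        % all kept row indices, grouped by subject
sampledData = myData(keepIdx, :);     % subsampled observations
sampledSubject = subjectId(keepIdx);  % subject label of each kept row
```

If you prefer random rather than evenly spaced sampling, replacing the linspace line with sort(randperm(numel(rows), nKeep)) picks a random subset of each subject's rows while keeping them in temporal order.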

Then, for PCA and K-means, I suggest you look at the MATLAB documentation for the pca and kmeans functions (Statistics and Machine Learning Toolbox).
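As a starting point for those two steps, here is a minimal sketch assuming the Statistics and Machine Learning Toolbox is installed. The settings are placeholders, not recommendations: standardising with zscore (useful when the 70 features have different units), keeping the smallest number of components that explains 95% of the variance, and k = 5 clusters. The random matrix only stands in for your subsampled data.

```matlab
% Placeholder input: replace with your subsampled n x 70 matrix (rows = observations)
rng(1);                                       % reproducible data and k-means initialisation
sampledData = randn(2000, 70);

X = zscore(sampledData);                      % standardise each feature (zero mean, unit std)
[~, score, ~, ~, explained] = pca(X);         % score: principal component scores per row
nComp = find(cumsum(explained) >= 95, 1);     % fewest components reaching 95% explained variance
reduced = score(:, 1:nComp);                  % data projected onto the kept components

k = 5;                                        % placeholder number of clusters
% Several replicates with different starting centroids; the best solution is returned
[clusterIdx, centroids] = kmeans(reduced, k, 'Replicates', 5);
```

Note that pca and kmeans treat each row as an independent observation, so the temporal order of the data is not used. To choose k, you could compare several values, for example with evalclusters.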

Try to work with it, and open a new question if a specific problem arises.
