I have a dataframe like so:
time 0 1 2 3 4 5
0 3.477110 3.475698 3.475874 3.478345 3.476757 3.478169
1 3.422223 3.419752 3.417987 3.421341 3.418693 3.418340
2 3.474110 3.474816 3.477463 3.479757 3.479581 3.476757
3 3.504995 3.507112 3.504995 3.505877 3.507112 3.508171
4 3.426106 3.424870 3.422399 3.421517 3.419046 3.417105
6 3.364336 3.362571 3.360453 3.358335 3.357806 3.356924
7 3.364336 3.362571 3.360453 3.358335 3.357806 3.356924
8 3.364336 3.362571 3.360453 3.358335 3.357806 3.356924
but sktime requires the data to be in a format where each dataframe entry is a separate time series:
3.477110,3.475698,3.475874,3.478345,3.476757,3.478169
3.422223,3.419752,3.417987,3.421341,3.418693,3.418340
3.474110,3.474816,3.477463,3.479757,3.479581,3.476757
3.504995,3.507112,3.504995,3.505877,3.507112,3.508171
3.426106,3.424870,3.422399,3.421517,3.419046,3.417105
3.364336,3.362571,3.360453,3.358335,3.357806,3.356924
Essentially, since I have 6 columns of data, each row should become a separate series (of length 6), and the final shape should be (9, 1) (for this example) instead of the (9, 6) it is right now.
I have tried iterating over the rows and using various transform techniques, but to no avail. I am looking for something similar to the .squeeze() method, but one that works for multiple data points. How does one go about it?
CodePudding user response:
I think you want something like this:
import numpy as np
result = df.set_index('time').apply(np.array, axis=1)
print(result)
print(type(result))
print(result.shape)
time
0 [3.47711, 3.475698, 3.475874, 3.478345, 3.4767...
1 [3.422223, 3.419752, 3.417987, 3.421341, 3.418...
2 [3.47411, 3.474816, 3.477463, 3.479757, 3.4795...
3 [3.504995, 3.507112, 3.504995, 3.505877, 3.507...
4 [3.426106, 3.42487, 3.422399, 3.421517, 3.4190...
6 [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
7 [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
8 [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
dtype: object
<class 'pandas.core.series.Series'>
(8,)
This is one pd.Series of length 8 (index 5 is missing in your example data ;) ), and each value of the Series is an np.array. You can also use list in the apply call if you want.
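For reference, here is a self-contained sketch of the same idea on a cut-down copy of the data (three rows, three value columns). It assumes that time is an ordinary column, as the set_index call above implies. The final to_frame step and the column name "dim_0" are illustrative additions, not part of the answer above. They only show how to get the (n, 1) shape the question asks for.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's data; "time" is assumed to be a regular column.
df = pd.DataFrame(
    {
        "time": [0, 1, 2],
        0: [3.477110, 3.422223, 3.474110],
        1: [3.475698, 3.419752, 3.474816],
        2: [3.475874, 3.417987, 3.477463],
    }
)

# One object-dtype Series whose values are 1-D arrays (one array per original row).
result = df.set_index("time").apply(np.array, axis=1)

# Wrap the Series in a single-column DataFrame to get shape (n_rows, 1).
# "dim_0" is a hypothetical column name chosen for this sketch.
frame = result.to_frame("dim_0")

print(result.shape)
print(frame.shape)
```

Whether a given sktime estimator accepts this layout directly depends on the sktime version and on the cell type it expects (np.array versus pd.Series), so treat this as a starting point rather than a guaranteed input format.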
CodePudding user response:
Convert all columns to str, because the join method only accepts strings.
Then join all columns with a "," delimiter:
df.astype(str).agg(','.join,axis=1)
df.astype(str).agg(','.join,axis=1).shape
(9,)
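A runnable sketch of this string-joining approach on a cut-down copy of the data follows. It assumes time is a regular column and moves it to the index first, since otherwise the time value would be joined into each string as well. Note that the result holds comma-separated strings rather than numeric arrays, so the sketch also shows one way to parse a row back into floats.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's data; "time" is assumed to be a regular column.
df = pd.DataFrame(
    {
        "time": [0, 1, 2],
        0: [3.477110, 3.422223, 3.474110],
        1: [3.475698, 3.419752, 3.474816],
        2: [3.475874, 3.417987, 3.477463],
    }
)

# Keep "time" out of the joined text by making it the index.
values = df.set_index("time")

# Cast every cell to str, then join each row's cells with commas.
joined = values.astype(str).agg(",".join, axis=1)
print(joined.shape)

# The cells are strings, so numeric work needs a parse step.
first_row = np.array(joined.iloc[0].split(","), dtype=float)
print(first_row)
```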