Convert all rows into a Series object in pandas

I have a dataframe like so:

time       0           1           2           3           4           5    
0   3.477110    3.475698    3.475874    3.478345    3.476757    3.478169    
1   3.422223    3.419752    3.417987    3.421341    3.418693    3.418340    
2   3.474110    3.474816    3.477463    3.479757    3.479581    3.476757    
3   3.504995    3.507112    3.504995    3.505877    3.507112    3.508171    
4   3.426106    3.424870    3.422399    3.421517    3.419046    3.417105    
6   3.364336    3.362571    3.360453    3.358335    3.357806    3.356924
7   3.364336    3.362571    3.360453    3.358335    3.357806    3.356924
8   3.364336    3.362571    3.360453    3.358335    3.357806    3.356924

but sktime requires the data to be in a format where each dataframe entry is a separate time series:

3.477110,3.475698,3.475874,3.478345,3.476757,3.478169   
3.422223,3.419752,3.417987,3.421341,3.418693,3.418340   
3.474110,3.474816,3.477463,3.479757,3.479581,3.476757   
3.504995,3.507112,3.504995,3.505877,3.507112,3.508171   
3.426106,3.424870,3.422399,3.421517,3.419046,3.417105   
3.364336,3.362571,3.360453,3.358335,3.357806,3.356924

Essentially, as I have 6 columns of data, each row should become a separate series (of length 6), and the final shape should be (9, 1) (for this example) instead of the (9, 6) it is right now.

I have tried iterating over the rows and using various transform techniques, but to no avail. I am looking for something similar to the .squeeze() method that works for multiple data points. How does one go about it?

CodePudding user response：

I think you want something like this.

import numpy as np

# Use 'time' as the index, then collapse each row into a single NumPy array
result = df.set_index('time').apply(np.array, axis=1)
print(result)
print(type(result))
print(result.shape)

time
0    [3.47711, 3.475698, 3.475874, 3.478345, 3.4767...
1    [3.422223, 3.419752, 3.417987, 3.421341, 3.418...
2    [3.47411, 3.474816, 3.477463, 3.479757, 3.4795...
3    [3.504995, 3.507112, 3.504995, 3.505877, 3.507...
4    [3.426106, 3.42487, 3.422399, 3.421517, 3.4190...
6    [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
7    [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
8    [3.364336, 3.362571, 3.360453, 3.358335, 3.357...
dtype: object
<class 'pandas.core.series.Series'>
(8,)

This is one pd.Series of length 8 (index 5 is missing in your example data) and each value of the Series is an np.array. You can also use list instead of np.array in the apply statement if you prefer.
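
If each cell really has to be a pd.Series and the result a one-column DataFrame of shape (n, 1), as the question describes, the same idea can be sketched with a list comprehension. This is only a sketch under assumptions: the small frame below is a stand-in built from the first rows of the question's data, and the column name "dim_0" is a hypothetical label, not something the question or sktime prescribes here.

```python
import pandas as pd

# Stand-in for the question's frame (values copied from its first three rows).
df = pd.DataFrame({
    "time": [0, 1, 2],
    0: [3.477110, 3.422223, 3.474110],
    1: [3.475698, 3.419752, 3.474816],
    2: [3.475874, 3.417987, 3.477463],
})

# Keep 'time' as the row label so it is not treated as a data point.
wide = df.set_index("time")

# Build one pd.Series per row of the underlying 2D array.
cells = [pd.Series(row) for row in wide.to_numpy()]

# Wrap the cells in an object-dtype Series, then in a one-column DataFrame.
# "dim_0" is a hypothetical column name chosen for illustration.
nested = pd.Series(cells, index=wide.index).to_frame("dim_0")

print(nested.shape)             # (3, 1)
print(type(nested.iloc[0, 0]))  # each cell is a pandas Series
```

A plain list comprehension is used here instead of apply(..., axis=1), because apply expands a returned pd.Series back into columns, which would give the wide frame again.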

CodePudding user response：

Convert all columns to str, because str.join only accepts strings.

Then join the columns of each row with a "," delimiter:

df.astype(str).agg(','.join,axis=1)

df.astype(str).agg(','.join,axis=1).shape
(9,)
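
Keep in mind that this approach gives one string per row rather than numeric data, and that a regular time column would be joined into the string as well. A minimal sketch, assuming a small stand-in frame built from the first two rows of the question's data, with time moved into the index first:

```python
import pandas as pd

# Stand-in for the question's frame (values copied from its first two rows).
df = pd.DataFrame({
    "time": [0, 1],
    0: [3.477110, 3.422223],
    1: [3.475698, 3.419752],
    2: [3.475874, 3.417987],
})

# Move 'time' into the index so it is not part of the joined string.
joined = df.set_index("time").astype(str).agg(",".join, axis=1)
print(joined.shape)  # (2,)

# Each entry is a plain str; parse it again if numbers are needed.
first_row = [float(x) for x in joined.loc[0].split(",")]
print(first_row)
```

If the values are needed as numbers later on, keeping them as arrays or Series avoids the round trip through strings.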