I have a MultiIndex DataFrame that contains the x and y coordinates of different body segments over time. It looks like this:
segment 0 1 ... 98 99
coords k x y k ... y k x y
0 0.008525 312.05 361.65 0.011500 ... 329.97 0.012414 621.83 327.77
1 0.004090 312.32 359.98 0.007290 ... 329.00 0.034572 623.31 327.13
2 0.006645 313.42 359.11 0.011194 ... 330.53 0.003275 621.18 327.55
3 0.008367 314.79 361.47 0.013591 ... 329.58 0.026624 624.32 327.76
4 0.005160 315.91 364.54 0.009056 ... 329.97 0.026840 624.54 327.97
... ... ... ... ... ... ... ... ... ...
40006 -0.081192 323.60 354.73 -0.070411 ... 431.78 0.088513 432.43 433.49
40007 -0.050125 319.29 357.99 -0.074568 ... 431.00 0.470994 436.47 432.65
It has 40008 rows and 300 columns (100 segments × k, x, y). I don't need the k values.
For some plotting, however, I need my data to look like this:
[[index0, x_i0_s0, y_i0_s0],
 [index0, x_i0_s1, y_i0_s1],
 [index0, x_i0_s2, y_i0_s2],
 ...
 [index40007, x_i40007_s97, y_i40007_s97],
 [index40007, x_i40007_s98, y_i40007_s98],
 [index40007, x_i40007_s99, y_i40007_s99]]
Or with real data:
[[0, 312.05, 361.65],
 ...
 [40007, 436.47, 432.65]]
So I can drop the segment ID, but I need to keep the index. The output array should have shape (len(index) * segments, 3), which in this case is (4000800, 3).
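To pin down the target layout, here is a deliberately slow, loop-based sketch on a small made-up frame with the same (segment, coords) column structure. The sizes n_rows and n_segments and the values are invented for illustration. It emits one row per (index, segment) pair in index-major order, which is the ordering the examples above imply. It is only meant as a reference to check faster approaches against, not something to run on 40008 rows.

```python
import numpy as np
import pandas as pd

# Toy frame with the question's column layout: a (segment, coords) MultiIndex
# with coords k, x, y. Sizes and values are made up for illustration.
n_rows, n_segments = 3, 4
columns = pd.MultiIndex.from_product(
    [range(n_segments), list("kxy")], names=["segment", "coords"]
)
df = pd.DataFrame(
    np.arange(n_rows * len(columns), dtype=float).reshape(n_rows, -1),
    columns=columns,
)

# Slow but explicit: one output row per (index, segment) pair, index-major,
# keeping the row label and dropping k and the segment id.
segments = df.columns.get_level_values("segment").unique()
rows = []
for idx in df.index:
    for seg in segments:
        rows.append([idx, df.loc[idx, (seg, "x")], df.loc[idx, (seg, "y")]])
expected = np.array(rows)

print(expected.shape)  # (12, 3), i.e. (len(index) * segments, 3)
```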
Since I am not very good at manipulating MultiIndex DataFrames, I tried to extract the x and y coordinates separately:
x = df.xs(('x',), level=('coords',), axis=1)
y = df.xs(('y',), level=('coords',), axis=1)
After that I tried various things such as np.column_stack() and np.reshape(), without success. The closest I have got is:
x = df.xs(('x',), level=('coords',), axis=1)
y = df.xs(('y',), level=('coords',), axis=1)
result = np.stack((x, y), axis=2)
This gives me an array of shape (40008, 100, 2) instead of (4000800, 3).
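The np.stack attempt is actually only one reshape away from the target. A minimal sketch, assuming the segment order in the columns is the order you want in the output: collapse the first two axes with reshape(-1, 2), then prepend each row label repeated once per segment with np.repeat. The toy frame and its sizes are made up; with the real data, x.shape[1] would be 100 and the result would have 4000800 rows. The scalar form df.xs("x", level="coords", axis=1) is equivalent to the tuple form used above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real frame (sizes are made up for illustration).
n_rows, n_segments = 3, 4
columns = pd.MultiIndex.from_product(
    [range(n_segments), list("kxy")], names=["segment", "coords"]
)
df = pd.DataFrame(
    np.arange(n_rows * len(columns), dtype=float).reshape(n_rows, -1),
    columns=columns,
)

x = df.xs("x", level="coords", axis=1)  # shape (n_rows, n_segments)
y = df.xs("y", level="coords", axis=1)
stacked = np.stack((x, y), axis=2)      # shape (n_rows, n_segments, 2)

# Collapse the first two axes: rows come out index-major, segment-minor.
xy = stacked.reshape(-1, 2)             # shape (n_rows * n_segments, 2)

# Repeat each row label once per segment so it lines up with xy.
idx = np.repeat(df.index.to_numpy(), x.shape[1])
result = np.column_stack((idx, xy))     # shape (n_rows * n_segments, 3)

print(result.shape)  # (12, 3)
```

Because everything after the two xs calls is plain NumPy on contiguous arrays, this avoids building an intermediate long-format DataFrame. Note that column_stack upcasts the integer index to float to match the coordinates.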
Any help would be greatly appreciated, thank you!
Answer:
Try this:
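import numpy as np
import pandas as pd
# A smaller input dataframe to check that I understand your problem correctly:
# 10 rows, 5 segments x (k, x, y) columns
columns = pd.MultiIndex.from_product(
    [range(5), list("kxy")], names=["segment", "coords"]
)
df = pd.DataFrame(np.arange(10 * len(columns)).reshape(-1, len(columns)), columns=columns)
# The manipulation
result = (
    df.rename_axis("index")                  # name the row index so reset_index yields an "index" column
    .stack("segment")                        # move the segment level from the columns into the rows
    .reset_index()[["index", "x", "y"]]      # flatten, then keep only index, x and y
    .to_numpy()
)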
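As a usage check, the answer's pipeline can be run on its own toy frame and its size compared against len(df) * segments. This is a sketch; the assertions simply restate the expected shape and the first few (index, x, y) rows. On pandas 2.1 and later, stack may emit a FutureWarning about its new implementation (opt in with future_stack=True); for this data both implementations should give the same rows, but that is worth confirming on your version.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the answer: 10 rows, 5 segments x (k, x, y).
columns = pd.MultiIndex.from_product(
    [range(5), list("kxy")], names=["segment", "coords"]
)
df = pd.DataFrame(np.arange(10 * len(columns)).reshape(-1, len(columns)), columns=columns)

result = (
    df.rename_axis("index")
    .stack("segment")
    .reset_index()[["index", "x", "y"]]
    .to_numpy()
)

# One output row per (index, segment) pair.
n_segments = df.columns.get_level_values("segment").nunique()
assert result.shape == (len(df) * n_segments, 3)

# Row 0 of df holds 0..14, so its (x, y) pairs are (1, 2), (4, 5), (7, 8), ...
assert result[:3].tolist() == [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

One thing to watch with the real data: by default the legacy stack drops rows whose values are all NaN, so segments with missing k, x and y would shrink the output below (len(index) * segments, 3).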