Shuffle rows in dataframe by specific column value

Time:03-17

I have a dataframe as follows:

  Video Frames      Feature1    Feature2  Label
0   0   frame0.jpg  feature1    feature2    0
1   0   frame1.jpg  feature1    feature2    0
2   0   frame2.jpg  feature1    feature2    0
3   0   frame3.jpg  feature1    feature2    0
4   1   frame0.jpg  feature1    feature2    1
5   1   frame1.jpg  feature1    feature2    1
6   1   frame2.jpg  feature1    feature2    1
7   1   frame3.jpg  feature1    feature2    1
8   2   frame0.jpg  feature1    feature2    0
9   2   frame1.jpg  feature1    feature2    0
10  2   frame2.jpg  feature1    feature2    0
11  2   frame3.jpg  feature1    feature2    0

I want to group the frames into slots (e.g. 2 consecutive frames) and shuffle the rows, while always keeping the consecutive frames of a slot, which belong to the same Video, together and in order. For the case above and slots of 2 consecutive frames, one possible result would be:

  Video Frames      Feature1    Feature2    Label
0   0   frame0.jpg  feature1    feature2    0
1   0   frame1.jpg  feature1    feature2    0
2   1   frame2.jpg  feature1    feature2    1
3   1   frame3.jpg  feature1    feature2    1
4   2   frame2.jpg  feature1    feature2    0
5   2   frame3.jpg  feature1    feature2    0
6   2   frame0.jpg  feature1    feature2    0
7   2   frame1.jpg  feature1    feature2    0
8   0   frame2.jpg  feature1    feature2    0
9   0   frame3.jpg  feature1    feature2    0
10  1   frame0.jpg  feature1    feature2    1
11  1   frame1.jpg  feature1    feature2    1

I want the number of frames per slot to be configurable: in the case above I chose 2, but it could be 3 or 10 consecutive frames. The window of selected frames is not sliding: I select [frame0, frame1] and then [frame2, frame3], but never [frame1, frame2].

I've been thinking about the best way to do this, but it's not clear to me how.

Thank you in advance

CodePudding user response:

IIUC, you can shuffle the even positions and then add the odd position that follows each one, using numpy:

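import numpy as np

# start position of each 2-row slot: 0, 2, 4, ...
order = np.arange(0, len(df), 2)
# shuffle the slot starts in place
np.random.shuffle(order)
# pair each start with the position right after it (assumes an even number of rows)
order = np.vstack([order, order + 1]).ravel('F')

df2 = df.iloc[order]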

Example output:

    Video      Frames  Feature1  Feature2  Label
2       0  frame2.jpg  feature1  feature2      0
3       0  frame3.jpg  feature1  feature2      0
0       0  frame0.jpg  feature1  feature2      0
1       0  frame1.jpg  feature1  feature2      0
6       1  frame2.jpg  feature1  feature2      1
7       1  frame3.jpg  feature1  feature2      1
8       2  frame0.jpg  feature1  feature2      0
9       2  frame1.jpg  feature1  feature2      0
10      2  frame2.jpg  feature1  feature2      0
11      2  frame3.jpg  feature1  feature2      0
4       1  frame0.jpg  feature1  feature2      1
5       1  frame1.jpg  feature1  feature2      1
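
The question also asks for a configurable slot size. One possible way to generalize the idea is sketched below; it is not part of the answer above. The function name `shuffle_in_blocks` and its parameters `block_size`, `group_col` and `seed` are made up for illustration, and the sketch assumes the rows of each video are already in frame order. Slots are numbered per video, so a slot never spans two videos; if a video's frame count is not a multiple of `block_size`, its last slot is simply shorter.

```python
import numpy as np
import pandas as pd


def shuffle_in_blocks(df, block_size=2, group_col="Video", seed=None):
    """Shuffle rows in non-sliding blocks of `block_size` rows per group.

    Hypothetical helper: assumes rows are already in frame order
    within each value of `group_col`.
    """
    rng = np.random.default_rng(seed)
    # Position of each row within its video: 0, 1, 2, ...
    pos_in_group = df.groupby(group_col, sort=False).cumcount()
    # Non-sliding block number within the video: 0, 0, 1, 1, ... for size 2
    block_in_group = pos_in_group // block_size
    # One integer id per (video, block) pair, in order of first appearance
    block_id = (
        df.groupby([df[group_col], block_in_group], sort=False)
        .ngroup()
        .to_numpy()
    )
    # Assign every block a random rank
    rank = rng.permutation(block_id.max() + 1)
    # A stable sort by block rank keeps the original row order inside each block
    order = np.argsort(rank[block_id], kind="stable")
    return df.iloc[order]
```

With the dataframe from the question, `shuffle_in_blocks(df, block_size=2)` should return the same kind of result as the answer above, and `block_size=3` or `block_size=10` changes the slot length. Passing `seed` makes the shuffle reproducible.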