How to sample every n step a dataframe over an axis

Time:03-29

I have this data frame:


    element  letter  number  type  wavelength         q         k         t         z   A
0   probe_1       A       1     1         600  0.002809  7.275943  0.000447  0.038921   1
1   probe_1       A       1     1         610  0.003098  7.278088  0.000356  0.030954   1
2   probe_1       A       1     1         620  0.002338  7.204654  0.000346  0.029990   1
3   probe_1       A       1     1         630  0.002307  7.279050  0.000408  0.034784   1
4   probe_1       A       1     1         640  0.002453  7.211055  0.000329  0.028913   1
..      ...     ...     ...   ...         ...       ...       ...       ...       ...  ..

and I want to get a new data frame that contains only every fourth wavelength, i.e. [600, 640, 680, 720, 760, ...].

So in this case the result would be:


    element  letter  number  type  wavelength         q         k         t         z   A
0   probe_1       A       1     1         600  0.002809  7.275943  0.000447  0.038921   1
4   probe_1       A       1     1         640  0.002453  7.211055  0.000329  0.028913   1
..      ...     ...     ...   ...         ...       ...       ...       ...       ...  ..

How can I extract only those rows?

I have looked at pandas.DataFrame.sample, but it appears to sample randomly, and I want to select with a predetermined step.

This was my attempt at a solution, but I don't understand what is wrong with it:

            pivot = pd.pivot_table(data,index=[self.select])
            index = pivot.index
            subsampling = np.arange(0,len(pivot), self.subsample)
            to_take = [False]*len(data[self.select])
            for sub_smpl in range(len(subsampling)):
                tmp = [data[self.select][i] == index[subsampling[sub_smpl]] for i in range(len(to_take)) ]
                to_take = [to_take[i] == tmp[i] for i in range(len(to_take))] 
            data = data[to_take]

where self.select refers to the column I want to subsample ("wavelength") and self.subsample is the step size (4 in the example above).

CodePudding user response:

IIUC:

df[::4]

It comes from the notation start:stop:step.
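
As a minimal sketch of that slice: the small frame below is a hypothetical stand-in for the one in the question, and it assumes there is exactly one row per wavelength, already in ascending order. The slice is purely positional, so it picks rows 0, 4, 8, ... regardless of what the wavelength column contains.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: one row per wavelength.
df = pd.DataFrame({
    "element": ["probe_1"] * 9,
    "wavelength": list(range(600, 690, 10)),  # 600, 610, ..., 680
})

# Positional slice using start:stop:step; keeps rows 0, 4, 8, ...
every_fourth = df.iloc[::4]

print(every_fourth["wavelength"].tolist())
```

df[::4] and df.iloc[::4] select the same rows here; iloc just makes it explicit that the selection is by position and not by label.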

CodePudding user response:

Group by the wavelength and then effectively "slice" each group, e.g.:

output = df.groupby('wavelength').nth(slice(0, None, 4))

This gives you the first row and then every 4th row after it within each wavelength group.
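
If what is wanted is instead every 4th distinct wavelength value, keeping all rows that share it, one possible approach is to subsample the unique values and filter with isin. This is only a sketch under assumptions: the two-probe frame is made up for illustration, and `step` plays the role of self.subsample from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two probes measured at the same wavelengths,
# so each wavelength appears on more than one row.
df = pd.DataFrame({
    "element": ["probe_1"] * 9 + ["probe_2"] * 9,
    "wavelength": list(range(600, 690, 10)) * 2,
})

step = 4  # stands in for self.subsample in the question

# Sorted distinct wavelengths, then every `step`-th one of them.
keep = np.sort(df["wavelength"].unique())[::step]

# Keep every row whose wavelength is among the selected values.
subsampled = df[df["wavelength"].isin(keep)]

print(keep.tolist())
print(len(subsampled))
```

Unlike a positional slice, this does not depend on row order or on how many rows each wavelength has.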
