I have this data frame:
element letter number type wavelength q k t z A
0 probe_1 A 1 1 600 0.002809 7.275943 0.000447 0.038921 1
1 probe_1 A 1 1 610 0.003098 7.278088 0.000356 0.030954 1
2 probe_1 A 1 1 620 0.002338 7.204654 0.000346 0.029990 1
3 probe_1 A 1 1 630 0.002307 7.279050 0.000408 0.034784 1
4 probe_1 A 1 1 640 0.002453 7.211055 0.000329 0.028913 1
... ... ... ... ... ... ... ... ... ... ..
and I want to get a new data frame that contains only every 4th wavelength, i.e. [600, 640, 680, 720, 760, ...].
In this case, the result would be:
element letter number type wavelength q k t z A
0 probe_1 A 1 1 600 0.002809 7.275943 0.000447 0.038921 1
4 probe_1 A 1 1 640 0.002453 7.211055 0.000329 0.028913 1
... ... ... ... ... ... ... ... ... ... ..
How can I extract only those data?
I have looked at pandas.DataFrame.sample, but it appears to sample randomly, and I want to select with a predetermined step.
My attempt at a solution is below, but I don't understand what is wrong with it:
pivot = pd.pivot_table(data, index=[self.select])
index = pivot.index
subsampling = np.arange(0, len(pivot), self.subsample)
to_take = [False] * len(data[self.select])
for sub_smpl in range(len(subsampling)):
    tmp = [data[self.select][i] == index[subsampling[sub_smpl]] for i in range(len(to_take))]
    to_take = [to_take[i] == tmp[i] for i in range(len(to_take))]
data = data[to_take]
where self.select is the column I want to subsample ("wavelength") and self.subsample is the step size (4 in the example above).
CodePudding user response:
IIUC:
df[::4]
This uses the slice notation start:stop:step, so it keeps every 4th row of the data frame.
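Note that a positional slice counts rows, not wavelengths, so it only matches the desired result when each wavelength appears exactly once and in order. A minimal sketch of the difference, assuming a made-up frame with two probes sharing the same wavelengths (the data, `step`, `keep`, and `subsampled` are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two probes, wavelengths 600..700 in steps of 10
wavelengths = np.arange(600, 710, 10)
df = pd.DataFrame({
    "element": np.repeat(["probe_1", "probe_2"], len(wavelengths)),
    "wavelength": np.tile(wavelengths, 2),
})

# Positional slice: every 4th row, whatever its wavelength is
every_4th_row = df[::4]

# Every 4th distinct wavelength, keeping all rows that carry it
step = 4
keep = np.sort(df["wavelength"].unique())[::step]
subsampled = df[df["wavelength"].isin(keep)]

print(every_4th_row["wavelength"].tolist())
print(subsampled)
```

In this made-up frame the plain slice drifts onto 610, 650, 690 for the second probe, while the `isin` filter keeps 600, 640, 680 for both.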
CodePudding user response:
Group by the wavelength and then effectively "slice" each group, e.g.:
output = df.groupby('wavelength').nth(slice(0, None, 4))
This gives you the first row and then every 4th row after it within each wavelength group.
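To make the per-group behaviour concrete, here is a small sketch on made-up data (the values and the names `position_in_group` and `same_rows` are illustrative). It assumes a pandas version where `nth` accepts a slice, which I believe is 1.4 or later. Every wavelength that has at least one row still appears in the result, so this thins rows inside each wavelength rather than dropping wavelengths.

```python
import pandas as pd

# Hypothetical data: wavelength 600 has 5 rows, 610 has 2 rows
df = pd.DataFrame({
    "wavelength": [600] * 5 + [610] * 2,
    "q": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
})

# Rows at positions 0, 4, 8, ... inside each wavelength group
output = df.groupby("wavelength").nth(slice(0, None, 4))

# The same selection written as a boolean filter on the in-group position
position_in_group = df.groupby("wavelength").cumcount()
same_rows = df[position_in_group % 4 == 0]

print(output)
print(same_rows)
```

The index of `output` may differ between pandas versions (group key versus original row labels), but the selected rows should be the same.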