I have df1 like this:
Sample # Aux
0 1 4 LA LA
1 7500 4 LA
2 15000 4 LA
3 22500 4 L L L
4 30000 4 L
... ... ...
235 1762500 W
236 1770000 W
237 1777500 W
238 1785000 2
239 1792500 2
240 rows × 2 columns
I extract the 'H' labels to have the starting points of the events with the following code:
SAS1 = df1['Aux'].str.contains('H')
print(df1[SAS1])
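Here is a minimal sketch of the same filter on a small, hypothetical toy frame. One assumption matters: if Aux ever contains missing values, str.contains returns NaN for those rows and the boolean indexing fails. Passing na=False treats them as non-matches instead.

```python
import pandas as pd

# Hypothetical toy version of df1: "Sample #" grows in steps of 7500
df1 = pd.DataFrame({
    "Sample #": [0, 7500, 15000, 22500, 30000],
    "Aux": ["4 LA LA", "3 HA", None, "2 H", "W"],
})

# na=False makes a missing Aux value count as "no match" instead of NaN
SAS1 = df1["Aux"].str.contains("H", na=False)
print(df1[SAS1])
```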
And I got this:
Sample # Aux
12 90000 3 HA
13 97500 3 H LA
36 270000 4 LA HA
46 345000 2 H
49 367500 2 H
163 1222500 2 HA
167 1252500 2 H LA
174 1305000 2 H
179 1342500 2 HA
180 1350000 2 LA HA
181 1357500 2 H LA
185 1387500 2 LA H
187 1402500 2 H
188 1410000 3 H
189 1417500 3 HA
191 1432500 3 L HA
192 1440000 2 H LA
198 1485000 2 L HA
203 1522500 2 LA HA
204 1530000 2 H LA
207 1552500 2 HA
208 1560000 2 H H
211 1582500 2 H
213 1597500 R HA
214 1605000 2 HA
216 1620000 2 H
217 1627500 2 H
218 1635000 2 H
219 1642500 R H
221 1657500 W HA
225 1687500 R H
227 1702500 R HA
230 1725000 R H
231 1732500 R H
232 1740000 R H
233 1747500 R H
234 1755000 R HA
Now I want to extract the starting and ending points together. For example, one event starts at row 36 and ends at row 37 (i.e. the next row in df1). I want to do this for every event so that I can segment the signal data. How can I do it?
The output I want adds the points marked with a star:
Sample # Aux
12 90000 3 HA
13 97500 3 H LA
*14 105000
36 270000 4 LA HA
*37 277500
46 345000 2 H
*47 352500
49 367500 2 H
*50 375000
I want a loop that checks whether each data point exists in SAS1 and, if it does, adds the next row from df1 to SAS1.
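An explicit loop may not be necessary. One possible vectorized sketch shifts the boolean mask down by one position, so that every row directly after an H row is also selected, and then ORs the shifted mask with the original. The toy frame below is a hypothetical slice of df1. The approach assumes that "the next row" means the next row by position in df1.

```python
import pandas as pd

# Hypothetical slice of df1 (rows 11..15)
df1 = pd.DataFrame(
    {
        "Sample #": [82500, 90000, 97500, 105000, 112500],
        "Aux": ["3", "3 HA", "3 H LA", "3", "3"],
    },
    index=[11, 12, 13, 14, 15],
)

SAS1 = df1["Aux"].str.contains("H", na=False)

# shift(1) moves each flag one row down, so the row *after* an H row
# becomes True; fill_value=False keeps the first row from becoming NaN
ends = SAS1.shift(1, fill_value=False)

# Keep every H row (start points) plus the row right after it (end points)
segments = df1[SAS1 | ends]
print(segments)
```

Consecutive H rows such as 12 and 13 are kept once each. Only the row after the last H row of a run (here 14) is added as an end point. If an H row is the final row of df1, no next row exists and nothing is added for it.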
Answer:
You could create a new column using the following:
event_nos = []
event_counter = 0
for a in df1['Aux']:
    if "H" in a:
        event_counter += 1  # a new event starts at every row containing "H"
    event_nos.append(event_counter)
df1['Event_Number'] = event_nos
And then use that to segment your data?
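The same counter can also be sketched without a loop. A cumulative sum over the boolean mask goes up by one at each H row, which matches a loop that increments a counter whenever Aux contains "H". Grouping by that number then gives one segment per event. The toy data and the groupby usage here are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy df1
df1 = pd.DataFrame({
    "Sample #": [0, 7500, 15000, 22500, 30000, 37500],
    "Aux": ["4 LA", "3 HA", "3", "2 H", "2", "W"],
})

# Each True adds 1 to the running total, so the number changes at every H row;
# rows before the first H row get event number 0
df1["Event_Number"] = df1["Aux"].str.contains("H", na=False).cumsum()

# One sub-frame per event, e.g. for segmenting the signal
for event, segment in df1.groupby("Event_Number"):
    print(event, segment["Sample #"].tolist())
```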
Answer:
This could be a possible approach:
import pandas as pd
# This just imports the table already filtered by "H"
df = pd.read_csv("gasses.txt", delimiter=r"\s{2,}", engine="python")
df = df.set_index(["Index"])
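# For every row whose successor is not index + 1, build the missing end row
# (previous index + 1, previous Sample # + 7500); the last row is never checked
data = pd.DataFrame([(x + 1, z + 7500) for x, y, z in zip(df.index[:-1],
                     df.index[1:], df["Sample #"]) if y != x + 1], columns=("Index", "Sample #"))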
data = data.set_index("Index")
df = pd.concat([df, data]).sort_index().fillna("")
print(df)
OUTPUT
Sample # Aux
Index
12 90000 3 HA
13 97500 3 H LA
14 105000
36 270000 4 LA HA
37 277500
46 345000 2 H
...
...
Basically, wherever the indices are not consecutive, it creates a new row with the previous index + 1 and the previous Sample # + 7500, which is your step size.
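The same index-gap logic can also be sketched on an in-memory frame instead of gasses.txt. This version assumes the filtered rows keep df1's integer index and that consecutive samples are 7500 apart (STEP is a hypothetical name). Unlike the code above, it pairs the last row with None, so the final event also gets an end row.

```python
import pandas as pd

STEP = 7500  # assumed spacing between consecutive samples

# Hypothetical filtered frame, i.e. df1[SAS1]
df = pd.DataFrame(
    {"Sample #": [90000, 97500, 270000, 345000],
     "Aux": ["3 HA", "3 H LA", "4 LA HA", "2 H"]},
    index=[12, 13, 36, 46],
)

idx = df.index.to_list()
samples = df["Sample #"].to_list()
# Pair each index with the following one; None marks the last row
next_idx = idx[1:] + [None]

# Add (index + 1, Sample # + STEP) wherever the next index is not consecutive
missing = [(i + 1, s + STEP) for i, n, s in zip(idx, next_idx, samples) if n != i + 1]
data = pd.DataFrame(missing, columns=["Index", "Sample #"]).set_index("Index")

result = pd.concat([df, data]).sort_index().fillna("")
print(result)
```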
Answer:
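You can use iloc, pandas' integer-position-based indexer, to assign values to the row at a specific position of the data frame. Note that iloc can only overwrite rows that already exist and cannot add new ones. Assigning to a new label with loc is what enlarges the frame.
df1.iloc[14] = [105000, '']  # one value per column: Sample #, Aux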
df1
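Here is a small sketch of the difference, using a hypothetical three-row frame. iloc assigns by position and only to rows that already exist. Assigning to a new label with loc appends a row instead.

```python
import pandas as pd

df = pd.DataFrame({"Sample #": [0, 7500, 15000], "Aux": ["4 LA", "3 HA", "2"]})

# iloc overwrites the row at position 1 (one value per column)
df.iloc[1] = [7500, "3 H"]

# iloc cannot create new rows: position 3 does not exist yet
try:
    df.iloc[3] = [22500, ""]
except IndexError as err:
    print("iloc cannot enlarge:", err)

# loc with a new label appends a row ("setting with enlargement")
df.loc[3] = [22500, ""]
print(df)
```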