How to add the rows I want by index

I have df1 like this:

  Sample #    Aux
0    1      4 LA LA
1   7500    4 LA
2   15000   4 LA
3   22500   4 L L L
4   30000   4 L
... ... ...
235 1762500 W
236 1770000 W
237 1777500 W
238 1785000 2
239 1792500 2
240 rows × 2 columns

I extract the 'H' labels to have the starting points of the events with the following code:

SAS1 = df1['Aux'].str.contains('H')
print(df1[SAS1])

And I got this:

     Sample #    Aux
12      90000     3 HA
13      97500   3 H LA
36     270000  4 LA HA
46     345000      2 H
49     367500      2 H
163   1222500     2 HA
167   1252500   2 H LA
174   1305000      2 H
179   1342500     2 HA
180   1350000  2 LA HA
181   1357500   2 H LA
185   1387500   2 LA H
187   1402500      2 H
188   1410000      3 H
189   1417500     3 HA
191   1432500   3 L HA
192   1440000   2 H LA
198   1485000   2 L HA
203   1522500  2 LA HA
204   1530000   2 H LA
207   1552500     2 HA
208   1560000    2 H H
211   1582500      2 H
213   1597500     R HA
214   1605000     2 HA
216   1620000      2 H
217   1627500      2 H
218   1635000      2 H
219   1642500      R H
221   1657500     W HA
225   1687500      R H
227   1702500     R HA
230   1725000      R H
231   1732500      R H
232   1740000      R H
233   1747500      R H
234   1755000     R HA

Now I want to extract the starting and ending points simultaneously. For example, the starting point of an event is 36 and the ending point is 37 (i.e. the next one in df1). I want to do it for all of the events to do segmentation for signal data. How can I do it?

The output I want adds the rows marked with a star:

     Sample #      Aux
 12      90000     3 HA
 13      97500   3 H LA
*14     105000
 36     270000  4 LA HA
*37     277500
 46     345000      2 H
*47     352500
 49     367500      2 H
*50     375000

I want a loop that checks whether each data point exists in SAS1 and, if it does, adds the next row from df1 to the result.

CodePudding user response：

You could create a new column using the following:

event_nos = []
event_counter = 0

for a in df1['Aux']:
  if "H" in a:
    event_counter += 1
  
  event_nos.append(event_counter)

df1['Event_Number'] = event_nos

And then use that to segment your data?
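As a rough sketch of how that column could drive the segmentation: the same counter can be built with cumsum, and groupby then yields one block of rows per event. The small df1 below is made-up stand-in data, and the name segments is only illustrative.

```python
import pandas as pd

# Made-up stand-in for df1 (same column names as in the question)
df1 = pd.DataFrame({
    "Sample #": [0, 7500, 15000, 22500, 30000, 37500],
    "Aux": ["4 LA", "3 HA", "3", "2 H", "2 H LA", "W"],
})

# True where an event starts; the running sum gives each row its event number
is_event = df1["Aux"].str.contains("H")
df1["Event_Number"] = is_event.cumsum()

# One sub-frame per event; group 0 holds the rows before the first event
segments = {n: g for n, g in df1.groupby("Event_Number") if n > 0}

for n, g in segments.items():
    print(n, g.index.tolist())
```

Each segment here runs from one 'H' row up to the row before the next 'H' row, which may be longer than the start/end pair asked for in the question.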

CodePudding user response：

This could be a possible approach:

import pandas as pd

# This just imports the table already filtered by "H"
df = pd.read_csv("gasses.txt", delimiter=r"\s{2,}", engine="python")
df = df.set_index(["Index"])

# This is the logic
data = pd.DataFrame([(x + 1, z + 7500) for x, y, z in zip(df.index[:-1],
df.index[1:], df["Sample #"]) if y != x + 1], columns=("Index", "Sample #"))
data = data.set_index("Index")

df = pd.concat([df, data]).sort_index().fillna("")
print(df)

OUTPUT

       Sample #      Aux
Index
12        90000     3 HA
13        97500   3 H LA
14       105000
36       270000  4 LA HA
37       277500
46       345000      2 H
...
...

Basically, wherever two neighbouring indices in the filtered table are not consecutive numbers, it adds a row to a new DataFrame with the previous index + 1 and the previous Sample # + 7500, which is your step.
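If the original df1 is still available, an alternative is to take the following rows straight from it instead of reconstructing them from the step size. This is only a sketch on made-up data: it shifts the 'H' mask down by one row and keeps every row that is either a start or directly follows one. The is_start column is an illustrative addition, not part of the original table.

```python
import pandas as pd

# Made-up stand-in for df1 (same column names as in the question)
df1 = pd.DataFrame({
    "Sample #": [i * 7500 for i in range(8)],
    "Aux": ["4 LA", "3 HA", "3 H LA", "3", "4 LA HA", "4", "W", "2 H"],
})

# True on rows whose label contains 'H' (event starts)
starts = df1["Aux"].str.contains("H")

# Shifting the mask down one row marks the row after each start
follows = starts.shift(1, fill_value=False)

# Keep starts and the rows right after them; is_start tells them apart
result = df1.assign(is_start=starts)[starts | follows]
print(result)
```

A start on the very last row has no following row, so it simply appears without an ending point.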

CodePudding user response：

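You can use iloc, the integer-position-based indexer, to select the row at a specific position of the data frame. Note that assigning to df1.iloc[14] overwrites the row already at that position rather than inserting a new one, so to get the ending point of an event, read the row that follows it instead:

# Rows at positions 13 and 14: the event and the row right after it
df1.iloc[[13, 14]]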
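Because assigning through iloc replaces a row rather than inserting one, a positional approach would more plausibly read rows than write them. The sketch below, on made-up data, collects the positions of the 'H' rows, adds one to each, and selects the union with iloc. The names start_pos, next_pos, and keep are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df1 (same column names as in the question)
df1 = pd.DataFrame({
    "Sample #": [i * 7500 for i in range(8)],
    "Aux": ["4 LA", "3 HA", "3 H LA", "3", "4 LA HA", "4", "W", "2 H"],
})

# Integer positions of the rows whose label contains 'H'
start_pos = np.flatnonzero(df1["Aux"].str.contains("H").to_numpy())

# Position of the row after each start, dropping any that run past the end
next_pos = start_pos + 1
next_pos = next_pos[next_pos < len(df1)]

# Sorted, de-duplicated positions of starts and their following rows
keep = np.union1d(start_pos, next_pos)
result = df1.iloc[keep]
print(result)
```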