Pandas: how select row based on string in previous row - should be a simple solution-CodePudding

I have a a csv file. How do I print the row that follows a row that has a particular string? I need to print all rows that contain "ixation" in them and then the row that follows this row.

Here is my current code

df = pd.read_csv('locationof.csv')
df = pd.DataFrame(data, columns = ['Trial', 'Code','Time','Duration'])
list1 = ['100_1to3_start','fixation','Fixation','66_1to3_start']
contain_values = df[df['Code'].str.contains('|'.join(list1), na=False)]

Here is my current output...

2      1.0                fixation_dummy    50637.0   25086.0
4      2.0                fixation_dummy    75889.0   25086.0
7      3.0                fixation_dummy   101141.0   25086.0
9      4.0                fixation_dummy   126393.0   25086.0
13     6.0  100_1to3_start_2034_1_0_1060   151811.0   20268.0
23     9.0  100_1to3_start_2456_4_0_2054   216104.0   24587.0
33    12.0  100_1to3_start_1507_7_0_2446   283885.0   15118.0
43    15.0                      Fixation   332229.0  130081.0
55    17.0   66_1to3_start_2369_2_0_2352   484904.0   23590.0
76    23.0   66_1to3_start_1539_8_0_2518   615150.0   15285.0
82    25.0                      Fixation   654357.0  130081.0
123   35.0                      Fixation   996089.0  130081.0
164   45.0                      Fixation  1343635.0  130081.0
174   46.0   66_1to3_start_1884_1_0_2537  1473882.0   18773.0
197   53.0   66_1to3_start_1541_8_0_2545  1621074.0   15284.0
204   55.0                      Fixation  1662939.0  130080.0
213   56.0  100_1to3_start_2115_1_0_2528  1793186.0   21098.0
223   59.0  100_1to3_start_1892_4_0_2544  1859638.0   18939.0
233   62.0  100_1to3_start_2315_7_0_2537  1918282.0   23259.0

But I want...

2      1.0                fixation_dummy    50637.0   25086.0
4      2.0                fixation_dummy    75889.0   25086.0
7      3.0                fixation_dummy   101141.0   25086.0
9      4.0                fixation_dummy   126393.0   25086.0
13     6.0  100_1to3_start_2034_1_0_1060   151811.0   20268.0
43    15.0                      Fixation   332229.0  130081.0
55    17.0   66_1to3_start_2369_2_0_2352   484904.0   23590.0
82    25.0                      Fixation   654357.0  130081.0
123   35.0                      Fixation   996089.0  130081.0
164   45.0                      Fixation  1343635.0  130081.0
174   46.0   66_1to3_start_1884_1_0_2537  1473882.0   18773.0
204   55.0                      Fixation  1662939.0  130080.0
213   56.0  100_1to3_start_2115_1_0_2528  1793186.0   21098.0

How do I only print out the lines 66_1to3.., 100_1to3...) that follow only a row with "ixation" in it? This code will be run over a series of csv files in which the exact lines that I need vary between csv files.

CodePudding user response：

To answer this description: "I need to print all rows that contain "ixation" in them and then the row that follows this row.", the solution would be:

# identify rows with "ixation"
mask = df['Code'].str.contains('ixation')

# select them and one row below
out = df[mask|mask.shift()]

CodePudding user response：

Try boolean indexing with shift since we only care about the rows that follow "ixation"

list1 = ['100_1to3_start', '66_1to3_start']
df[df[2].str.contains('|'.join(list1), na=False) & df[2].shift().str.contains('ixation')]

      0     1                             2          3        4
4    13   6.0  100_1to3_start_2034_1_0_1060   151811.0  20268.0
8    55  17.0   66_1to3_start_2369_2_0_2352   484904.0  23590.0
13  174  46.0   66_1to3_start_1884_1_0_2537  1473882.0  18773.0
16  213  56.0  100_1to3_start_2115_1_0_2528  1793186.0  21098.0

Note that df[2], based on your example, would be df['Code']