Python - Iterate over dataframe and capture data in row above specific condition

Time:11-12

I am trying to iterate through a dataframe (both rows and columns) and extract each row where the value is 2 and the row immediately below it has the value 6, i.e. the last 2 before each run of 6s. An example of what I am after is below:

Input dataframe

import pandas as pd
data = [{'c1':'a', 'c2':2}, {'c1':'b','c2':2}, {'c1':'c','c2':6},{'c1':'d','c2':6},{'c1':'e','c2':2},{'c1':'c','c2':6}]
df = pd.DataFrame(data)
print(df)

    c1  c2
0   a   2
1   b   2
2   c   6
3   d   6
4   e   2
5   c   6

Desired output:

0   b   2
1   e   2

I can iterate over rows and columns using iterrows(), but the issue is that I want to grab the row that comes before another row. As I understand it, I would need the shift function in pandas for this, but I believe it only works on a pd.Series and not on a full dataframe.

Is there any other way I can do this? Please can someone point me in the right direction?

CodePudding user response:

You can use shift, which is available on both a Series and a full DataFrame, to add a helper column that tells each row whether the next row has the value 6, and then filter on that column:

import pandas as pd

data = [{'c1': 'a', 'c2': 2}, {'c1': 'b', 'c2': 2}, {'c1': 'c', 'c2': 6},
        {'c1': 'd', 'c2': 6}, {'c1': 'e', 'c2': 2}, {'c1': 'c', 'c2': 6}]
df = pd.DataFrame(data)
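# Flag each row whose next row has c2 == 6 (shift(-1) aligns every row with the value below it)
df['is_six'] = df['c2'].shift(-1) == 6
print(df)

# Keep rows that are not 6 themselves but are directly followed by a 6
sub_df = df[(df['c2'] != 6) & df['is_six']].reset_index(drop=True)
del sub_df['is_six']  # drop the helper column from the result
print(sub_df)

Output:
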
  c1  c2  is_six
0  a   2   False
1  b   2    True
2  c   6    True
3  d   6   False
4  e   2    True
5  c   6   False

  c1  c2
0  b   2
1  e   2
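
If you would rather not add and then delete a helper column, the same shift-based idea can be written as a small function that builds the boolean masks directly. This is only a sketch: the function name rows_before_value and its parameters keep_value and next_value are made up for illustration, and it assumes you want rows equal to 2 that are immediately followed by a 6, as in the desired output above.

```python
import pandas as pd


def rows_before_value(df, column, keep_value, next_value):
    """Return rows where df[column] == keep_value and the next row holds next_value.

    Hypothetical helper for illustration; it does not modify df.
    """
    # shift(-1) moves every value up one row, so position i holds the value of row i + 1.
    # The last row gets NaN, which never compares equal to next_value.
    next_is_target = df[column].shift(-1).eq(next_value)
    current_matches = df[column].eq(keep_value)
    return df[current_matches & next_is_target].reset_index(drop=True)


df = pd.DataFrame({'c1': list('abcdec'), 'c2': [2, 2, 6, 6, 2, 6]})
result = rows_before_value(df, 'c2', keep_value=2, next_value=6)
print(result)
```

On the sample data this should select the rows b and e, and the original df keeps only its c1 and c2 columns because no temporary column is ever written to it.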