I am trying to iterate through a dataframe (both rows and columns) and extract the last row where the value is 2 that sits directly above a row with the value 6. An example of what I am after is below:
Input dataframe:
import pandas as pd
data = [{'c1':'a', 'c2':2}, {'c1':'b','c2':2}, {'c1':'c','c2':6},{'c1':'d','c2':6},{'c1':'e','c2':2},{'c1':'c','c2':6}]
df = pd.DataFrame(data)
c1 c2
0 a 2
1 b 2
2 c 6
3 d 6
4 e 2
5 c 6
Desired output:
0 b 2
1 e 2
I can iterate over rows/columns using iterrows(), but the issue is that I want to grab the row just before another row. As far as I understand, I would need the shift function in pandas for this, and it only works on a pd.Series, not on a full dataframe.
Is there any other way I can do this? Please can someone point me in the right direction?
CodePudding user response:
You can use shift to add a new column that tells each row whether the next one has 6, then filter on it:
import pandas as pd
data = [{'c1': 'a', 'c2': 2}, {'c1': 'b', 'c2': 2}, {'c1': 'c', 'c2': 6},
{'c1': 'd', 'c2': 6}, {'c1': 'e', 'c2': 2}, {'c1': 'c', 'c2': 6}]
df = pd.DataFrame(data)
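# shift(-1) moves values up by one row, so each row is compared with the next row's c2
df['is_six'] = df['c2'].shift(-1) == 6
print(df)
# keep rows that are not 6 themselves but are followed by a 6
sub_df = df[(df['c2'] != 6) & df['is_six']].reset_index(drop=True)
# drop the helper column from the result
del sub_df['is_six']
print(sub_df)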
c1 c2 is_six
0 a 2 False
1 b 2 True
2 c 6 True
3 d 6 False
4 e 2 True
5 c 6 False
c1 c2
0 b 2
1 e 2
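The same filter can also be written without the helper column by combining two boolean masks directly. This is only a minimal sketch of that variation, not a replacement for the answer above. The names `mask` and `result` are made up for illustration, and it assumes you want rows that are exactly 2 (the answer above keeps anything that is not 6, which gives the same rows for this data).

```python
import pandas as pd

data = [{'c1': 'a', 'c2': 2}, {'c1': 'b', 'c2': 2}, {'c1': 'c', 'c2': 6},
        {'c1': 'd', 'c2': 6}, {'c1': 'e', 'c2': 2}, {'c1': 'c', 'c2': 6}]
df = pd.DataFrame(data)

# True where the row itself holds a 2 AND the next row holds a 6.
# shift(-1) leaves NaN in the last row, which compares as False.
mask = df['c2'].eq(2) & df['c2'].shift(-1).eq(6)

# Select the matching rows and renumber the index from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

As a side note, shift is also available on a whole dataframe (df.shift(-1)), so it is not limited to a single Series. Here only the c2 column needs shifting.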